[Users] memory leak in Carpet?

Miguel Zilhão miguel.zilhao.nogueira at tecnico.ulisboa.pt
Wed Aug 8 06:38:01 CDT 2018


hi Ian,
> The memory problems are very likely strongly related to the machine you run on.  I don't know that 
> we can take much information from a smaller test run on a different machine. We already see from 
> this run that Carpet is not "leaking" memory continuously; the curves for allocated memory show what 
> has been malloced and not freed, and it remains more or less constant after the initial phase.
> 
> I think it's worth trying to get tcmalloc running on the cluster.  So this means that you have never 
> seen the OOM happen when using tcmalloc.  It's possible that the improved memory allocation in 
> tcmalloc over glibc would entirely solve the problem.
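Ian's point that the allocated-memory curve stays roughly constant after the initial phase can also be checked numerically. The following is a minimal Python sketch, not part of Carpet. It assumes the allocated bytes have already been extracted into a list of samples with their output times; `growth_rate` and `skip_initial` are hypothetical names. It fits a least-squares line to the samples after the initial phase. A slope near zero is consistent with memory that is held rather than leaked continuously.

```python
def growth_rate(times, allocated, skip_initial=0.0):
    """Least-squares slope of allocated memory versus time.

    Samples with t < skip_initial (the initial phase) are ignored. A slope
    near zero means the allocated total is flat; a persistently positive
    slope means it keeps growing.
    """
    points = [(t, a) for t, a in zip(times, allocated) if t >= skip_initial]
    if len(points) < 2:
        raise ValueError("need at least two samples after the initial phase")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_a = sum(a for _, a in points) / n
    cov = sum((t - mean_t) * (a - mean_a) for t, a in points)
    var = sum((t - mean_t) ** 2 for t, _ in points)
    if var == 0:
        raise ValueError("all remaining samples have the same time")
    return cov / var


# Hypothetical samples: growth during setup, then flat (or still growing).
times = [0, 1, 2, 3, 4, 5]
flat = [0, 80, 100, 100, 100, 100]
leaky = [0, 80, 100, 110, 120, 130]
print(growth_rate(times, flat, skip_initial=2))   # 0.0
print(growth_rate(times, leaky, skip_initial=2))  # 10.0
```

Choosing `skip_initial` matters: including the setup phase would make even a flat run look like it is growing.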

well, i did have cases where i ran out of memory on my workstation too, with tcmalloc (that's where
i've been doing these tests), using this same configuration at higher resolution. there's no
OOM killer on the workstation, though, so at some point the system would just start to swap (at
which point i'd kill the job).

> Sorry, I made a mistake.  It should have been pageheap_unmapped, not pageheap_free.
> pageheap_free is essentially zero, and cannot account for the difference.
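To make the distinction between pageheap_free and pageheap_unmapped concrete, here is a rough accounting sketch in Python. It assumes gperftools-style tcmalloc statistics (heap size, allocated bytes, page-heap free bytes and page-heap unmapped bytes, as exposed for example through `MallocExtension::GetNumericProperty`; check the exact property names for your tcmalloc version) and ignores tcmalloc's own metadata. `tcmalloc_breakdown` is a hypothetical helper. The key assumption is that unmapped pages have been returned to the OS and no longer count toward the process's memory use, while free pages are still mapped.

```python
def tcmalloc_breakdown(heap_size, allocated, pageheap_free, pageheap_unmapped):
    """Rough split of tcmalloc's heap; all values in the same unit (e.g. bytes).

    tcmalloc's own metadata is ignored in this sketch.
    """
    # Pages released back to the OS (unmapped) are still included in
    # heap_size but are no longer backed by memory; what remains is roughly
    # what tcmalloc reports as memory actually used (physical + swap).
    mapped = heap_size - pageheap_unmapped
    # Mapped memory that is neither in use by the application nor on the
    # page-heap free list sits in tcmalloc's central/transfer/thread caches.
    caches = mapped - allocated - pageheap_free
    return {
        "mapped": mapped,
        "allocated": allocated,
        "pageheap_free": pageheap_free,
        "caches": caches,
        "unmapped": pageheap_unmapped,
    }


# Hypothetical numbers (MiB), with the page-heap free list at zero.
print(tcmalloc_breakdown(heap_size=1000, allocated=600,
                         pageheap_free=0, pageheap_unmapped=300))
# {'mapped': 700, 'allocated': 600, 'pageheap_free': 0, 'caches': 100, 'unmapped': 300}
```

With pageheap_free near zero, any gap between the heap size and what is actually in use has to show up in the unmapped or cache terms.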

ah, no problem. i'm attaching the updated plot.

>>> The point that Roland made also applies here: we are looking at the max across all processes and 
>>> assuming that every process is the same.  It's possible that one process has a high unmapped 
>>> curve, but another has a high rss curve, and we don't see this on the plot.  We would have to do 
>>> 1D output of the grid arrays and plot each process separately to see the full detail.  One way to 
>>> see if this is necessary would be to plot both the max and min instead of just the max.  That 
>>> way, we can see if this is likely to be an issue.
>>
>> ok, i'm attaching another plot with both the min (dashed lines) and the max (solid lines) plotted.
>> i hope it helps.
> 
> Thanks.  This shows that the gridfunction usage is more or less similar across all processes, which 
> is good.  However, there is significant variation in most of the other quantities across processes.
> To understand this better, we would have to look at 1D ASCII output of the grid arrays, which is a
> bit painful to plot in gnuplot.  Before this, I would definitely try to get tcmalloc running and 
> outputting this information on the cluster in a run that actually shows the OOM.  My guess is that 
> you won't get an OOM with tcmalloc, and all will be fine :)
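Ian's suggestion of plotting both the minimum and the maximum across processes, rather than only the maximum, amounts to a per-time-step reduction over processes. A minimal Python sketch, assuming the per-process samples for one quantity (e.g. RSS) have already been collected into equal-length lists; both function names are hypothetical. The output is plain "time min max" columns that gnuplot can read directly.

```python
def min_max_across_processes(per_process):
    """Per-time-step minimum and maximum over processes.

    per_process: one list of samples per process, all of the same length.
    """
    if not per_process:
        raise ValueError("no process data")
    length = len(per_process[0])
    if any(len(series) != length for series in per_process):
        raise ValueError("all processes must have the same number of samples")
    columns = list(zip(*per_process))  # one tuple per time step
    return [min(c) for c in columns], [max(c) for c in columns]


def gnuplot_columns(times, mins, maxs):
    """Whitespace-separated 'time min max' lines, one per time step."""
    return "\n".join(f"{t} {lo} {hi}" for t, lo, hi in zip(times, mins, maxs))


# Hypothetical samples for three processes at three output times.
rss = [[1, 5, 3],
       [2, 4, 6],
       [0, 7, 2]]
mins, maxs = min_max_across_processes(rss)
print(gnuplot_columns([0, 1, 2], mins, maxs))
# 0 0 2
# 1 4 7
# 2 2 6
```

Saved to a file such as `mem.dat`, this can be plotted with `plot 'mem.dat' using 1:2 with lines title 'min', '' using 1:3 with lines title 'max'`. A wide gap between the two curves flags a quantity that differs strongly between processes.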

ok, i could also try to do this on the cluster once it's back online (it's currently down for maintenance).

thanks,
Miguel
-------------- next part --------------
Attachment: memory2.pdf (application/pdf, 15406 bytes)
URL: http://lists.einsteintoolkit.org/pipermail/users/attachments/20180808/3099b4ea/attachment-0001.pdf

