[Users] memory leak in Carpet?
Miguel Zilhão
miguel.zilhao.nogueira at tecnico.ulisboa.pt
Wed Aug 8 06:38:01 CDT 2018
Hi Ian,
> The memory problems are very likely strongly related to the machine you run on. I don't know that
> we can take much information from a smaller test run on a different machine. We already see from
> this run that Carpet is not "leaking" memory continuously; the curves for allocated memory show what
> has been malloced and not freed, and it remains more or less constant after the initial phase.
>
> I think it's worth trying to get tcmalloc running on the cluster. So this means that you have never
> seen the OOM happen when using tcmalloc. It's possible that the improved memory allocation in
> tcmalloc over glibc would entirely solve the problem.
Well, I did have cases where I ran out of memory on my workstation as well with tcmalloc (where I've
been doing these tests), with this same configuration and higher resolution. I don't have an
OOM killer on the workstation, though, so at some point the system would just start to swap (at
which point I'd kill the job).
> Sorry, I made a mistake. It should have been pageheap_unmapped, not pageheap_free. Sorry!
> pageheap_free is essentially zero, and cannot account for the difference.
Ah, no problem. I'm attaching the updated plot.
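(In case it's useful: below is a minimal sketch of how these counters can be read programmatically
through the gperftools MallocExtension interface, assuming the process is linked against or
preloaded with libtcmalloc. It's only an illustration of which numbers end up in the plot, not the
actual thorn code.)

// Sketch only: reading the tcmalloc counters discussed above via the
// gperftools MallocExtension API (link with -ltcmalloc).
#include <cstddef>
#include <cstdio>
#include <gperftools/malloc_extension.h>

static size_t tc_stat(const char* property)
{
  size_t value = 0;
  // GetNumericProperty returns false if the property is unknown,
  // e.g. when not running under tcmalloc.
  if (!MallocExtension::instance()->GetNumericProperty(property, &value))
    return 0;
  return value;
}

int main()
{
  // the "malloced and not freed" curve from the plots
  std::printf("allocated:         %zu bytes\n",
              tc_stat("generic.current_allocated_bytes"));
  // free pages still mapped into the process (pageheap_free)
  std::printf("pageheap_free:     %zu bytes\n",
              tc_stat("tcmalloc.pageheap_free_bytes"));
  // pages returned to the OS but still accounted in the heap (pageheap_unmapped)
  std::printf("pageheap_unmapped: %zu bytes\n",
              tc_stat("tcmalloc.pageheap_unmapped_bytes"));
  return 0;
}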
>>> The point that Roland made also applies here: we are looking at the max across all processes and
>>> assuming that every process is the same. It's possible that one process has a high unmapped
>>> curve, but another has a high rss curve, and we don't see this on the plot. We would have to do
>>> 1D output of the grid arrays and plot each process separately to see the full detail. One way to
>>> see if this is necessary would be to plot both the max and min instead of just the max. That
>>> way, we can see if this is likely to be an issue.
>>
>> ok, i'm attaching another plot with both the min (dashed lines) and the max (full lines) plotted.
>> i hope it helps.
>
> Thanks. This shows that the gridfunction usage is more or less similar across all processes, which
> is good. However, there is significant variation in most of the other quantities across processes.
> To understand this better, we would have to look at 1D ASCII output of the grid arrays, which is a
> bit painful to plot in gnuplot. Before this, I would definitely try to get tcmalloc running and
> outputting this information on the cluster in a run that actually shows the OOM. My guess is that
> you won't get an OOM with tcmalloc, and all will be fine :)
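(Just to make the min/max point concrete, here is a minimal sketch of how a per-rank figure such as
the resident set size could be reduced to the min and max over all MPI processes, which is what the
dashed and solid curves correspond to. The /proc/self/statm reading is only for illustration; this
is not the actual SystemStatistics code.)

// Sketch only: reduce a per-process memory figure (RSS from /proc/self/statm
// on Linux) to its min and max across all MPI ranks.
#include <cstdio>
#include <unistd.h>
#include <mpi.h>

static double rss_bytes()
{
  long size = 0, resident = 0;
  FILE* f = std::fopen("/proc/self/statm", "r");
  if (f) {
    if (std::fscanf(f, "%ld %ld", &size, &resident) != 2) resident = 0;
    std::fclose(f);
  }
  return double(resident) * sysconf(_SC_PAGESIZE);  // pages -> bytes
}

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  double local = rss_bytes(), minv = 0.0, maxv = 0.0;
  MPI_Reduce(&local, &minv, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
  MPI_Reduce(&local, &maxv, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

  if (rank == 0)
    std::printf("rss min = %.0f bytes, max = %.0f bytes\n", minv, maxv);

  MPI_Finalize();
  return 0;
}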
OK, I could also try to get tcmalloc running on the cluster once it's back online (it's currently down for maintenance).
thanks,
Miguel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: memory2.pdf
Type: application/pdf
Size: 15406 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20180808/3099b4ea/attachment-0001.pdf