<div dir="ltr">Hi Roland,<div>Thank you, your suggestions are very useful. I was running one process per core on more than 200 cores, so that may be part of the issue. Also, I will try the <font face="monospace">one_file_per_group</font> or <font face="monospace">one_file_per_rank</font> options to reduce the performance impact.</div><div><br></div><div>The cluster I&#39;m running on is Frontera, and the guidelines for managing I/O operations properly on it are <a href="https://portal.tacc.utexas.edu/tutorials/managingio">here</a>, in case people are interested. I will follow them as closely as I can to avoid similar problems in the future.</div><div><br></div><div>Thank you very much again,</div><div>Lorenzo</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 6 Oct 2022 at 12:52, Roland Haas &lt;<a href="mailto:rhaas@illinois.edu" target="_blank">rhaas@illinois.edu</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello Lorenzo,<br>

<br>

Unfortunately, Carpet will always write one checkpoint file per MPI<br>

rank, there is no way to change that.<br>

<br>

As you learned, the option out_proc_every only affects the out3D_vars<br>

output (and possibly out_vars 3D output) but never checkpoints.<br>

<br>

In my opinion, it should be impossible to stress the file system of a<br>

reasonably provisioned cluster with checkpoints. Even when running<br>

on 32k MPI ranks (and 4k nodes) on BW, checkpoint-recovery was very<br>

quick (1min or so) and barely made a blip on the system monitoring<br>

radar. Any cluster with enough nodes to run at scale should have a<br>

file system capable of taking checkpoints at 1 file per rank (for a<br>

sane number of ranks, i.e. with some OpenMP threads per rank). Of course 1<br>

rank per core is no longer &quot;sane&quot; once you go beyond a couple hundred<br>

cores.<br>

<br>

Now, writing 1 file per output variable and per MPI rank may be a<br>

different matter.<br>

In that case out_proc_every should help with out3D_vars. I would also<br>

suggest one_file_per_group or even one_file_per_rank for this (see<br>

CarpetIOHDF5&#39;s param.ccl), which will have less of a performance impact<br>

(no communication) than out_proc_every != 1.<br>

<br>

If the issue is opening many files (again, only for out3D_vars regular<br>

output), then you may also see benefits from the different options in:<br>

<br>

<a href="https://bitbucket.org/eschnett/carpet/pull-requests/34" rel="noreferrer" target="_blank">https://bitbucket.org/eschnett/carpet/pull-requests/34</a><br>

<br>

<a href="https://bitbucket.org/einsteintoolkit/tickets/issues/2364" rel="noreferrer" target="_blank">https://bitbucket.org/einsteintoolkit/tickets/issues/2364</a><br>

<br>

Yours,<br>

Roland<br>

<br>

&gt; Hello,<br>

&gt; In order to avoid stressing the filesystem on the cluster I&#39;m running on, I<br>

&gt; was advised to avoid writing one output/checkpoint file per MPI process<br>

&gt; and instead to collect data from multiple processes before<br>

&gt; output/checkpointing happens. I found that the combination of parameters<br>

&gt; <br>

&gt; IO::out_mode       = &quot;np&quot;<br>

&gt; IO::out_proc_every = 8<br>

&gt; <br>

&gt; does the job for output files, but I still have one checkpoint file per<br>

&gt; process. Is there a similar parameter, or combination of parameters, which<br>

&gt; can be used for checkpoint files?<br>

&gt; <br>

&gt; Thank you very much,<br>

&gt; Lorenzo Ennoggi<br>

<br>

<br>

<br>

-- <br>

My email is as private as my paper mail. I therefore support encrypting<br>

and signing email messages. Get my PGP key from <a href="http://keys.gnupg.net" rel="noreferrer" target="_blank">http://keys.gnupg.net</a>.<br>

</blockquote></div>