[Users] Avoiding writing one checkpoint file per MPI process

Lorenzo Ennoggi lorenzo.ennoggi at gmail.com
Thu Oct 6 14:08:44 CDT 2022


Hi Roland,
thank you, your suggestions are very useful. I was running one process per
core on more than 200 cores, so that may be part of the issue. Also, I will
try the one_file_per_group or one_file_per_rank options to reduce the
performance impact.
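
For anyone following along, a sketch of what such a parameter-file fragment
might look like (the exact parameter names and defaults should be checked
against CarpetIOHDF5's param.ccl; this is only an illustration, not a tested
configuration):

# Group regular 3D output so that 8 ranks share one writer
# (as in my original parameter file):
IO::out_mode       = "np"
IO::out_proc_every = 8

# Alternatively, write one HDF5 file per group of variables rather than
# one per variable, which avoids the inter-process communication cost of
# out_proc_every != 1 (parameter name per CarpetIOHDF5's param.ccl):
IOHDF5::one_file_per_group = yes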

The cluster I'm running on is Frontera, and the guidelines for managing I/O
operations properly on it are here
<https://portal.tacc.utexas.edu/tutorials/managingio> in case people are
interested. I will follow them as closely as I can to avoid similar
problems in the future.

Thank you very much again,
Lorenzo

On Thu, Oct 6, 2022 at 12:52 PM Roland Haas <rhaas at illinois.edu>
wrote:

> Hello Lorenzo,
>
> Unfortunately, Carpet will always write one checkpoint file per MPI
> rank, there is no way to change that.
>
> As you learned, the option out_proc_every only affects the out3D_vars
> output (and possibly out_vars 3D output) but never checkpoints.
>
> In my opinion, it should be impossible to stress the file system of a
> reasonably provisioned cluster with checkpoints alone. Even when running
> on 32k MPI ranks (and 4k nodes) on BW, checkpoint-recovery was very
> quick (1 min or so) and barely made a blip on the system monitoring
> radar. Any cluster with enough nodes to run at that scale with
> 1 file per rank (for a sane number of ranks, i.e. using some OpenMP
> threads) should have a file system capable of taking checkpoints. Of
> course, 1 rank per core is no longer "sane" once you go beyond a couple
> hundred cores.
>
> Now writing 1 file per output variable and per MPI rank may be a
> different matter, though.
> In that case out_proc_every should help with out3D_vars. I would also
> suggest one_file_per_group or even one_file_per_rank for this (see
> CarpetIOHDF5's param.ccl), which will have less of a performance impact
> (no communication required) than out_proc_every != 1.
>
> If the issue is opening many files (again, only for out3D_vars regular
> output), then you may also see benefits from the different options in:
>
> https://bitbucket.org/eschnett/carpet/pull-requests/34
>
> https://bitbucket.org/einsteintoolkit/tickets/issues/2364
>
> Yours,
> Roland
>
> > Hello,
> > In order to avoid stressing the filesystem on the cluster I'm running
> on, I
> > was suggested to avoid writing one output/checkpoint file per MPI process
> > and instead collecting data from multiple processes before
> > outputting/checkpointing happens. I found the combination of parameters
> >
> > IO::out_mode       = "np"
> > IO::out_proc_every = 8
> >
> > does the job for output files, but I still have one checkpoint file per
> > process. Is there a similar parameter, or combination of parameters,
> which
> > can be used for checkpoint files?
> >
> > Thank you very much,
> > Lorenzo Ennoggi
>
>
>
> --
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from http://keys.gnupg.net.
>
