[Users] Avoiding writing one checkpoint file per MPI process

Roland Haas rhaas at illinois.edu
Thu Oct 6 11:51:56 CDT 2022


Hello Lorenzo,

Unfortunately, Carpet will always write one checkpoint file per MPI
rank; there is no way to change that.

As you learned, the option out_proc_every only affects the out3D_vars
output (and possibly out_vars 3D output) but never checkpoints.

In my opinion, it should be impossible to stress the file system of a
reasonably provisioned cluster with the checkpoints. Even when running
on 32k MPI ranks (and 4k nodes) on BW, checkpoint-recovery was very
quick (1 min or so) and barely made a blip on the system monitoring
radar. Any cluster with sufficiently many nodes to run at scale at
1 file per rank (for a sane number of ranks, i.e. with some OpenMP
threads per rank) should have a file system capable of taking
checkpoints. Of course 1 rank per core is no longer "sane" once you go
beyond a couple hundred cores.

Now writing 1 file per output variable and per MPI rank may be a
different matter. In that case out_proc_every should help with
out3D_vars. I would also suggest one_file_per_group or even
one_file_per_rank for this (see CarpetIOHDF5's param.ccl), which will
have less of a performance impact than out_proc_every != 1 because
they require no communication.
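
As a rough sketch of what this could look like in a parameter file
(the CarpetIOHDF5:: prefix and the "yes" value are my assumptions about
how the option is spelled there; please check param.ccl for the exact
names and defaults before relying on it):

    # Assumed fragment, not a tested parameter file.
    # Write all variables of a group into one file on each rank,
    # instead of one file per variable; needs no communication.
    CarpetIOHDF5::one_file_per_group = "yes"

    # Alternative from your mail: collect data from 8 ranks per file.
    # This involves communication between ranks.
    #IO::out_mode       = "np"
    #IO::out_proc_every = 8

Neither setting changes the number of checkpoint files, which stays at
one per MPI rank.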

If the issue is opening many files (again, only for out3D_vars regular
output), then you may also see benefits from the different options in:

https://bitbucket.org/eschnett/carpet/pull-requests/34

https://bitbucket.org/einsteintoolkit/tickets/issues/2364

Yours,
Roland

> Hello,
> In order to avoid stressing the filesystem on the cluster I'm running on, I
> was advised to avoid writing one output/checkpoint file per MPI process
> and instead to collect data from multiple processes before
> outputting/checkpointing happens. I found the combination of parameters
> 
> IO::out_mode       = "np"
> IO::out_proc_every = 8
> 
> does the job for output files, but I still have one checkpoint file per
> process. Is there a similar parameter, or combination of parameters, which
> can be used for checkpoint files?
> 
> Thank you very much,
> Lorenzo Ennoggi



-- 
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://keys.gnupg.net.