[Users] Avoiding writing one checkpoint file per MPI process

Lorenzo Ennoggi lorenzo.ennoggi at gmail.com
Tue Oct 11 16:38:38 CDT 2022


Hello Roland, all,
would it be possible for you to take a quick look at the parameter file I
am using (attached) and check whether anything is manifestly wrong, unsafe,
or not recommended in the checkpointing or other I/O options? If there are
any issues, I can take care of them and report back to the Frontera people.
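
For reference, the output-side change I plan to try, following the
suggestion earlier in this thread, would look roughly like the fragment
below. This is only a sketch: the CarpetIOHDF5:: prefix and the "yes" value
are my assumptions about how one_file_per_group is set, and the exact names
should be checked against CarpetIOHDF5's param.ccl. As discussed, none of
this changes checkpointing, which stays at one file per MPI rank.

```
# Sketch only, not a tested configuration.

# Option A (from my original question): collect data from multiple
# processes before output, here with out_proc_every set to 8.
#IO::out_mode       = "np"
#IO::out_proc_every = 8

# Option B (the suggested alternative): every process still writes,
# but variables are grouped into fewer files, so no extra
# communication is needed. The thorn prefix and the value are
# assumptions; verify them in CarpetIOHDF5's param.ccl.
CarpetIOHDF5::one_file_per_group = "yes"
```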

Thank you very much in advance,
Lorenzo

On Thu, 6 Oct 2022 at 15:29, Roland Haas <rhaas at illinois.edu> wrote:

> Hello Lorenzo,
>
> TACC saved a bit of money on the IO system on Frontera :-) and thus
> they now need to fix bugs in documentation.
>
> Yours,
> Roland
>
> > Hi Roland,
> > thank you, your suggestions are very useful. I was running one process
> > per core on more than 200 cores, so that may be part of the issue. Also,
> > I will try the one_file_per_group or one_file_per_rank options to reduce
> > the performance impact.
> >
> > The cluster I'm running on is Frontera, and the guidelines to manage I/O
> > operations properly on it are here
> > <https://portal.tacc.utexas.edu/tutorials/managingio>
> > in case people are interested. I will follow them as closely as I can to
> > avoid similar problems in the future.
> >
> > Thank you very much again,
> > Lorenzo
> >
> > On Thu, 6 Oct 2022 at 12:52, Roland Haas <rhaas at illinois.edu> wrote:
> >
> > > Hello Lorenzo,
> > >
> > > Unfortunately, Carpet will always write one checkpoint file per MPI
> > > rank; there is no way to change that.
> > >
> > > As you learned, the option out_proc_every only affects the out3D_vars
> > > output (and possibly out_vars 3D output) but never checkpoints.
> > >
> > > In my opinion, it should be impossible to stress the file system of a
> > > reasonably provisioned cluster with the checkpoints. Even when running
> > > on 32k MPI ranks (and 4k nodes) on BW, checkpoint-recovery was very
> > > quick (1 min or so) and barely made a blip on the system monitoring
> > > radar. Any cluster with sufficiently many nodes to run at scale at
> > > 1 file per rank (for a sane number of ranks, i.e. with some OpenMP
> > > threads per rank) should have a file system capable of taking
> > > checkpoints. Of course, 1 rank per core is no longer "sane" once you go
> > > beyond a couple hundred cores.
> > >
> > > Now, writing 1 file per output variable and per MPI rank may be a
> > > different thing. In that case out_proc_every should help with
> > > out3D_vars. I would also suggest one_file_per_group or even
> > > one_file_per_rank for this (see CarpetIOHDF5's param.ccl), which will
> > > have less of a performance impact (no communication) than
> > > out_proc_every != 1.
> > >
> > > If the issue is opening many files (again, only for out3D_vars regular
> > > output), then you may also see benefits from the different options in:
> > >
> > > https://bitbucket.org/eschnett/carpet/pull-requests/34
> > >
> > > https://bitbucket.org/einsteintoolkit/tickets/issues/2364
> > >
> > > Yours,
> > > Roland
> > >
> > > > Hello,
> > > > In order to avoid stressing the filesystem on the cluster I'm running
> > > > on, I was advised to avoid writing one output/checkpoint file per MPI
> > > > process and instead to collect data from multiple processes before
> > > > outputting/checkpointing happens. I found that the combination of
> > > > parameters
> > > >
> > > > IO::out_mode       = "np"
> > > > IO::out_proc_every = 8
> > > >
> > > > does the job for output files, but I still have one checkpoint file per
> > > > process. Is there a similar parameter, or combination of parameters,
> > > > which can be used for checkpoint files?
> > > >
> > > > Thank you very much,
> > > > Lorenzo Ennoggi
> > >
> > >
> > >
> > > --
> > > My email is as private as my paper mail. I therefore support encrypting
> > > and signing email messages. Get my PGP key from http://keys.gnupg.net.
> > >
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CBD_handoff_IGM_McLachlan.par
Type: application/octet-stream
Size: 34366 bytes
Desc: not available
Url : http://lists.einsteintoolkit.org/pipermail/users/attachments/20221011/8e9d9b74/attachment-0001.obj 

