<div dir="ltr">Hello Roland, all,<div>could it be possible for you to have a quick look at the parameter file I am using (attached) to check if there is anything manifestly wrong/unsafe/unrecommended with checkpointing or with other I/O options? In case there are any issues, I can then take care of them and report back to the Frontera people.</div><div><br></div><div>Thank you very much in advance,</div><div>Lorenzo</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Il giorno gio 6 ott 2022 alle ore 15:29 Roland Haas <<a href="mailto:rhaas@illinois.edu">rhaas@illinois.edu</a>> ha scritto:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello Lorenzo,<br>
<br>
TACC saved a bit of money on the I/O system of Frontera :-) and thus<br>
they now have to make up for it in documentation.<br>
<br>
Yours,<br>
Roland<br>
<br>
> Hi Roland,<br>
> thank you, your suggestions are very useful. I was running one process per<br>
> core on more than 200 cores, so that may be part of the issue. Also, I will<br>
> try the one_file_per_group or one_file_per_rank options to reduce the<br>
> performance impact.<br>
> <br>
> The cluster I'm running on is Frontera, and the guidelines to manage I/O<br>
> operations properly on it are here<br>
> <<a href="https://urldefense.com/v3/__https://portal.tacc.utexas.edu/tutorials/managingio__;!!DZ3fjg!-IpspO3XOAP7Iq90ewHLXhCVTP8zRBTZV3k6XCsoyypUMszoctdY8pqhv7lN-OrMXRv5iGAFRSV1bmjKrd05VwT-aw$" rel="noreferrer" target="_blank">https://urldefense.com/v3/__https://portal.tacc.utexas.edu/tutorials/managingio__;!!DZ3fjg!-IpspO3XOAP7Iq90ewHLXhCVTP8zRBTZV3k6XCsoyypUMszoctdY8pqhv7lN-OrMXRv5iGAFRSV1bmjKrd05VwT-aw$</a> > in case people are<br>
> interested. I will follow them as closely as I can to avoid similar<br>
> problems in the future.<br>
> <br>
> Thank you very much again,<br>
> Lorenzo<br>
> <br>
> On Thu, Oct 6, 2022 at 12:52 PM Roland Haas <<a href="mailto:rhaas@illinois.edu" target="_blank">rhaas@illinois.edu</a>> wrote:<br>
> <br>
> > Hello Lorenzo,<br>
> ><br>
> > Unfortunately, Carpet will always write one checkpoint file per MPI<br>
> > rank; there is no way to change that.<br>
> ><br>
> > As you learned, the option out_proc_every only affects the out3D_vars<br>
> > output (and possibly 3D output requested via out_vars) but never checkpoints.<br>
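> ><br>
> > To illustrate (the IO:: values are the ones from this thread; the<br>
> > output variable and checkpoint frequency are made-up examples):<br>
> ><br>
> > # grouped regular output: every 8th rank collects and writes the data<br>
> > IO::out_mode       = "np"<br>
> > IO::out_proc_every = 8<br>
> > IOHDF5::out3D_vars = "ADMBase::metric"   # hypothetical example variable<br>
> > # checkpointing ignores out_mode/out_proc_every: still one file per rank<br>
> > IOHDF5::checkpoint   = yes<br>
> > IO::checkpoint_every = 1024              # example frequency<br>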
> ><br>
> > In my opinion, it should be impossible to stress the file system of a<br>
> > reasonably provisioned cluster with checkpoints. Even when running<br>
> > on 32k MPI ranks (and 4k nodes) on BW, checkpoint recovery was very<br>
> > quick (1 min or so) and barely made a blip on the system-monitoring<br>
> > radar. Any cluster with sufficiently many nodes to run at scale at<br>
> > 1 file per rank (for a sane number of ranks, i.e., using some OpenMP<br>
> > threads) should have a file system capable of taking checkpoints. Of<br>
> > course, 1 rank per core is no longer "sane" once you go beyond a<br>
> > couple hundred cores.<br>
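> ><br>
> > For illustration, a hypothetical SimFactory invocation (flag names as<br>
> > in the Einstein Toolkit tutorials; the numbers are made up) that keeps<br>
> > the rank count, and hence the number of checkpoint files, sane via OpenMP:<br>
> ><br>
> > # 256 cores total, 8 OpenMP threads per rank -> only 32 MPI ranks,<br>
> > # and therefore 32 checkpoint files instead of 256<br>
> > ./simfactory/bin/sim create-submit mysim --parfile=my.par --procs=256 --num-threads=8<br>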
> ><br>
> > Now, writing 1 file per output variable and per MPI rank may be a<br>
> > different thing...<br>
> > In that case out_proc_every should help with out3D_vars. I would also<br>
> > suggest one_file_per_group or even one_file_per_rank for this (see<br>
> > CarpetIOHDF5's param.ccl), which will have less of a performance<br>
> > impact (no communication) than out_proc_every != 1.<br>
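> ><br>
> > A minimal sketch (parameter names as in CarpetIOHDF5's param.ccl;<br>
> > whether one_file_per_rank is available depends on your Carpet version):<br>
> ><br>
> > # combine all variables of a group into one file (per rank)<br>
> > IOHDF5::one_file_per_group = yes<br>
> > # or, in newer Carpet versions, one file per rank for all output:<br>
> > # IOHDF5::one_file_per_rank = yes<br>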
> ><br>
> > If the issue is opening many files (again, only for out3D_vars regular<br>
> > output), then you may also see benefits from the different options in:<br>
> ><br>
> > <a href="https://urldefense.com/v3/__https://bitbucket.org/eschnett/carpet/pull-requests/34__;!!DZ3fjg!-IpspO3XOAP7Iq90ewHLXhCVTP8zRBTZV3k6XCsoyypUMszoctdY8pqhv7lN-OrMXRv5iGAFRSV1bmjKrd1fZiBWGw$" rel="noreferrer" target="_blank">https://urldefense.com/v3/__https://bitbucket.org/eschnett/carpet/pull-requests/34__;!!DZ3fjg!-IpspO3XOAP7Iq90ewHLXhCVTP8zRBTZV3k6XCsoyypUMszoctdY8pqhv7lN-OrMXRv5iGAFRSV1bmjKrd1fZiBWGw$</a> <br>
> ><br>
> > <a href="https://urldefense.com/v3/__https://bitbucket.org/einsteintoolkit/tickets/issues/2364__;!!DZ3fjg!-IpspO3XOAP7Iq90ewHLXhCVTP8zRBTZV3k6XCsoyypUMszoctdY8pqhv7lN-OrMXRv5iGAFRSV1bmjKrd0HNoNnvA$" rel="noreferrer" target="_blank">https://urldefense.com/v3/__https://bitbucket.org/einsteintoolkit/tickets/issues/2364__;!!DZ3fjg!-IpspO3XOAP7Iq90ewHLXhCVTP8zRBTZV3k6XCsoyypUMszoctdY8pqhv7lN-OrMXRv5iGAFRSV1bmjKrd0HNoNnvA$</a> <br>
> ><br>
> > Yours,<br>
> > Roland<br>
> > <br>
> > > Hello,<br>
> > > In order to avoid stressing the filesystem on the cluster I'm running<br>
> > > on, it was suggested that I avoid writing one output/checkpoint file<br>
> > > per MPI process and instead collect data from multiple processes<br>
> > > before output/checkpointing happens. I found the combination of parameters<br>
> > ><br>
> > > IO::out_mode = "np"<br>
> > > IO::out_proc_every = 8<br>
> > ><br>
> > > does the job for output files, but I still have one checkpoint file per<br>
> > > process. Is there a similar parameter, or combination of parameters,<br>
> > > which can be used for checkpoint files?<br>
> > ><br>
> > > Thank you very much,<br>
> > > Lorenzo Ennoggi <br>
> ><br>
> ><br>
> ><br>
> > --<br>
> > My email is as private as my paper mail. I therefore support encrypting<br>
> > and signing email messages. Get my PGP key from <a href="http://keys.gnupg.net" rel="noreferrer" target="_blank">http://keys.gnupg.net</a>.<br>
> > <br>
<br>
<br>
<br>
-- <br>
My email is as private as my paper mail. I therefore support encrypting<br>
and signing email messages. Get my PGP key from <a href="http://keys.gnupg.net" rel="noreferrer" target="_blank">http://keys.gnupg.net</a>.<br>
</blockquote></div>