[Users] question about checkpoints and number of procs
Roland Haas
rhaas at illinois.edu
Mon Apr 15 12:16:54 CDT 2024
Hello Luciano,
> As a matter of fact, I had that option already activated, otherwise it
> would just give me a memory error.
Hmm, ok.
> I'm thinking of maybe restarting the simulation with openMP activated to
> speed up the process, do you think it will help? Otherwise, I will try your
> hack.
I would be surprised if OpenMP helped, since this is all IO bound and
there is no computation. Note that, even if you do not use OpenMP, you
must make sure the number of MPI ranks is the same as in your final
simulation; otherwise you will end up having to wait for the checkpoint
recovery again.
To use my hack you will need to switch to the branch rhaas/map:
git checkout rhaas/map
then recompile and make sure you also compile all the utilities
(simfactory does that automatically; in Cactus itself this is make
foo-utils, with foo being the configuration name).
This will give you a new utility called
hdf5_create_binary_map
and there is also a helper script hdf5_create_binary_map.sh (both
should end up in exe/sim/*).
Run hdf5_create_binary_map.sh on each checkpoint file; it will produce
one "map" file per checkpoint file. This can be done in parallel (one
invocation per file, all invocations in parallel) if the cluster admins
let you (you might need a short interactive job for this).
Concatenate all map files to a new map file using the same basename:
cat foo.file_*.map >foo.map
and make sure the concatenated map is in the same location as the
checkpoint files.
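The per-file runs and the concatenation above could be scripted roughly
as follows. This is only a sketch under assumptions: it assumes the
helper script takes the checkpoint file name as its single argument and
writes the matching .map file next to it (please check the script for
its actual calling convention), that the checkpoint files are named
foo.file_*.h5, and that your xargs supports -0 and -P (GNU and BSD xargs
do). The function name make_maps is made up for illustration.

```shell
# make_maps BASENAME [HELPER] [JOBS]
# Hypothetical wrapper: run HELPER once per checkpoint file, at most JOBS
# at a time, then concatenate the per-file maps into BASENAME.map.
make_maps() {
    local base=$1
    local helper=${2:-hdf5_create_binary_map.sh}
    local jobs=${3:-4}

    # Collect the checkpoint files; bail out if the glob matched nothing.
    set -- "${base}".file_*.h5
    if [ ! -e "$1" ]; then
        echo "no checkpoint files match ${base}.file_*.h5" >&2
        return 1
    fi

    # ASSUMPTION: the helper takes the checkpoint file as its only argument
    # and writes <name>.map next to it.
    printf '%s\0' "$@" | xargs -0 -n 1 -P "$jobs" "$helper" || return 1

    # Same concatenation as in the text, using the common basename.
    cat "${base}".file_*.map > "${base}.map"
}
```

Calling make_maps foo from the directory that holds the checkpoint files
leaves foo.map in that same directory, which is where the reader looks
for it.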
The logic for this is mostly in the ReadMap function of
CarpetIOHDF5/src/Input.cc
You may want to add a
CCTK_VINFO("Reading map files %s", fn);
just before the fopen call in there if you are not sure you have
everything set up correctly. Otherwise there is no (obvious)
indication that the map file is used (it silently falls back to the old
/ slow method if the map file is missing).
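Since the fallback is silent, a quick check before submitting the job
can catch a misplaced map file without having to edit Input.cc. The
following is a minimal sketch; the function name check_map and the
foo.file_*.h5 naming of the checkpoint files are assumptions for
illustration. It only tests that a non-empty foo.map sits next to the
checkpoint files, not that its contents are valid.

```shell
# check_map BASENAME
# Hypothetical pre-flight check: is BASENAME.map present and non-empty
# in the same location as the checkpoint files BASENAME.file_*.h5?
check_map() {
    local base=$1

    # Expand the checkpoint file glob into the positional parameters.
    set -- "${base}".file_*.h5
    if [ ! -e "$1" ]; then
        echo "no checkpoint files match ${base}.file_*.h5" >&2
        return 2
    fi

    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "${base}.map" ]; then
        echo "missing or empty ${base}.map, recovery would fall back to the slow method" >&2
        return 1
    fi

    echo "found ${base}.map next to $# checkpoint file(s)"
}
```

Run it as check_map /path/to/checkpoints/foo; a non-zero exit status
means the map file would not be picked up.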
Yours,
Roland
--
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .