[Users] Checkpointing with Cactus/Simfactory

Frank Loeffler knarf at cct.lsu.edu
Wed Aug 31 16:17:07 CDT 2016


On Wed, Aug 31, 2016 at 11:02:16PM +0200, dumsani wrote:
>Below is a segment of my parfile where checkpointing information is
>given, provided as a sample or as a basis for anyone who would
>like to advise me on how such information should be given.

Maybe it helps to explain what these options do.

>#####  Checkpointing #########
>CarpetIOHDF5::checkpoint                = yes

This enables checkpointing.

>IO::checkpoint_ID                       = yes

This specifies that initial data (ID) should be checkpointed as well.
So, you get a checkpoint right after initial data generation and before
the first evolution step. This makes sense if ID generation takes a long
time, or if you are interested in the initial data themselves.

>IO::recover                             = autoprobe

This instructs Cactus to restart from the latest checkpoint it finds, if
it finds any. If it doesn't find any, it starts from initial data, just
as it would without checkpointing.

>IO::checkpoint_every                    = 1024

Checkpoint every 1024 iterations. Personally, I would use a setting
based on (wall-clock) time instead, but there is nothing wrong with
this one.
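A minimal sketch of the wall-time based alternative, assuming the IOUtil
parameter IO::checkpoint_every_walltime_hours is available in your Cactus
version (check IOUtil's param.ccl). The value of 6 hours is only an
illustration; pick an interval that fits your queue limits.

```
# Iteration-based checkpointing from the parfile above, replaced here:
#IO::checkpoint_every                = 1024
# Assumed IOUtil parameter: checkpoint roughly every 6 hours of wall time
IO::checkpoint_every_walltime_hours  = 6
```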

>IO::out_proc_every                      = 2

Not specifically related to checkpointing.

>IO::checkpoint_keep                     = 3

Keep the last 3 checkpoints and delete older ones.

>IO::checkpoint_dir                      = $parfile

Put checkpoint files into a directory named after the parameter file
without its .par extension: for a parameter file X.par, this is 'X'.

Ideally, this should be accompanied by:

> IO::recover_dir = $parfile

Otherwise, when restarting with the same parfile, Cactus wouldn't find
the checkpoint files it generated.

>Carpet::regrid_during_recovery          = no
>CarpetIOHDF5::use_grid_structure_from_checkpoint = yes

Don't regrid during recovery; instead, reuse the grid structure stored
in the checkpoint.

>CarpetIOHDF5::open_one_input_file_at_a_time = yes

Use less memory while reading files, possibly at the cost of longer
read times.

Something interesting as well:

>IO::checkpoint_on_terminate         = "yes"

Write a checkpoint when the simulation terminates. This lets you stop
the simulation at a chosen point (e.g., just before the wall time runs
out) and later continue exactly where it stopped.
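For example, to stop shortly before a 24-hour job runs out of wall time
and leave a checkpoint behind, something along these lines should work,
assuming the flesh parameters Cactus::terminate and Cactus::max_runtime
(the latter measured in minutes). The 1380-minute value (23 hours) is
only an illustration; leave enough margin for the checkpoint to be
written before the queue kills the job.

```
# Stop the evolution based on elapsed run time rather than iterations
Cactus::terminate            = "runtime"
# Minutes: 23 h, leaving about 1 h of margin in a 24 h allocation
Cactus::max_runtime          = 1380
# Write a final checkpoint when terminating
IO::checkpoint_on_terminate  = "yes"
```

Combined with IO::recover = autoprobe, resubmitting the same parfile then
picks up from this final checkpoint.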

Frank
