[Users] Meeting minutes for 2022-12-15

Samuel Tootle tootle at itp.uni-frankfurt.de
Fri Dec 16 03:10:39 CST 2022


I don't think this is the default per se.  I use batchtools (instead of 
simfactory) exactly for this reason, and I discourage new users from 
restarting in the same directory as the parent checkpoints to avoid this 
exact outcome.  An additional issue can arise if a job is terminated 
before the walltime, so that the data stored in ASCII/HDF5 extends 
beyond the last checkpoint: after the restart, the overlapping data can 
mismatch and cause ingestion issues.  I think overall Kuibit handles 
this well, but it has been an issue in the past for some users before 
they learned to separate restarts into individual directories.
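
To make the overlap problem concrete, here is a minimal sketch of one way 
to stitch per-restart time series together.  This is not how Kuibit or 
simfactory do it; the (iteration, value) row layout, the function name 
merge_restarts, and the sample numbers are all made up for illustration.  
It assumes each restart's rows are already in memory and sorted by 
iteration, and that the later restart should win wherever two restarts 
cover the same iterations.

```python
from typing import List, Tuple

# Hypothetical row layout: (iteration, value). Real output files have more columns.
Row = Tuple[int, float]


def merge_restarts(segments: List[List[Row]]) -> List[Row]:
    """Concatenate per-restart time series, dropping overlapping rows.

    Each segment holds the rows written by one restart, sorted by iteration.
    Rows from earlier segments at or past the first iteration of a later
    segment are discarded, since the later restart recomputed them.
    """
    merged: List[Row] = []
    for segment in segments:
        if not segment:
            continue  # a restart that died before writing anything
        first_iteration = segment[0][0]
        # Trim the tail of the data collected so far that this segment re-covers.
        while merged and merged[-1][0] >= first_iteration:
            merged.pop()
        merged.extend(segment)
    return merged


# Illustrative data only: run0 was killed after iteration 30, while its last
# checkpoint was at iteration 20, so run1 starts again from iteration 20.
run0 = [(0, 1.0), (10, 1.1), (20, 1.2), (30, 1.3)]
run1 = [(20, 1.2), (30, 1.35), (40, 1.4)]

print(merge_restarts([run0, run1]))
```

With one directory per restart, building the list of segments is just a 
matter of reading each directory in order; when everything is appended to 
a single file, the boundaries between restarts have to be detected first.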

Cheers,

Samuel

On 12/16/22 10:01 AM, Bruno Giacomazzo wrote:
>
>     - Safety feature to prevent HDF5 files from being corrupted
>       * Leo requests a feature that would allow the user to, e.g.,
>         generate one output file per restart. With kuibit, there was
>         interest in switching from the ASCII data files to HDF5 in
>         our research group. However, in a recent simulation, a node
>         failure caused a crash while one of the HDF5 files was being
>         written, and we lost all data for an important gridfunction.
>         If one HDF5 file were written per restart (or another safety
>         feature were in place), this would not have been an issue,
>         as only one of the chunks of data would have been corrupted.
>         Leo will open a ticket about this.
>
>
> Isn't this done automatically when using simfactory? I have my HDF5 
> data written in the separate output-00?? directories (the ones 
> generated by simfactory at each restart) so that if one run has 
> problems I do not lose all the data.
>
> Cheers,
> Bruno
>
>
> -- 
>
> Prof. Bruno Giacomazzo
> Department of Physics
> University of Milano-Bicocca
> Piazza della Scienza 3
> 20126 Milano
> Italy
>
> email: bruno.giacomazzo at unimib.it
> phone: (+39) 02 6448 2321
> web: http://www.brunogiacomazzo.org
>
> ---------------------------------------------------------------------
> There are only 10 types of people in the world:
> Those who understand binary, and those who don't
> ----------------------------------------------------------------------
>
>
> _______________________________________________
> Users mailing list
> Users at einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users