[ET Trac] [Einstein Toolkit] #1370: Provide a framework for simulation metadata

Fri May 24 05:12:21 CDT 2013

#1370: Provide a framework for simulation metadata
-------------------------+--------------------------------------------------
 Reporter:  hinder       |       Owner:     
     Type:  enhancement  |      Status:  new
 Priority:  major        |   Milestone:     
Component:  Cactus       |     Version:     
 Keywords:               |  
-------------------------+--------------------------------------------------
 When writing tools to analyze the output of a Cactus simulation, it would
 be very useful to have more information than is currently available, and
 some of the available information could be provided in a more convenient
 way.  For example, users set many parameters, and this provides a good
 source of information about the simulation, but the parameter file is not
 always output (e.g. in testsuite data), and it does not contain the values
 of parameters which are left unset and hence take their default values.
 Similarly, the value of a parameter is not necessarily a good indicator of
 what actually happened.  Often, a group of parameters needs to be
 interpreted together to determine the required quantity.  For example, if
 I want to know what the intended final time of the simulation was, I would
 have to look at Cactus::terminate, Cactus::cctk_itlast and
 Cactus::cctk_final_time.  If I want to know what the timestep or grid
 spacing on the coarsest grid is, I have to look at a similar set of
 parameters, or parse a grid structure file from Carpet for which there is
 no well-defined filename.  If I want to know what the last iteration
 actually was, I have to find an output file and look at it, and there
 might not even be any appropriate files, depending on the user's choices.
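
 To illustrate the final-time example, here is a minimal Python sketch of
 what an analysis tool currently has to do.  The function name, the dt and
 initial_time arguments and the ASSUMED_DEFAULTS table are hypothetical;
 the default values and the meaning given to each Cactus::terminate setting
 are assumptions that should be checked against the flesh's param.ccl.
 Only four terminate settings are handled.

```python
# Assumed defaults for parameters missing from the parameter file.
# These values are NOT taken from the ticket; verify them before use.
ASSUMED_DEFAULTS = {
    "Cactus::terminate": "iteration",
    "Cactus::cctk_itlast": 10,
    "Cactus::cctk_final_time": -1.0,
}


def intended_final_time(params, dt, initial_time=0.0):
    """Guess the intended final time from a group of parameters.

    params: dict of the parameters explicitly set by the user.
    dt: coarse-grid timestep, which the tool must obtain separately.
    """
    p = {**ASSUMED_DEFAULTS, **params}
    mode = p["Cactus::terminate"]
    time_from_itlast = initial_time + p["Cactus::cctk_itlast"] * dt
    time_from_final_time = p["Cactus::cctk_final_time"]

    if mode == "iteration":
        return time_from_itlast
    if mode == "time":
        return time_from_final_time
    if mode == "either":
        # Assumed meaning: stop when the first criterion is met.
        return min(time_from_itlast, time_from_final_time)
    if mode == "both":
        # Assumed meaning: stop only once both criteria are met.
        return max(time_from_itlast, time_from_final_time)
    raise ValueError("terminate = %r is not handled by this sketch" % mode)
```

 Even this simplified version needs three parameters, their defaults and
 the timestep, none of which the tool can reliably obtain from the output.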

 I propose that Cactus provides a framework for simulation metadata.  The
 following is one possible way that it could work.

 1. Metadata for the simulation is collected and output to disk
 2. The metadata comes from both the flesh and from thorns
 3. The metadata format is extensible
 4. The metadata format is easy to parse (hence, it is in a standard well-
 specified and commonly-supported format)
 5. The metadata file is easily human-readable
 6. The metadata file is always output, so that analysis tools can expect
 that it is present in modern simulations
 7. The metadata file is not too large
 8. The framework for metadata is managed by the flesh, as it is important
 and will be available for every Cactus simulation
 9. One possible format for the metadata file is the "ini" file format, as
 used by SimFactory.  This satisfies 3, 4 and 5 above.
 10. There would be one section per implementation active in the
 simulation, and one for the flesh.
 11. Each thorn is responsible for determining what metadata keys should be
 output.
 12. The flesh will output essential characteristics of the simulation that
 it knows about, e.g. start and end iteration and times, run title, etc.
 13. Output thorns will output the names of output files, and a description
 of what they contain.
 14. Some metadata will be available at startup, some at termination, and
 some will become available only periodically.  For example, due to
 parameter steering, the set of available output files might get larger
 during the simulation.  We could either handle this by parsing and
 rewriting the metadata file to insert extra information into existing
 sections, or allow sections to be repeated.  We have a parsing framework
 in the flesh now (Piraha), so this should be straightforward.
 15. Metadata files will be modified safely (e.g. by writing a new one to a
 temporary file and moving it over the old one)
 16. A distinction will be made between metadata items and parameters.
 Often, there will be a 1-1 correspondence between these.  As a result, it
 would be good to have a convenient way for thorn authors to easily mark
 parameters as suitable for direct inclusion in the metadata file.  For
 example, marking a parameter with a keyword "metadata = yes" or equivalent
 in the param.ccl file would cause a metadata key for this parameter to be
 automatically included in the metadata file.
 17. Information which can change during a simulation might not be a good
 candidate for metadata; maybe then it becomes "data" and should be output
 in a separate file (pointed to by a metadata entry, of course).  In that
 case, setting "steerable" and "metadata" for a parameter in param.ccl
 should lead to an error.
 18. Metadata entries could be restricted to string values, or could have
 richer types.  A richer set of types, such as strings, integers, floating
 point numbers, and lists (possibly with nesting), might be convenient.
 19. The flesh could provide a function CCTK_RecordMetadata(key, value)
 [surely the caller does not need to tell the flesh which implementation
 it belongs to?].  This function would store the data in a flesh data
 structure, and note whether the on-disk file needs to be updated.
 20. Every iteration, the flesh (on the first process) would update the on-
 disk metadata file if it needed to be changed.
 21. The sections in the metadata file will correspond to implementations
 [who chose this name?].  Multiple thorns providing the same implementation
 should use the same key names when they provide the same information.
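
 Items 9, 10, 15, 19 and 20 can be sketched together in a few lines of
 Python.  This is only a model of the idea, not a proposed implementation:
 the MetadataStore class, its method names and the section and key names
 are all hypothetical, and the section is passed explicitly here because a
 standalone script has no notion of the calling thorn.

```python
import configparser
import os
import tempfile


class MetadataStore:
    """Hypothetical stand-in for the flesh data structure of item 19."""

    def __init__(self, path):
        self.path = path
        self.sections = {}   # section name -> {key: value as string}
        self.dirty = False   # True when the on-disk file is out of date

    def record(self, section, key, value):
        """Analogue of the proposed CCTK_RecordMetadata(key, value)."""
        values = self.sections.setdefault(section, {})
        text = str(value)
        if values.get(key) != text:
            values[key] = text
            self.dirty = True

    def flush(self):
        """Rewrite the ini file if needed; return True if it was written."""
        if not self.dirty:
            return False
        parser = configparser.ConfigParser(interpolation=None)
        parser.optionxform = str  # keep the case of key names
        parser.read_dict(self.sections)
        # Write to a temporary file in the same directory, then move it
        # over the old file so readers never see a partial file (item 15).
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as handle:
                parser.write(handle)
            os.replace(tmp, self.path)
        except BaseException:
            os.unlink(tmp)
            raise
        self.dirty = False
        return True
```

 A driver would call flush() once per iteration on the first process; the
 call does nothing unless record() has changed a value since the last
 write.  An analysis tool could then read the file with any ini parser.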

 Related:

 * An example of this sort of idea is already implemented by TwoPunctures
 (#551), which outputs a TwoPunctures.bbh metadata file in the "numerical
 relativity data format".

 * The thorn Formaline currently handles a limited amount of metadata, but
 its scope is narrower than that of this ticket.  The above proposal could
 be implemented using Formaline, but then you could not always expect the
 metadata file to be available, as Formaline might not have been activated.

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1370>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit