[Users] CarpetIOHDF5::output_index = "yes" by default?

Thu Feb 28 07:16:31 CST 2013

On 27 Feb 2013, at 22:33, Erik Schnetter <schnetter at cct.lsu.edu> wrote:

> If it is possible to generate the index files as post-processing step, then I would prefer that, as it reduces the time the simulation spends waiting for I/O. We could add it e.g. to Simfactory's cleanup step to have it happen automatically.

It is possible; the index files are identical to the original files, but with no data written to the datasets. People don't visualise their data enough, and having the index files makes it more likely that they will, since reading the data will be faster. Having to run an additional post-processing step is annoying.
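As a rough illustration of the post-processing route, here is a minimal sketch that copies the group, dataset (shape and dtype) and attribute structure of an HDF5 file into a new file without writing any dataset contents. This is the same idea as skipping the final H5Dwrite in a copy tool. It assumes Python with h5py. The function name `make_index_file` is hypothetical. This is not how CarpetIOHDF5 itself writes index files, and the sketch does not reproduce the extra attribute used by the older index format.

```python
import h5py


def make_index_file(src_path, dst_path):
    """Write dst_path with the same groups, datasets (shape and dtype) and
    attributes as src_path, but without writing any dataset contents.

    Hypothetical helper for illustration; attribute values are copied via
    h5py's default conversion and may not match the source byte for byte.
    """
    with h5py.File(src_path, "r") as src, h5py.File(dst_path, "w") as dst:
        # Attributes attached to the root group.
        for key, value in src.attrs.items():
            dst.attrs[key] = value

        def visit(name, obj):
            if isinstance(obj, h5py.Group):
                target = dst.require_group(name)
            elif isinstance(obj, h5py.Dataset):
                # Same shape and dtype as the source, but no data is written
                # (the analogue of skipping the final H5Dwrite). HDF5 allocates
                # contiguous storage lazily by default, so nothing is reserved.
                target = dst.create_dataset(name, shape=obj.shape, dtype=obj.dtype)
            else:
                return None  # e.g. committed datatypes: ignored in this sketch
            for key, value in obj.attrs.items():
                target.attrs[key] = value
            return None  # returning None lets visititems keep walking the file

        src.visititems(visit)
```

A reader can then discover dataset names, shapes and attributes from the small file. It would only open the large file for the datasets it actually needs. A tool like this could be run from a cleanup step, as suggested above.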

> How well-documented are the index files? If they are generated by default, then there should be a section in CarpetIOHDF5's thorn guide (or in the VisIt reader's documentation) describing why they are a good idea, what information they contain, and how they can be used to speed up input.

I don't think there is documentation for them apart from the description of the parameter in param.ccl.  

It sounds like you don't want additional index files appearing in users' output directories without the users understanding what they are. I agree that having the index files in the output directory is a bit ugly. An alternative would be to embed the content of the index files in the original HDF5 files when the simulation terminates, or periodically. The index files are small, so this should not add much overhead. The index would be a binary dataset that could be accessed by name without iterating over all the datasets in the file.
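Here is a minimal sketch of the embedding idea. It assumes Python with h5py and an index file that already exists. The index file's bytes are stored as a one-dimensional uint8 dataset under a fixed name. That name (`index_blob`) and both function names are hypothetical, not anything CarpetIOHDF5 defines. A reader fetches that one dataset by name and opens the bytes as an in-memory HDF5 file, instead of iterating over the data file's datasets.

```python
import io

import h5py
import numpy as np

EMBED_NAME = "index_blob"  # hypothetical dataset name for the embedded index


def embed_index(data_path, index_path, name=EMBED_NAME):
    """Store the raw bytes of index_path inside data_path as a uint8 dataset."""
    with open(index_path, "rb") as fh:
        blob = fh.read()
    with h5py.File(data_path, "a") as f:
        if name in f:
            del f[name]  # replace an older embedded index, e.g. from a periodic update
        f.create_dataset(name, data=np.frombuffer(blob, dtype=np.uint8))


def list_embedded_index(data_path, name=EMBED_NAME):
    """Return {dataset path: shape} read from the embedded index only."""
    with h5py.File(data_path, "r") as f:
        blob = f[name][()].tobytes()  # one read by name, no iteration
    buf = io.BytesIO(blob)  # keep a reference while h5py reads from it
    entries = {}
    with h5py.File(buf, "r") as idx:
        def visit(path, obj):
            if isinstance(obj, h5py.Dataset):
                entries[path] = obj.shape
        idx.visititems(visit)
    return entries
```

Storing the index as opaque bytes keeps it self-describing, because it is itself an HDF5 file. The cost is that a reader must load the whole blob before using any of it.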

> 
> -erik
> 
> 
> 
> On Wed, Feb 27, 2013 at 2:41 PM, Roland Haas <roland.haas at physics.gatech.edu> wrote:
> Hello all,
> 
> > Would somebody object to making CarpetIOHDF5::output_index = "yes" the
> > default? (It is currently "no".)
> >
> > It should not hurt to create these small files. (Or, does it?)
> It creates files, which in itself might be bad. Creating files can be an
> issue on some file systems where file creation is slow (or where there
> might be a limit on the number of files, which apparently on some clusters
> [admittedly I know of no Cactus users there] can be as low as 1e6 files
> *total*).
> 
> Also, there is currently a bug/mis-feature in that output_index writes
> indices for both data and checkpoint files. The latter should probably
> have a separate switch, since index files are less useful there.
> 
> If the issue is that we currently have no way for users to create
> index files for existing HDF5 files once, e.g., they find that
> visualization is slow, then I have a modified version of hdf5_merge (or
> was it hdf5_extract... anyway, one of them) in which I simply commented
> out the final H5Dwrite. It generates perfectly fine index files: of the
> "new" format, without the extra attribute, with just that change, or of
> the "old" format, with the attribute, with the obvious changes [that are
> also in]. It's of course a bit of a hack, since the tools were not
> originally meant to do that.
> 
> Yours,
> Roland
> 
> --
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from http://keys.gnupg.net.
> _______________________________________________
> Users mailing list
> Users at einsteintoolkit.org
> http://lists.einsteintoolkit.org/mailman/listinfo/users
> 
> 
> 
> -- 
> Erik Schnetter <schnetter at cct.lsu.edu>
> http://www.perimeterinstitute.ca/personal/eschnetter/

-- 
Ian Hinder
http://numrel.aei.mpg.de/people/hinder
