[Users] CarpetIOHDF5::output_index = "yes" by default?
Ian Hinder
ian.hinder at aei.mpg.de
Thu Feb 28 08:59:56 CST 2013
On 28 Feb 2013, at 15:47, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
> Ian
>
> I have nothing against the index files. It is just that, if it can be done in post-processing, it should be done there, because creating index files doesn't require 1000 cores.
I think that if it does not have a noticeable performance impact, it is much more convenient to do it during the simulation. Having to add a post-processing stage is detrimental to the workflow, I think. I never have to postprocess my simulations.
> With Simfactory, or maybe even with an easy-to-use shell script that is automatically placed in the output directory, we would not have to. This is a point of principle, i.e. it can easily be overruled by practical considerations.
>
> Regarding documentation: I think a small section describing what you just said (same structure, no data content, etc.) would suffice. I'm just pushing for documentation here; this is not a show-stopper.
OK. Let's write some documentation, and change the default to "yes". It significantly speeds up visualising large simulations with VisIt, which supports the index files. We should also address https://trac.einsteintoolkit.org/ticket/1273 before changing the default.
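As a starting point for that documentation, the "same structure, no data content" description could be illustrated with a rough sketch like the following. It assumes Python with the h5py package rather than Carpet's actual C++ code; `write_index_file` is a hypothetical helper name, and the sketch does not attempt to reproduce any extra attributes the real index files may carry.

```python
import h5py


def write_index_file(data_path, index_path):
    """Copy groups, dataset shapes/dtypes and attributes, but no dataset data.

    Illustrative only: this is not how CarpetIOHDF5 itself writes index files.
    """
    with h5py.File(data_path, "r") as src, h5py.File(index_path, "w") as dst:
        # Attributes attached to the root group.
        for key, value in src.attrs.items():
            dst.attrs[key] = value

        def copy_structure(name, obj):
            if isinstance(obj, h5py.Group):
                new = dst.require_group(name)
            else:
                # Same shape and type as the original, but nothing is written.
                new = dst.create_dataset(name, shape=obj.shape, dtype=obj.dtype)
            for key, value in obj.attrs.items():
                new.attrs[key] = value

        # Walk every group and dataset in the source file.
        src.visititems(copy_structure)
```

The resulting file should stay small because no dataset is ever written to, while a reader can still discover dataset names, shapes, types and attributes from it.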
>
> -erik
>
> On Thu, Feb 28, 2013 at 8:16 AM, Ian Hinder <ian.hinder at aei.mpg.de> wrote:
>
> On 27 Feb 2013, at 22:33, Erik Schnetter <schnetter at cct.lsu.edu> wrote:
>
>> If it is possible to generate the index files as a post-processing step, then I would prefer that, as it reduces the time the simulation spends waiting for I/O. We could add it, e.g., to Simfactory's cleanup step to have it happen automatically.
>
> It is possible; the index files are identical to the original files, but with no data written to the datasets. People don't visualise their data enough, and having the index files makes it more likely that people will visualise their data, as reading the data will be faster. Having an additional step of post-processing the data is annoying.
>
>> How well-documented are the index files? If they are generated by default, then there should be a section in CarpetIOHDF5's thorn guide (or in the VisIt reader's documentation) describing why they are a good idea, what information they contain, and how they can be used to speed up input.
>
> I don't think there is documentation for them apart from the description of the parameter in param.ccl.
>
> It sounds like you don't want the additional index files appearing in users' output directories without them understanding what they are. I agree that having the index files in the output directory is a bit ugly. An alternative would be to embed the content of the index files in the original HDF5 files when the simulation terminates, or periodically. The index files are small, so this should not be a large overhead. This would be a binary dataset which could be accessed by name without iterating over all the datasets in the file.
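A rough sketch of the embedding idea, assuming Python with h5py and not reflecting any existing CarpetIOHDF5 feature: the index file's bytes are stored as a single uint8 dataset under a fixed name (`embedded_index` is a made-up name, as are the two helper functions), so a reader can fetch it directly by name.

```python
import h5py
import numpy as np

INDEX_DATASET = "embedded_index"  # hypothetical dataset name


def embed_index(data_path, index_path, name=INDEX_DATASET):
    """Store the raw bytes of index_path as one uint8 dataset in data_path."""
    with open(index_path, "rb") as f:
        blob = np.frombuffer(f.read(), dtype=np.uint8)
    with h5py.File(data_path, "a") as h5:
        if name in h5:
            # Replace an older copy, e.g. when embedding periodically.
            del h5[name]
        h5.create_dataset(name, data=blob)


def read_embedded_index(data_path, name=INDEX_DATASET):
    """Fetch the embedded index by name, without iterating over other datasets."""
    with h5py.File(data_path, "r") as h5:
        return h5[name][()].tobytes()
```

Note that deleting and re-creating the dataset does not by itself return the old space to the file, so periodic embedding in this form would let the file grow slightly each time.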
>
>>
>> -erik
>>
>>
>>
>> On Wed, Feb 27, 2013 at 2:41 PM, Roland Haas <roland.haas at physics.gatech.edu> wrote:
>> Hello all,
>>
>> > Would somebody object to making CarpetIOHDF5::output_index = "yes" the
>> > default? (It is currently "no".)
>> >
>> > It should not hurt to create these small files. (Or, does it?)
>> It creates files, which in itself might be bad. Creating files can be an
>> issue on some file systems where creating files is slow (or there might
>> be a limit on the number of files, which apparently on some clusters
>> [admittedly I know of no Cactus users there] can be as low as 1e6 files
>> *total*).
>>
>> Also, there is currently a bug/misfeature in that output_index writes
>> indices for both data and checkpoint files. The latter should probably
>> have a separate switch, since index files are less useful there.
>>
>> If the issue is that we currently have no method for users to create
>> index files for existing HDF5 files once, e.g., they find that
>> visualization is slow, then for that I have a modified version of
>> hdf5_merge (or was it hdf5_extract... anyway, one of them) in which I
>> simply commented out the final H5Dwrite and which generates perfectly
>> fine index files (in the "new" format, without the extra attribute, with
>> just that change, or in the "old" format, with the attribute, with the
>> obvious changes [that are also in]). It's of course a bit of a hack,
>> since the tools were not originally meant to do that.
>>
>> Yours,
>> Roland
>>
>> --
>> My email is as private as my paper mail. I therefore support encrypting
>> and signing email messages. Get my PGP key from http://keys.gnupg.net.
>> _______________________________________________
>> Users mailing list
>> Users at einsteintoolkit.org
>> http://lists.einsteintoolkit.org/mailman/listinfo/users
>>
>>
>>
>> --
>> Erik Schnetter <schnetter at cct.lsu.edu>
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>
> --
> Ian Hinder
> http://numrel.aei.mpg.de/people/hinder
>
> --
> Erik Schnetter <schnetter at cct.lsu.edu>
> http://www.perimeterinstitute.ca/personal/eschnetter/
--
Ian Hinder
http://numrel.aei.mpg.de/people/hinder