[Users] visualization-friendly HDF5 format

Wed Aug 19 10:51:00 CDT 2015

On Tue, Aug 18, 2015 at 4:02 PM, Frank Loeffler <knarf at cct.lsu.edu> wrote:

> Hi,
>
> We had quite a discussion about the problems the current visualization
> tools have reading our hdf5 format. It became clear that we have to
> change/extend it, but before we think about how, I would like to see
> what specifically the readers have problems with. Let me start here with
> what I can recall, and please freely add to that.
>
> One of the problems is that readers currently have to iterate over all
> datasets present to get to some information they need. Getting the list
> of all datasets by itself can take quite a while on a regular hdf5 file,
> but then readers also have to look at attributes within each dataset.
> While all of this is necessary to visualize all data in a given file,
> most of the time not all of the data is actually necessary, and
> certainly not for just 'opening the file'.
>
> operations a reader needs to be fast:
> - list of variables (at a given iteration)
> - list of time steps / iterations
> - AMR structure for one given iteration (all maps, rls and components)
>
> Regardless of these additional meta-data, we already established the
> need for a meta-data file, effectively a copy of all datasets but
> without the actual data. Am I remembering correctly that the idea was to
> write this at run-time (eventually - right now it could be generated as
> post-processing)?
>
> Is this all readers would need?
>

Frank

As Jonah described during the ET workshop, we're working with the yt
developers to make Carpet's HDF5 format yt-friendly. At the moment, we are
adding the missing information, which so far included the set of active
grid points -- i.e. those that should be displayed for a given level, as
opposed to those that should be cut off (ghost, buffer, symmetry points).
One may argue that these points should not have been output at all, but
that would be a major change to the current file format.
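
As a minimal sketch of how a reader could use such information, assume
(hypothetically) that each dataset records how many points to cut off at
its lower and upper faces in each dimension. The names cut_lower and
cut_upper below are made up for illustration; they are not attribute names
in the current file format.

```python
def active_slices(shape, cut_lower, cut_upper):
    """Return one slice per dimension selecting only the active points.

    shape     -- number of stored points in each dimension
    cut_lower -- points to drop at the lower face (hypothetical metadata)
    cut_upper -- points to drop at the upper face (hypothetical metadata)
    """
    slices = []
    for n, lo, hi in zip(shape, cut_lower, cut_upper):
        if lo < 0 or hi < 0 or lo + hi >= n:
            raise ValueError("cut-off widths leave no active points")
        slices.append(slice(lo, n - hi))
    return tuple(slices)


# One-dimensional example: 10 stored points, 3 cut off on each side.
row = list(range(10))
(sl,) = active_slices((10,), (3,), (3,))
active_row = row[sl]
```

With an array library, the returned tuple could be used directly as an
index into the full dataset, so the reader never needs to know why the
points were cut off, only how many.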

Additional information may also be useful. This is information that is
already present in some way, but is cumbersome to extract. Presumably,
Carpet already reconstructs this information when reading data from a file,
but a visualization or analysis tool may want to access information in a
different order, and may not want to reimplement most of CarpetIOHDF5.

The items you list are a good starting point, but are too high-level to be
useful as a guide to implementing this. To make things concrete, I'd rather
collaborate with someone who is actually implementing a reader, and provide
the data that this reader needs. For example, you say you want a "list of
variables (at a given iteration)" -- do you really want a two-step table,
containing first a list of iterations, and then (for each iteration) a list
of variables? That's likely very different from what a user wants to
extract; rather, people want a set of variables, and for each variable, the
set of iterations at which this variable has data. Given that we have an
AMR format where certain levels exist only at certain iterations, this
requires a bit more detail for a complete specification.
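
To make the variable-first layout concrete, here is a rough sketch that
builds such an index from a list of dataset names. It assumes names of the
form "THORN::var it=N tl=N rl=N c=N" (with an optional "m=N" field); that
pattern, the function name build_index, and the sample names are
assumptions for illustration, and real files may differ.

```python
import re
from collections import defaultdict

# Assumed dataset naming pattern; the "m=" and "c=" fields are optional.
NAME_RE = re.compile(
    r"^(?P<var>\S+) it=(?P<it>\d+) tl=(?P<tl>\d+)"
    r"(?: m=(?P<m>\d+))? rl=(?P<rl>\d+)(?: c=(?P<c>\d+))?$"
)


def build_index(dataset_names):
    """Map variable -> iteration -> set of refinement levels with data."""
    index = defaultdict(lambda: defaultdict(set))
    for name in dataset_names:
        match = NAME_RE.match(name)
        if match is None:
            continue  # not a grid-function dataset under this assumption
        index[match["var"]][int(match["it"])].add(int(match["rl"]))
    return index


# Made-up names: level 1 is stored at every iteration, level 0 only at
# every other iteration.
names = [
    "ADMBASE::alp it=0 tl=0 rl=0 c=0",
    "ADMBASE::alp it=0 tl=0 rl=1 c=0",
    "ADMBASE::alp it=1 tl=0 rl=1 c=0",
    "ADMBASE::alp it=2 tl=0 rl=0 c=0",
    "ADMBASE::alp it=2 tl=0 rl=1 c=0",
    "Parameters and Global Attributes",
]
index = build_index(names)
iterations = sorted(index["ADMBASE::alp"])
```

Here index["ADMBASE::alp"][1] contains only level 1, which is the kind of
per-level detail a complete specification would have to cover. Building
this table still requires listing every dataset, which is the slow step;
the point of the sketch is only the shape of the table a reader might want
stored directly.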

How do you want the AMR structure to be presented? Currently, Carpet can
output a string that can be parsed reasonably easily, and which describes
the grid structure. Is that sufficient?

I hear the visualization tools are interested in learning the connectivity
between the different refinement levels. How should this be represented?
With vertex centering, the vertices overlap, and if you want to display the
cells around each vertex (control volume, necessary for volume rendering),
then you need to handle partial volumes. With cell centering, things are
easier, at least if you use CarpetRegrid2::snap_to_coarse -- which Carpet
doesn't require, but the visualization tool may want to insist on it.

So, in addition to giving a high-level list of features, let's also collect
a list of tools that will help us find out the details about these wanted
features.

We also may want to have a mechanism to "glue" the different output files
from different processors together, other than just looking for files with
similar names.
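
For comparison, the name-based approach amounts to something like the
following sketch. The file name pattern "<base>.file_<N>.h5" and the
function name group_by_basename are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Assumed per-process file naming: "<base>.file_<N>.h5".
FILE_RE = re.compile(r"^(?P<base>.+)\.file_(?P<proc>\d+)\.h5$")


def group_by_basename(filenames):
    """Group per-process output files by their common base name."""
    groups = defaultdict(dict)
    for filename in filenames:
        match = FILE_RE.match(filename)
        if match is None:
            continue  # does not follow the assumed pattern
        groups[match["base"]][int(match["proc"])] = filename
    # Order each group by process number rather than by string comparison.
    return {
        base: [procs[p] for p in sorted(procs)]
        for base, procs in groups.items()
    }


files = ["alp.file_1.h5", "alp.file_0.h5", "rho.file_0.h5", "notes.txt"]
grouped = group_by_basename(files)
```

This cannot tell whether a file is missing from a group or left over from
an earlier run with a different number of processes, which is one reason
an explicit mechanism would be preferable.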

Finally, I disagree with the "established the need for a meta-data file".
This capability exists, and it speeds up reading for the current output
format, but the current output format has several obvious shortcomings; if
those were remedied, things may be much faster.

-erik

-- 
Erik Schnetter <schnetter at cct.lsu.edu>
http://www.perimeterinstitute.ca/personal/eschnetter/