<html>#2543: Consolidate data formats to simplify postprocessing

<table style='border-spacing: 1ex 0pt; '>

<tr><td style='text-align:right'> Reporter:</td><td>Wolfgang Kastaun</td></tr>

<tr><td style='text-align:right'>   Status:</td><td>new</td></tr>

<tr><td style='text-align:right'>Milestone:</td><td></td></tr>

<tr><td style='text-align:right'>  Version:</td><td>development version</td></tr>

<tr><td style='text-align:right'>     Type:</td><td>enhancement</td></tr>

<tr><td style='text-align:right'> Priority:</td><td>minor</td></tr>

<tr><td style='text-align:right'>Component:</td><td></td></tr>

</table>


<p>Comment (by Wolfgang Kastaun):</p>

<p>The proposed solution for CarpetX would probably greatly simplify the logic that gathers information on the available data. I’m not entirely sure about the speed. Parsing the HDF5 files seems to take so long because of a design flaw: basically the whole file has to be read just to get the names of all datasets, and, unfortunately, the table-of-contents information is stored only in those names. I also did not understand what the solution would look like for runs using many nodes: would there be one file for each node, or even one for each MPI process, or would the information first be collected on one node?</p>

<p>But in principle, reading a few thousand YAML files should not take that long; we would have to try it.</p>
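
<p>To illustrate what "table-of-contents information only in the names" means for postprocessing, here is a minimal sketch of how an index has to be rebuilt from dataset names once they have all been listed. The name layout (<code>GROUP::var it=... tl=... rl=... c=...</code>) is an assumption based on typical Carpet output and may not match every file, and the helpers <code>parse_name</code> and <code>build_toc</code> are hypothetical, not part of any existing tool.</p>

```python
from collections import defaultdict


def parse_name(name):
    """Split a dataset name into (variable, {key: int}).

    Assumed layout: "GROUP::var it=0 tl=0 rl=1 c=3".
    Returns None for names that do not fit this layout
    (for example, metadata datasets).
    """
    head, _, tail = name.partition(" ")
    fields = {}
    for token in tail.split():
        key, sep, value = token.partition("=")
        if not sep or not value.lstrip("-").isdigit():
            return None
        fields[key] = int(value)
    if "it" not in fields:
        return None
    return head, fields


def build_toc(names):
    """Group dataset names by variable and iteration.

    Every name must be visited once, which is why all names
    have to be read from the file before anything can be looked up.
    """
    toc = defaultdict(lambda: defaultdict(list))
    for name in names:
        parsed = parse_name(name)
        if parsed is None:
            continue
        var, fields = parsed
        toc[var][fields["it"]].append(name)
    return {var: dict(iterations) for var, iterations in toc.items()}
```

<p>With a separate metadata file, this kind of index could presumably be stored directly instead of being reconstructed from names on every read.</p>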

<p>--<br/>

Ticket URL: <a href='https://bitbucket.org/einsteintoolkit/tickets/issues/2543/consolidate-data-formats-to-simplify'>https://bitbucket.org/einsteintoolkit/tickets/issues/2543/consolidate-data-formats-to-simplify</a></p>

</html>