[ET Trac] #2543: Consolidate data formats to simplify postprocessing

Gabriele Bozzola trac-noreply at einsteintoolkit.org
Tue Aug 3 12:44:35 CDT 2021


#2543: Consolidate data formats to simplify postprocessing

 Reporter: Wolfgang Kastaun
   Status: new
Milestone: 
  Version: development version
     Type: enhancement
 Priority: minor
Component: 

Comment (by Gabriele Bozzola):

I can give you some first comments now, but more serious ones will require thinking about the design of the postprocessing tool.

One quick comment is that the file doesn’t fully tell me how to read a variable. Suppose I want to read `max(hydrogpu::rho)`: what column is it in the tsv file? If I understand how the tsv is structured (it contains many reductions), we need to enforce certain constraints to make sure that we can determine the column number without parsing the tsv file. For example, the order of variables and reductions in the yaml file must be the same as in the tsv file, and all the variables must appear in all the reductions.
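
To make the ordering requirement concrete, here is a minimal Python sketch of the lookup I have in mind. It assumes a hypothetical layout in which the tsv starts with a few leading columns (say, iteration and time) followed by one column per (reduction, variable) pair, grouped by reduction. The function name `column_index`, the `n_leading` parameter, and the variable `hydrogpu::press` are made up for illustration and are not part of the proposed format.

```python
def column_index(variables, reductions, variable, reduction, n_leading=2):
    """Return the 0-based tsv column holding reduction(variable).

    Assumed (hypothetical) layout: `n_leading` columns first, then one
    column per (reduction, variable) pair, ordered exactly as in the yaml
    file, with every variable present in every reduction.
    """
    r = reductions.index(reduction)  # raises ValueError if not listed
    v = variables.index(variable)    # raises ValueError if not listed
    return n_leading + r * len(variables) + v


# Orderings as they would be listed in the yaml metadata (made-up values).
variables = ["hydrogpu::rho", "hydrogpu::press"]
reductions = ["min", "max", "norm2"]

col = column_index(variables, reductions, "hydrogpu::rho", "max")
print(col)
```

If either constraint is dropped (a different order in the tsv, or a variable missing from one reduction), this arithmetic no longer works and the reader has to parse the tsv header instead.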

A second comment: on some shared filesystems, opening files can be extremely expensive, so having a few big files is much better than having many small ones. An example of this is how the Einstein Toolkit outputs multipoles now, which can lead to thousands of ASCII files, or one HDF5 file (and the performance difference is really important). If the data for each iteration is stored in a separate file, I worry that it might lead to performance problems.
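
As a rough illustration of the two layouts (not a benchmark), the Python sketch below writes the same made-up data once as one small file per iteration and once as rows of a single file. The file names (`it00000000.tsv`, `all.tsv`) and both function names are hypothetical. The point is only that a reader of the first layout needs one `open()` per iteration, while the second needs a single one.

```python
import os
import tempfile


def write_per_iteration(directory, data):
    """One small file per iteration: reading back costs one open() each."""
    for iteration, values in data.items():
        path = os.path.join(directory, f"it{iteration:08d}.tsv")
        with open(path, "w") as f:
            f.write("\t".join(map(str, values)) + "\n")


def write_consolidated(path, data):
    """All iterations stored as rows of one file: a single open()."""
    with open(path, "w") as f:
        for iteration, values in sorted(data.items()):
            f.write("\t".join(map(str, [iteration, *values])) + "\n")


# Made-up data: two values per iteration.
data = {it: [0.1 * it, 0.2 * it] for it in range(1000)}

with tempfile.TemporaryDirectory() as tmp:
    many = os.path.join(tmp, "many")
    os.mkdir(many)
    write_per_iteration(many, data)
    write_consolidated(os.path.join(tmp, "all.tsv"), data)

    n_small_files = len(os.listdir(many))  # files a reader must open
    with open(os.path.join(tmp, "all.tsv")) as f:
        n_rows = sum(1 for _ in f)         # same data, one file
```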

Also, why did you pick yaml over the (faster but less powerful) json?

--
Ticket URL: https://bitbucket.org/einsteintoolkit/tickets/issues/2543/consolidate-data-formats-to-simplify