[ET Trac] [Einstein Toolkit] #764: cached output in CarpetIOScalar

Sat Mar 3 08:01:41 CST 2012

#764: cached output in CarpetIOScalar
--------------------------+-------------------------------------------------
  Reporter:  rhaas        |       Owner:  eschnett
      Type:  enhancement  |      Status:  review  
  Priority:  minor        |   Milestone:          
 Component:  Carpet       |     Version:          
Resolution:               |    Keywords:          
--------------------------+-------------------------------------------------

Comment (by eschnett):

 The C++ fstream object performs its own buffering. By closing and
 re-opening the files every time, we currently ensure that all information
 is written to disk in case the simulation crashes, is aborted, or runs out
 of queue time.

 I believe the main feature of your patch is to delay writing things to
 file, without flushing in between, which will indeed speed things up
 (presumably the intent of your patch). This could, in principle, also be
 achieved by keeping the output files open at all times, and by not
 flushing them.
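
 To make the two strategies concrete, here is a minimal sketch. The names
 append_and_close and OpenFileWriter are hypothetical and are not taken
 from CarpetIOScalar or from the patch; the sketch only contrasts
 "re-open, append, close" with "keep open, do not flush".

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Strategy 1 (close and re-open on every write): open in append mode,
// write one line, and let the destructor flush and close the file.
void append_and_close(const std::string& path, const std::string& line) {
  assert(!path.empty());
  std::ofstream out(path.c_str(), std::ios::app);
  out << line << '\n';
}  // `out` is flushed and closed here

// Strategy 2 (keep the file open): the stream lives as long as the writer,
// and fstream's own buffer decides when data is handed to the OS.
class OpenFileWriter {
 public:
  explicit OpenFileWriter(const std::string& path)
      : out_(path.c_str(), std::ios::app) {
    assert(!path.empty());
  }

  // Buffered write, deliberately without a flush.
  void write(const std::string& line) { out_ << line << '\n'; }

  // Explicit flush, e.g. before a checkpoint or at termination.
  void flush() { out_.flush(); }

 private:
  std::ofstream out_;  // closed (and flushed) by its destructor
};
```

 With the second strategy, lines that are still in the stream buffer are
 lost if the process is killed before a flush, which is the trade-off
 described above.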

 There is another serious performance problem in the current code that your
 patch doesn't address: there are too many files created. Creating many
 small files will always be inefficient. Writing to too many files at once
 may also make it problematic to keep these files open at all times,
 because the OS imposes a limit on this number.

 I think the true remedy is to write fewer, larger files. For example, all
 norms (for a variable or group, or for all variables) could be written to
 a single file. This would be faster during output, and would also improve
 file system performance when looking at the simulation later. We can then
 provide a simple awk script that re-creates the current files from this
 single file. In effect, writing to a single file is like caching the
 output, except that the cache is a file, which is safer if a simulation
 aborts.
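
 As a rough sketch of this idea (written in C++ here rather than awk, and
 assuming a hypothetical row layout of "iteration time value1 value2 ..."
 with one value per reduction), the combined file and the splitting step
 could look like the following. The function names, the column layout, and
 the "<base>.<name>.asc" naming are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Append one row to the combined file: iteration, time, then one value per
// reduction (e.g. minimum, maximum, norm1, norm2), separated by spaces.
void write_row(std::ostream& out, int iteration, double time,
               const std::vector<double>& values) {
  out << iteration << ' ' << time;
  for (std::size_t i = 0; i < values.size(); ++i) out << ' ' << values[i];
  out << '\n';
}

// Re-create one file per reduction, "<base>.<name>.asc", from the combined
// file. Each output row is "iteration time value". Returns the number of
// rows written to each per-reduction file.
int split_combined(const std::string& combined, const std::string& base,
                   const std::vector<std::string>& names) {
  assert(!names.empty());
  int rows = 0;
  for (std::size_t k = 0; k < names.size(); ++k) {
    std::ifstream in(combined.c_str());
    std::ofstream out((base + "." + names[k] + ".asc").c_str());
    std::string line;
    rows = 0;
    while (std::getline(in, line)) {
      std::istringstream fields(line);
      std::string iteration, time, value;
      if (!(fields >> iteration >> time)) continue;  // blank or malformed
      // Skip forward to the value of the k-th reduction.
      for (std::size_t i = 0; i <= k; ++i) fields >> value;
      if (!fields) continue;  // row has too few columns
      out << iteration << ' ' << time << ' ' << value << '\n';
      ++rows;
    }
  }
  return rows;
}
```

 The splitter re-reads the combined file once per reduction so that only
 two files are open at any time; that keeps the sketch simple and stays
 well clear of the open-file limit mentioned above.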

 Instead of your approach, I would prefer a solution that would lead to
 fewer, larger files.

 Regarding your patch: I don't like the reference counting. It seems that
 this is only necessary because you place cached streams directly into STL
 containers, which need a copy constructor. You can instead place pointers
 to cache_streams into the STL container, and then there will be only a
 single copy of each cached stream. Alternatively, the standard library
 provides smart-pointer abstractions for this case, e.g. shared_ptr or
 unique_ptr in C++11 (auto_ptr itself cannot safely be stored in STL
 containers).
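
 A minimal sketch of the pointer-in-container idea, assuming C++11 is
 available: the class cached_stream below is a hypothetical stand-in and
 not the class from the patch, and get_stream and stream_map are likewise
 invented for illustration. The container holds one owning pointer per
 file, so the stream object itself is never copied and needs no reference
 count.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for a cached stream: output is collected in memory
// and appended to `path` only on flush() or destruction. Non-copyable.
class cached_stream {
 public:
  explicit cached_stream(const std::string& path) : path_(path) {
    assert(!path.empty());
  }
  cached_stream(const cached_stream&) = delete;
  cached_stream& operator=(const cached_stream&) = delete;
  ~cached_stream() { flush(); }

  // In-memory buffer that callers write to.
  std::ostream& buffer() { return buffer_; }

  // Append everything buffered so far to the file and empty the buffer.
  void flush() {
    const std::string pending = buffer_.str();
    if (pending.empty()) return;
    std::ofstream out(path_.c_str(), std::ios::app);
    out << pending;
    buffer_.str("");
  }

 private:
  std::string path_;
  std::ostringstream buffer_;
};

// The map owns exactly one cached_stream per file name.
typedef std::map<std::string, std::unique_ptr<cached_stream> > stream_map;

// Return the stream for `path`, creating it on first use.
cached_stream& get_stream(stream_map& streams, const std::string& path) {
  std::unique_ptr<cached_stream>& slot = streams[path];
  if (!slot) slot.reset(new cached_stream(path));
  return *slot;
}
```

 When the map is destroyed, each unique_ptr destroys its stream, which
 flushes the remaining output. Without C++11, storing raw pointers and
 deleting them explicitly at shutdown would follow the same pattern.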

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/764#comment:2>
Einstein Toolkit <http://einsteintoolkit.org>