[ET Trac] [Einstein Toolkit] #764: cached output in CarpetIOSCalar
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Sat Mar 3 08:01:41 CST 2012
#764: cached output in CarpetIOSCalar
--------------------------+-------------------------------------------------
Reporter: rhaas | Owner: eschnett
Type: enhancement | Status: review
Priority: minor | Milestone:
Component: Carpet | Version:
Resolution: | Keywords:
--------------------------+-------------------------------------------------
Comment (by eschnett):
The C++ fstream object performs its own buffering. By closing and
re-opening the files every time, we currently ensure that all information
is written to disk in case the simulation crashes, is aborted, or runs out
of queue time.
I believe the main feature of your patch is to delay writing things to
file, without flushing in between, which will indeed speed things up
(presumably the intent of your patch). The same effect could, in
principle, also be achieved by keeping the output files open at all times
and not flushing them.
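
A minimal sketch of the keep-the-files-open alternative, assuming a C++11
compiler. OpenFileCache and its member names are hypothetical and are not
part of Carpet or of the patch. Data stays in the fstream buffer until
flush_all() is called, e.g. at checkpoints or shortly before termination.

```cpp
#include <cstddef>
#include <fstream>
#include <map>
#include <string>

// Hypothetical helper: keeps one output stream per file name open for the
// whole run instead of closing and re-opening it at every output step.
class OpenFileCache {
public:
  // Return the stream for `name`, opening it in append mode on first use.
  std::ofstream& get(const std::string& name) {
    std::ofstream& file = files_[name];  // default-constructed if missing
    if (!file.is_open()) file.open(name.c_str(), std::ios::app);
    return file;
  }

  // Push all buffered data to the OS; call this only when durability
  // matters, not after every write.
  void flush_all() {
    for (auto& entry : files_) entry.second.flush();
  }

  std::size_t open_count() const { return files_.size(); }

private:
  std::map<std::string, std::ofstream> files_;
};
```

Each entry holds an open file descriptor, so this only works as long as the
number of output files stays below the OS limit on open files.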
There is another serious performance problem in the current code that your
patch doesn't address: too many files are created. Creating many small
files will always be inefficient. Writing to too many files at once may
also make it problematic to keep these files open at all times, because
the OS imposes a limit on the number of open files.
I think the true remedy is to write fewer, larger files. For example, all
norms (for a variable or group, or for all variables) could be written to
a single file. This would be faster during output, and would also improve
file system speed when looking at the simulation later. We can then
provide a simple awk script that re-creates the current files from this
single file. In effect, writing to a single file is like caching the
output, except that the cache is a file, which is safer if a simulation
aborts.
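
To illustrate the single-file idea, here is a rough sketch in C++ rather
than awk. The column layout (iteration, time, then one column per
reduction), the reduction names, and the functions write_row and split_rows
are assumptions made for this example, not the actual CarpetIOScalar
format.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical column layout: iteration, time, one column per reduction.
static const char* const kReductions[] = {"minimum", "maximum", "norm2"};
static const int kNumReductions = 3;

// Append one row holding all reductions of a variable to a single stream.
void write_row(std::ostream& out, int iteration, double time,
               const double* values) {
  out << iteration << " " << time;
  for (int i = 0; i < kNumReductions; ++i) out << " " << values[i];
  out << "\n";
}

// Split the combined rows back into one "iteration time value" series per
// reduction, keyed by reduction name.
std::map<std::string, std::string> split_rows(std::istream& in) {
  std::map<std::string, std::string> parts;
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream row(line);
    int iteration;
    double time;
    if (!(row >> iteration >> time)) continue;  // skip malformed rows
    for (int i = 0; i < kNumReductions; ++i) {
      double value;
      if (!(row >> value)) break;  // row has fewer columns than expected
      std::ostringstream entry;
      entry << iteration << " " << time << " " << value << "\n";
      parts[kReductions[i]] += entry.str();
    }
  }
  return parts;
}
```

With this layout, one write per output step replaces one write per
reduction, and the per-reduction files can be regenerated afterwards.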
Instead of your approach, I would prefer a solution that would lead to
fewer, larger files.
Regarding your patch: I don't like the reference counting. It seems that
this is only necessary because you place cached streams directly into STL
containers, which need a copy constructor. You can instead place pointers
to cache_streams into the STL container, and then there will be only a
single copy of each cached stream. Alternatively, the STL provides some
abstractions for this case, e.g. auto_ptr.
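
A sketch of the pointer-in-container approach, under some assumptions: the
cache_stream class below is a hypothetical stand-in for the one in the
patch, and the code uses C++11 std::unique_ptr instead of auto_ptr, since
auto_ptr does not meet the requirements for container elements, was
deprecated in C++11, and was removed in C++17. The names stream_map and
get_stream are also made up for illustration.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for the patch's cached stream: buffers text in
// memory. Copying is disabled, so no reference counting is needed.
class cache_stream {
public:
  explicit cache_stream(const std::string& filename) : filename_(filename) {}
  cache_stream(const cache_stream&) = delete;
  cache_stream& operator=(const cache_stream&) = delete;

  template <typename T>
  cache_stream& operator<<(const T& value) {
    buffer_ << value;
    return *this;
  }

  const std::string& filename() const { return filename_; }
  std::string contents() const { return buffer_.str(); }

private:
  std::string filename_;
  std::ostringstream buffer_;
};

// The container owns pointers, so each stream exists exactly once.
typedef std::map<std::string, std::unique_ptr<cache_stream> > stream_map;

// Return the stream for `filename`, creating it on first use.
cache_stream& get_stream(stream_map& streams, const std::string& filename) {
  std::unique_ptr<cache_stream>& slot = streams[filename];
  if (!slot) slot.reset(new cache_stream(filename));
  return *slot;
}
```

The map destroys each cache_stream when it goes out of scope, which is
where a real implementation would write the buffered data to disk.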
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/764#comment:2>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit