[ET Trac] [Einstein Toolkit] #814: Allow different timers on different processors in TimerReport
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Tue May 15 19:58:50 CDT 2012
#814: Allow different timers on different processors in TimerReport
------------------------------------+---------------------------------------
Reporter: eschnett | Owner:
Type: defect | Status: reopened
Priority: major | Milestone:
Component: EinsteinToolkit thorn | Version:
Resolution: | Keywords: TimerReport
------------------------------------+---------------------------------------
Changes (by rhaas):
* keywords: => TimerReport
* status: closed => reopened
* resolution: fixed =>
Comment:
This fails for me with a segfault in line 677 (the strncpy) when there are
timers that were destroyed (via CCTK_TimerDestroy, called by WaveExtractL
from Llama). The error is the assumption that timers never go away.
Apparently Cactus never reduces its timer count, although the actual timer
index seems to be reused by the underlying
Util_DeleteHandle/Util_NewHandle. These two will free the associated timer
structure (and the name). So CCTK_NumTimers() seems to be the number of
times CCTK_TimerCreate was successfully called, i.e. the number of timers
ever created, not the number of timers currently in existence, and one
always has to check whether a timer id in a
{{{for(i=0;i<CCTK_NumTimers();i++)}}} loop is still valid.
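A minimal sketch of such a validity check, assuming a name lookup that
returns NULL for a destroyed timer. The table mock_timers and the functions
mock_timer_name and collect_valid_timer_names are hypothetical stand-ins
for illustration, not Cactus API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TIMERNAME_LENGTH 101 /* value quoted in this comment */

/* Hypothetical stand-in for the timer handle table: a NULL entry marks a
   timer that was destroyed and whose name has been freed. */
static const char *mock_timers[] = { "Evolve", NULL, "Analysis", NULL, "Output" };
static const int mock_num_timers = 5; /* counts every timer ever created */

/* Hypothetical lookup: returns the name, or NULL if the handle is invalid. */
static const char *mock_timer_name(int handle)
{
  if (handle < 0 || handle >= mock_num_timers) return NULL;
  return mock_timers[handle];
}

/* Copy the names of all still-valid timers into names[][] and return how
   many were copied. Destroyed handles are skipped, not dereferenced. */
static int collect_valid_timer_names(char names[][TIMERNAME_LENGTH], int max_names)
{
  int n = 0;
  assert(names != NULL);
  for (int i = 0; i < mock_num_timers && n < max_names; i++) {
    const char *name = mock_timer_name(i);
    if (name == NULL) continue;            /* timer was destroyed */
    strncpy(names[n], name, TIMERNAME_LENGTH - 1);
    names[n][TIMERNAME_LENGTH - 1] = '\0'; /* strncpy may not terminate */
    n++;
  }
  return n;
}
```

The loop bound is still the "ever created" count; only the per-handle NULL
test keeps the strncpy away from freed names.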
Additionally, it seems that the large automatic array in line 700
{{{
char all_timernames[total_ntimers][TIMERNAME_LENGTH];
}}}
which tries to create an automatic array of size
CCTK_nProcs()*CCTK_NumTimers()*TIMERNAME_LENGTH, is also causing problems
(I have a test run without a call to TimerDestroy that fails in
line 704, "name_displacements[0] = 0", with a segfault).
In my runs I have nprocs >= 192 (576 cores, 6 threads), NumTimers >= 2500
(hydro, multipatch) and TIMERNAME_LENGTH = 101. The total size of this
array is thus about 48 MB. If a simple-minded compiler ever tries to
allocate this on the stack, then this should segfault as well (and I
have no idea whether a compiler is required to support arbitrarily large
automatic arrays).
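The 48 MB figure is 192 * 2500 * 101 = 48,480,000 bytes. As a rough
sketch (not the attached patch), the same table could be taken from the
heap instead, with an overflow check on the size. The names timer_name_t,
timer_name_buffer_bytes and alloc_timer_names are hypothetical and chosen
only for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TIMERNAME_LENGTH 101 /* value quoted in this comment */

typedef char timer_name_t[TIMERNAME_LENGTH];

/* Bytes needed for nprocs * ntimers fixed-length names.
   Returns 0 if the table is empty or the product overflows size_t. */
static size_t timer_name_buffer_bytes(size_t nprocs, size_t ntimers)
{
  if (ntimers != 0 && nprocs > SIZE_MAX / ntimers) return 0;
  size_t total = nprocs * ntimers;
  if (total > SIZE_MAX / TIMERNAME_LENGTH) return 0;
  size_t bytes = total * TIMERNAME_LENGTH;
  assert(bytes / TIMERNAME_LENGTH == total); /* sanity check on the product */
  return bytes;
}

/* Heap-allocate a zero-filled name table, usable as names[i][j].
   Returns NULL on failure or zero size; the caller must free() it. */
static timer_name_t *alloc_timer_names(size_t nprocs, size_t ntimers)
{
  if (timer_name_buffer_bytes(nprocs, ntimers) == 0) return NULL;
  return calloc(nprocs * ntimers, sizeof(timer_name_t));
}
```

Unlike an automatic array, a failed heap allocation can be detected (NULL
return) and reported rather than ending in a segfault.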
If this ever causes a segfault then a better solution might be to rewrite
the CollectTimerInfo routine in question in C++ and use vector<T> for
memory allocation, since there are still a number of arrays that have
automatic storage class and whose size depends on nProcs, so they can also
grow without bound.
Attached is a patch for both issues.
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/814#comment:4>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit