[ET Trac] [Einstein Toolkit] #814: Allow different timers on different processors in TimerReport

Einstein Toolkit trac-noreply at einsteintoolkit.org
Tue May 15 19:58:50 CDT 2012


#814: Allow different timers on different processors in TimerReport
------------------------------------+---------------------------------------
  Reporter:  eschnett               |       Owner:             
      Type:  defect                 |      Status:  reopened   
  Priority:  major                  |   Milestone:             
 Component:  EinsteinToolkit thorn  |     Version:             
Resolution:                         |    Keywords:  TimerReport
------------------------------------+---------------------------------------
Changes (by rhaas):

  * keywords:  => TimerReport
  * status:  closed => reopened
  * resolution:  fixed =>


Comment:

 This fails for me with a segfault in line 677 (the strncpy) when there are
 timers that were destroyed (via CCTK_TimerDestroy, called by WaveExtractL
 from Llama). The error is in assuming that timers never go away.
 Apparently Cactus never decrements its timer counter, though the actual
 timer index seems to be re-used by the underlying
 Util_DeleteHandle/Util_NewHandle. These two will free the associated timer
 structure (and the name). So CCTK_NumTimers() seems to be the number of
 times CCTK_TimerCreate was successfully called, i.e. the number of timers
 ever created, not the number of timers currently in existence, and one
 always has to check whether a timer id in a
 {{{for(i=0;i<CCTK_NumTimers();i++)}}} loop is still valid.
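
 As a minimal sketch of the validity check described above: the table and
 the two mock_* functions below are hypothetical stand-ins, not Cactus
 API. They only mimic the behaviour observed here (the counter never
 shrinks, and a destroyed timer no longer has a name). In the thorn the
 lookup would presumably go through the flesh, e.g. CCTK_TimerName, on the
 assumption that it returns NULL for a destroyed handle.

```c
#include <stddef.h>
#include <string.h>

#define TIMERNAME_LENGTH 101

/* Hypothetical timer table: slot 1 was destroyed, so its name is NULL,
   yet the "ever created" counter below still reports 3. */
static const char *mock_timer_names[] = { "Evolve", NULL, "Output" };

/* Stand-in for CCTK_NumTimers(): number of timers ever created. */
int mock_num_timers(void)
{
  return 3;
}

/* Stand-in for a name lookup; returns NULL for an invalid handle. */
const char *mock_timer_name(int handle)
{
  if (handle < 0 || handle >= mock_num_timers())
    return NULL;
  return mock_timer_names[handle];
}

/* Copy the names of all live timers into names[]; destroyed timers are
   skipped instead of being passed to strncpy. Returns the number copied. */
int collect_live_timer_names(char names[][TIMERNAME_LENGTH], int capacity)
{
  int nlive = 0;
  for (int i = 0; i < mock_num_timers() && nlive < capacity; i++)
  {
    const char *name = mock_timer_name(i);
    if (name == NULL)
      continue; /* timer id no longer valid */
    strncpy(names[nlive], name, TIMERNAME_LENGTH - 1);
    names[nlive][TIMERNAME_LENGTH - 1] = '\0'; /* strncpy may not terminate */
    nlive++;
  }
  return nlive;
}
```

 With the table above, only the two live timers are copied and the
 destroyed slot never reaches strncpy.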

 Additionally it seems as if the large automatic array in line 700
 {{{
 char all_timernames[total_ntimers][TIMERNAME_LENGTH];
 }}}
 which tries to create an automatic array of size
 CCTK_nProcs()*CCTK_NumTimers()*TIMERNAME_LENGTH, is also causing problems
 (I have a test run without a call to TimerDestroy which fails in
 line 704, "name_displacements[0] = 0", with a segfault).

 In my runs I have nprocs >= 192 (576 cores, 6 threads), NumTimers >= 2500
 (hydro, multipatch) and TIMERNAME_LENGTH = 101. The total size of this
 array is thus about 48MB. If a simple-minded compiler ever tries to
 allocate this on the stack then this should segfault as well (and I
 have no idea if a compiler is required to support arbitrarily large
 automatic arrays).
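
 The size estimate above can be reproduced with a short sketch. The
 function names are hypothetical, every process is assumed to report the
 same number of timers, and the 8 MiB figure is only a commonly seen
 default stack limit on Linux, not something measured on this machine.

```c
#include <stddef.h>

#define TIMERNAME_LENGTH 101

/* Bytes needed for one name slot per timer per process, assuming the
   same timer count on every process. */
size_t timer_name_buffer_bytes(size_t nprocs, size_t ntimers_per_proc)
{
  return nprocs * ntimers_per_proc * (size_t)TIMERNAME_LENGTH;
}

/* Returns 1 if a buffer of this size would not fit within the given
   stack limit, 0 otherwise. */
int exceeds_stack_limit(size_t bytes, size_t stack_limit_bytes)
{
  return bytes > stack_limit_bytes;
}
```

 For nprocs = 192 and 2500 timers this gives 192 * 2500 * 101 = 48480000
 bytes, several times larger than an 8 MiB stack.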

 If this ever causes a segfault then a better solution might be to rewrite
 the CollectTimerInfo routine in question in C++ and use vector<T> for
 memory allocation, since there are still a number of arrays that have
 automatic storage class and whose size depends on nProcs and so can also
 grow without bound.
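
 Short of a C++ rewrite with vector<T>, the same effect can be had in
 plain C by moving the array to the heap. This is only a sketch of that
 alternative, not what the attached patch does; timer_name_t and
 alloc_timer_names are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

#define TIMERNAME_LENGTH 101

/* One fixed-length timer name, so that a pointer to it can be indexed
   like the original two-dimensional automatic array. */
typedef char timer_name_t[TIMERNAME_LENGTH];

/* Allocate zero-initialised heap storage for total_ntimers names.
   Returns NULL if total_ntimers is 0 or the allocation fails; the
   caller must check the result and release it with free(). */
timer_name_t *alloc_timer_names(size_t total_ntimers)
{
  if (total_ntimers == 0)
    return NULL;
  return calloc(total_ntimers, sizeof(timer_name_t));
}
```

 The returned pointer is indexed as all_timernames[i][j], exactly like the
 automatic array, and an allocation failure shows up as a NULL return that
 can be reported instead of a segfault.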

 Attached is a patch for both issues.

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/814#comment:4>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit

