[ET Trac] [Einstein Toolkit] #1751: [Pull request: CactusUtils/WatchDog] new thorn to automatically terminate jobs that hang

Einstein Toolkit trac-noreply at einsteintoolkit.org
Tue Mar 3 11:07:19 CST 2015


#1751: [Pull request: CactusUtils/WatchDog] new thorn to automatically terminate
jobs that hang
------------------------------------+---------------------------------------
  Reporter:  dradice@…              |       Owner:  dradice@…          
      Type:  enhancement            |      Status:  assigned           
  Priority:  optional               |   Milestone:                     
 Component:  EinsteinToolkit thorn  |     Version:  development version
Resolution:                         |    Keywords:                     
------------------------------------+---------------------------------------
Changes (by rhaas):

  * owner:  => dradice@…
  * priority:  unset => optional
  * status:  new => assigned


Comment:

 I had a look through the code. At this point the thorn is not yet ready
 to be included in the ET or Cactus, but it seems worthwhile to have
 such a thorn. Comments, from most severe to least severe:

 * the thorn needs documentation in the standard documentation.tex file,
 this can be very short, I would think that the description in the pull
 request plus the usual boilerplate text will be sufficient
 * the thorn has no test case; however, since it would have to test for an
 abort, a test case may be hard to design
 * the thorn uses fprintf(stderr, ...) for both error and informational
 messages. For informational messages (the "Everything is fine" message) it
 could use CCTK_VInfo() in the main thread's ANALYSIS routine. The warnings
 cannot be changed since they are emitted by the secondary thread and
 Cactus is not thread safe.
 * reading the man page for asctime_r, I do not think that the explicit zero
 termination in lines 32 and 49 is required, since asctime_r null-terminates
 its output (which is also guaranteed to fit within 26 characters).
 * if possible, the thorn should check for the presence of PTHREADS.
 Since PTHREADS is currently a Cactus extra, the only way to do so seems
 to be at compile time via:
 {{{
 #ifndef CCTK_PTHREADS
 #error "WatchDog requires PTHREADS. Please enable PTHREADS=yes in your option list."
 #endif
 }}}
 * it may be interesting to make check_every STEERABLE=ALWAYS by resetting
 it inside the ANALYSIS routine (and protecting access to it by the mutex).
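
 To illustrate the asctime_r point above, here is a minimal sketch. The
 helper name format_timestamp is hypothetical and not taken from the thorn;
 it only shows that the buffer comes back null-terminated, so no manual
 termination is needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not from the WatchDog thorn): format a timestamp
 * into a caller-provided buffer of at least 26 bytes.
 * Returns the length of the resulting string, or 0 on failure. */
size_t format_timestamp(time_t when, char buf[26])
{
  struct tm parts;

  assert(buf != NULL);
  if (localtime_r(&when, &parts) == NULL)
    return 0;
  if (asctime_r(&parts, buf) == NULL)
    return 0;
  /* asctime_r has already written the trailing '\0' into buf,
   * so there is no need to terminate the string by hand. */
  return strlen(buf);
}
```

 For a four-digit year the result is 24 characters plus a newline, so the
 terminator sits in the last of the 26 bytes.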
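
 The steerable check_every idea could look roughly like the sketch below.
 All names here (interval_lock, shared_check_every, watchdog_set_interval,
 watchdog_get_interval) and the initial value are assumptions made for
 illustration; the thorn would presumably reuse its existing mutex instead
 of a new one.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared state; names and the default value are illustrative. */
static pthread_mutex_t interval_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_check_every = 3600; /* seconds */

/* Main thread: call from the ANALYSIS routine with the current value of
 * the check_every parameter, so that a steered value is picked up. */
void watchdog_set_interval(int seconds)
{
  assert(seconds > 0);
  pthread_mutex_lock(&interval_lock);
  shared_check_every = seconds;
  pthread_mutex_unlock(&interval_lock);
}

/* Watchdog thread: call before each sleep to read the interval safely. */
int watchdog_get_interval(void)
{
  int seconds;

  pthread_mutex_lock(&interval_lock);
  seconds = shared_check_every;
  pthread_mutex_unlock(&interval_lock);
  return seconds;
}
```

 With this split the secondary thread never touches Cactus parameters
 directly; it only reads a copy that the main thread refreshes.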

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1751#comment:2>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit

