[ET Trac] [Einstein Toolkit] #1751: [Pull request: CactusUtils/WatchDog] new thorn to automatically terminate jobs that hang

Einstein Toolkit trac-noreply at einsteintoolkit.org
Mon Mar 9 16:21:39 CDT 2015


#1751: [Pull request: CactusUtils/WatchDog] new thorn to automatically terminate
jobs that hang
------------------------------------+---------------------------------------
  Reporter:  dradice@…              |       Owner:  dradice@…          
      Type:  enhancement            |      Status:  reviewed_ok        
  Priority:  optional               |   Milestone:                     
 Component:  EinsteinToolkit thorn  |     Version:  development version
Resolution:                         |    Keywords:                     
------------------------------------+---------------------------------------

Comment (by dradice@…):

 On second thought, I am starting to think that supporting both the
 HEARTBEAT file and the internal pthread approach results in a lot of
 duplication. The HEARTBEAT file approach has the important drawback that it
 would work in a very non-uniform way across different machines, and that it
 requires the user to adjust both his/her runscript and parfile at the same
 time to set the timers in the checking code. I would probably stick with the
 current version of WatchDog: its only drawback is that it could leave zombie
 processes on some systems that do not clean up after a job has terminated.
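
 To make the internal-thread approach concrete, here is a minimal sketch of
 the general technique in Python. It is not the WatchDog thorn's code: the
 class name, the kick() method, the timeout values and the on_hang callback
 are all hypothetical and chosen only for illustration. A monitor thread
 inside the job checks how long it has been since the main loop last
 reported progress, and reacts once that exceeds a timeout.

```python
import threading
import time


class Watchdog:
    """Hypothetical in-process watchdog: calls on_hang(idle_seconds) once
    if kick() has not been called for more than `timeout` seconds."""

    def __init__(self, timeout, check_every, on_hang):
        self.timeout = timeout          # allowed seconds without progress
        self.check_every = check_every  # seconds between checks
        self.on_hang = on_hang          # reaction, e.g. abort the job
        self._last_kick = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._watch, daemon=True)

    def start(self):
        self._thread.start()

    def kick(self):
        # Called by the main loop whenever it makes progress.
        with self._lock:
            self._last_kick = time.monotonic()

    def stop(self):
        # Ask the monitor thread to exit and wait for it.
        self._stop.set()
        self._thread.join()

    def _watch(self):
        # Event.wait() returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.check_every):
            with self._lock:
                idle = time.monotonic() - self._last_kick
            if idle > self.timeout:
                self.on_hang(idle)
                return


if __name__ == "__main__":
    hung = threading.Event()
    dog = Watchdog(timeout=0.2, check_every=0.05,
                   on_hang=lambda idle: hung.set())
    dog.start()
    for _ in range(3):
        time.sleep(0.05)  # simulated work
        dog.kick()
    time.sleep(0.5)       # simulated hang: no more kicks
    print("hang detected:", hung.is_set())
    dog.stop()
```

 In this sketch the timeout lives in one place, inside the job itself. A
 HEARTBEAT file variant would instead have the job touch a file and an
 external script check its age, so the timers would have to be kept
 consistent in two places, as noted above.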

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1751#comment:9>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit

