[ET Trac] [Einstein Toolkit] #1751: [Pull request: CactusUtils/WatchDog] new thorn to automatically terminate jobs that hang
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Mon Mar 9 16:21:39 CDT 2015
#1751: [Pull request: CactusUtils/WatchDog] new thorn to automatically terminate
jobs that hang
------------------------------------+---------------------------------------
Reporter: dradice@… | Owner: dradice@…
Type: enhancement | Status: reviewed_ok
Priority: optional | Milestone:
Component: EinsteinToolkit thorn | Version: development version
Resolution: | Keywords:
------------------------------------+---------------------------------------
Comment (by dradice@…):
On second thought, I am starting to think that supporting both the
HEARTBEAT file and the internal pthread approach results in a lot of
duplication. The HEARTBEAT file approach has the important drawback that
it would behave very differently from one machine to another, and that it
requires the user to keep the runscript and the parfile in sync whenever
the timers in the checking code are changed. I would probably stick with
the current version of WatchDog: its only drawback is that it could leave
zombie processes on some systems that do not clean up after a job has
terminated.
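
For readers unfamiliar with the in-process approach, the following is a
minimal sketch of a pthread-based watchdog. It is not the code of the
WatchDog thorn: every name here (watchdog_t, watchdog_start,
watchdog_beat, watchdog_stop, sleep_ms) and the polling scheme itself are
assumptions made for illustration. The main loop bumps a counter whenever
it makes progress; a helper thread polls that counter and reports a hang
once it has stayed unchanged for max_missed consecutive checks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical watchdog state; not taken from the WatchDog thorn. */
typedef struct {
    atomic_ulong beats;          /* bumped by the main loop on progress  */
    atomic_int   stop;           /* set to 1 to shut the watchdog down   */
    atomic_int   fired;          /* set to 1 once a hang was detected    */
    long         check_ms;       /* polling interval in milliseconds     */
    int          max_missed;     /* idle checks tolerated before firing  */
    void       (*on_hang)(void); /* action on hang; NULL = only record   */
    pthread_t    thread;
} watchdog_t;

/* Sleep for roughly ms milliseconds. */
static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Watchdog thread: poll the counter and fire if it stops changing. */
static void *watchdog_main(void *arg)
{
    watchdog_t *wd = arg;
    unsigned long last = atomic_load(&wd->beats);
    int missed = 0;

    while (!atomic_load(&wd->stop)) {
        sleep_ms(wd->check_ms);
        unsigned long now = atomic_load(&wd->beats);
        if (now != last) {          /* progress was made: reset */
            last = now;
            missed = 0;
        } else if (++missed >= wd->max_missed) {
            atomic_store(&wd->fired, 1);
            if (wd->on_hang)        /* assumption: e.g. abort() in a job */
                wd->on_hang();
            break;
        }
    }
    return NULL;
}

/* Start the watchdog; returns 0 on success (pthread_create's result). */
int watchdog_start(watchdog_t *wd, long check_ms, int max_missed,
                   void (*on_hang)(void))
{
    assert(check_ms > 0 && max_missed > 0);
    atomic_init(&wd->beats, 0);
    atomic_init(&wd->stop, 0);
    atomic_init(&wd->fired, 0);
    wd->check_ms = check_ms;
    wd->max_missed = max_missed;
    wd->on_hang = on_hang;
    return pthread_create(&wd->thread, NULL, watchdog_main, wd);
}

/* Called by the main loop whenever it makes progress. */
void watchdog_beat(watchdog_t *wd)
{
    atomic_fetch_add(&wd->beats, 1);
}

/* Stop the watchdog; returns 1 if a hang was detected, 0 otherwise. */
int watchdog_stop(watchdog_t *wd)
{
    atomic_store(&wd->stop, 1);
    pthread_join(wd->thread, NULL);
    return atomic_load(&wd->fired);
}
```

In this sketch both timers live in the same process, so there is nothing
to keep in sync with the runscript. In a real job on_hang would
presumably call something like abort(), and whether the remaining
processes are then cleaned up is left to the system, which is where the
zombie concern above comes from.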
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1751#comment:9>
Einstein Toolkit <http://einsteintoolkit.org>