[Users] Stampede

Ian Hinder ian.hinder at aei.mpg.de
Fri May 2 09:07:59 CDT 2014


On 02 May 2014, at 14:08, Yosef Zlochower <yosef at astro.rit.edu> wrote:

> Hi
> 
> I have been having problems running on Stampede for a long time. I couldn't get the latest
> stable ET to run because during checkpointing, it would die.

OK, that's very interesting.  Has something changed in the code related to how checkpoint files are written?

> I had to backtrack to 
> the Orsted version (unfortunately, that has a bug in the way the grid is set up, causing some of the
> intermediate levels to span both black holes, wasting a lot of memory).

That bug should have been fixed in a backport; are you sure you are checking out the release branch and not the tag?  In any case, assuming this is the same bug I am thinking of (http://cactuscode.org/pipermail/users/2013-January/003290.html), it can be worked around by setting CarpetRegrid2::min_fraction = 1.
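
As a minimal sketch of that workaround, assuming it really is the same bug, the setting would go into the simulation's parameter file like this (the comment line is only illustrative):

```
# Workaround for intermediate refinement levels spanning both black holes
CarpetRegrid2::min_fraction = 1
```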

> Even with
> Orsted, stalling is a real issue. Currently, my "solution" is to run for 4 hours at a time.
> This would have been OK on Lonestar or Ranger,
> because when I chained a bunch of runs, the next in line would start
> almost right away, but on Stampede the delay is quite substantial. I believe Jim Healy opened
> a ticket concerning the RIT issues with running ET on Stampede.

I think this is the ticket: https://trac.einsteintoolkit.org/ticket/1547.  I will add my information there.  The current queue wait time on Stampede is more than a day, so splitting runs into 4-hour chunks is not feasible, as you say.

I'm starting to think it might be a code problem as well.  So the summary is:

	– Checkpointing causes jobs to die with code versions after Oersted
	– All versions lead to eventual hung jobs after a few hours

Since Stampede is the major "capability" resource in XSEDE, we should put some effort into making sure the ET can run properly there.
-- 
Ian Hinder
http://numrel.aei.mpg.de/people/hinder
