[ET Trac] [Einstein Toolkit] #1796: Include IllinoisGRMHD into the Toolkit, as an Arrangement
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Fri Oct 9 19:59:29 CDT 2015
#1796: Include IllinoisGRMHD into the Toolkit, as an Arrangement
------------------------------------+---------------------------------------
  Reporter:  zachetie@…             |      Owner:  Zachariah Etienne
      Type:  enhancement            |     Status:  new
  Priority:  major                  |  Milestone:  ET_2015_11
 Component:  EinsteinToolkit thorn  |    Version:  development version
Resolution:                         |   Keywords:  GRMHD IllinoisGRMHD
------------------------------------+---------------------------------------
Comment (by Zach Etienne):
Replying to [comment:18 hinder]:
> Replying to [comment:14 Zach Etienne]:
> > My understanding, based on Ian Hinder's message to the [Users] list
> > (Subject: Test case notes), was that tests cannot be long in duration:
> >
> > "Tests should run as quickly as possible; ideally in less than one
> > second. Longer tests are OK if this is absolutely necessary to test for
> > regressions."
> >
> > I interpret this to mean that my usual tests for correctness, which
> > might require evolutions of order an hour long, running in parallel
> > across machines, are disallowed in ET.
> The point is that the tests in the thorns' 'test' directory are intended
> to be tests which can be run frequently and easily. They are run roughly
> after every commit on some fairly small virtual machines, and people are
> encouraged to run the tests after checking out the ET, to make sure that
> everything is currently working on their machine. Thus, they should be
> fast to run. Usually this means regression tests with small amounts of
> output.
My correctness tests require very little output (for TOV stars, the
maximum density plus the L2 norm of density versus time suffice). The
issue is that they require orders of magnitude more resources than a few
seconds of testing on a server.
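
To make that concrete: the two diagnostics themselves are cheap to
extract. Here is a minimal sketch, assuming Carpet-style ASCII reduction
output ('#' comment lines; columns iteration, time, value); the file
names are hypothetical:

    import numpy as np

    def load_reduction(path):
        # Carpet-style ASCII reduction file: '#' comments, columns
        # (iteration, time, value). Returns (time, value) arrays.
        data = np.loadtxt(path, comments="#")
        return data[:, 1], data[:, 2]

    # Hypothetical output file names for the two TOV diagnostics:
    t, rho_max = load_reduction("rho.maximum.asc")
    _, rho_l2  = load_reduction("rho.norm2.asc")
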
> In addition, a code should have correctness tests, and these may need to
> be run on a cluster, consist of multiple simulations (resolutions), run
> for longer, produce more output, etc. The existing Cactus test mechanism
> is not suitable for this, and there is no alternative framework. I have
> argued in the past for the need for such a framework, and of course
> nobody disagreed, but it is a question of resources to develop such a
> framework.
Great to hear! I would argue that resource requirements depend most
sensitively on the timescale of development: if major patches land on a
weekly timescale, then as long as a complete, continuous correctness test
takes ~a day or two, we should be fine. For IllinoisGRMHD, a souped-up
laptop or a desktop collecting dust could do all the necessary
correctness testing within hours. Then it would just be a matter of
automatically generating roundoff-error build-up comparison plots (a la
Fig. 1 of the IllinoisGRMHD paper,
http://arxiv.org/pdf/1501.07276.pdf) for a couple of quantities of
physical interest, and automatically uploading these to a web server. I
already have the basic infrastructure components in place here at WVU;
I'd just need to write the scripts and run them as a cronjob.
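
A minimal sketch of such a script, under stated assumptions:
Carpet-style ASCII maximum-density output; all file names, the web
host, and the destination path below are placeholders, not an existing
setup:

    import subprocess
    import numpy as np
    import matplotlib
    matplotlib.use("Agg")   # headless rendering, suitable for a cronjob
    import matplotlib.pyplot as plt

    # Hypothetical file names; columns assumed (iteration, time, value).
    t, rho_new = np.loadtxt("rho.maximum.asc", comments="#",
                            usecols=(1, 2), unpack=True)
    _, rho_old = np.loadtxt("rho.maximum.trusted.asc", comments="#",
                            usecols=(1, 2), unpack=True)

    # Roundoff-error build-up in maximum density, a la Fig. 1 of
    # arXiv:1501.07276: relative difference, new run vs. trusted run.
    rel = np.abs(rho_new - rho_old) / np.abs(rho_old)
    plt.semilogy(t, rel)
    plt.xlabel("t")
    plt.ylabel("relative difference in max(rho)")
    plt.savefig("rho_max_buildup.png")

    # Placeholder upload step; host and path are illustrative only.
    subprocess.run(["scp", "rho_max_buildup.png",
                    "user@webhost:/var/www/grmhd-tests/"], check=True)
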
> If you have existing correctness tests, then it would be really good if
> you could include instructions for running them, e.g. in the README or
> documentation. If they need access to large test data, I would be
> reluctant to include it in the thorn directory. Maybe a separate
> repository somewhere?
The easiest correctness test is simply to run the parfile for more
iterations and compare the roundoff-error build-up in maximum density
against the trusted code. It is remarkable how sensitive this quantity is
to any small, beyond-roundoff-level difference. I'll write the
instructions for doing so in the README, along with a small table of
expected results. Thanks for the tip!
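
A pass/fail wrapper around that comparison might look like the sketch
below; the tolerance and file names are illustrative only, and the real
expected values would come from the table in the README:

    import numpy as np

    def buildup_ok(new_file, trusted_file, tol):
        # Compare maximum-density time series from two runs
        # (Carpet-style ASCII assumed; third column is the value).
        rho_new = np.loadtxt(new_file, comments="#")[:, 2]
        rho_old = np.loadtxt(trusted_file, comments="#")[:, 2]
        n = min(len(rho_new), len(rho_old))  # compare common iterations
        rel = np.abs(rho_new[:n] - rho_old[:n]) / np.abs(rho_old[:n])
        return rel.max() < tol

    # Illustrative threshold: flag any beyond-roundoff-level divergence.
    if not buildup_ok("rho.maximum.asc", "rho.maximum.trusted.asc", 1e-10):
        raise SystemExit("max-density build-up exceeds roundoff level")
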
> I would also be interested in discussions about how a
> correctness-testing framework would look. It would need to be able to
> submit simulations on clusters and collect data from them, hence
> simfactory. It would also need to be able to analyse the data, e.g.
> collecting data from multiple processes as you normally do when
> analysing your data, i.e. it would need an analysis framework
> (https://docs.einsteintoolkit.org/et-docs/Analysis_and_post-processing).
> It would also need IT infrastructure similar to a Jenkins server to
> automate running testing jobs, providing interfaces for people to see
> what failed, look at results, etc. It's nontrivial, which is probably
> why it doesn't already exist.
Such an automated framework indeed sounds nontrivial, requiring a lot of
human time to complete. I would suggest starting with something quick and
dirty, like scripts that simply publish plots (like Fig. 1 of
http://arxiv.org/pdf/1501.07276.pdf) to a web page automatically, so
that within a second a human could recognize *by eye* whether
correctness was violated. This idea is unlikely to scale up to many
thorns, but I think it will work well for WVUThorns at least, and I'm
happy to share my codes for doing so, in case they are useful.
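
As one possible quick-and-dirty endpoint, this sketch collects whatever
comparison plots the test scripts produced and emits a bare-bones page a
human can scan by eye; the paths and the cron entry are all
placeholders:

    import html
    import pathlib

    plot_dir = pathlib.Path("/var/www/grmhd-tests")  # hypothetical web root
    rows = "\n".join(
        '<p>{0}<br><img src="{0}"></p>'.format(html.escape(p.name))
        for p in sorted(plot_dir.glob("*.png")))
    (plot_dir / "index.html").write_text(
        "<html><body><h1>IllinoisGRMHD correctness plots</h1>\n"
        + rows + "\n</body></html>")

    # Illustrative crontab entry to regenerate the page nightly:
    #   0 3 * * * python3 /path/to/make_index.py
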
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1796#comment:19>
Einstein Toolkit <http://einsteintoolkit.org>