[ET Trac] [Einstein Toolkit] #1772: Simfactory: potentially serious problem with CACHE directory in the simulations directory

Einstein Toolkit trac-noreply at einsteintoolkit.org
Tue May 5 10:21:21 CDT 2015


#1772: Simfactory: potentially serious problem with CACHE directory in the
simulations directory
------------------------+---------------------------------------------------
 Reporter:  bmundim     |       Owner:                     
     Type:  defect      |      Status:  new                
 Priority:  critical    |   Milestone:  ET_2015_05         
Component:  SimFactory  |     Version:  development version
 Keywords:  CACHE       |  
------------------------+---------------------------------------------------
 Is the CACHE directory in the simulations directory really necessary? We
 are talking about executables of at most 400 MB, which is negligible on
 current HPC storage systems.

 I think I have found a design flaw in simfactory's use of the CACHE
 directory. It can go unnoticed until it is too late, with a potential loss
 of thousands of SUs. Suppose we have the following situation:

 1) We build a configuration A and submit a simulation A1 with parameter
 file 1. Simfactory copies the executable from configuration A into the
 SIMFACTORY directory of simulation A1 and creates a symlink from
 /scratch/simulations/CACHE/exe/cactus_A to
 /scratch/simulations/A1/SIMFACTORY/exe/cactus_A.

 2) We then create a new simulation A2 with a different parameter file 2.
 This time simfactory symlinks the simulation executable
 /scratch/simulations/A2/SIMFACTORY/exe/cactus_A to the cached one,
 /scratch/simulations/CACHE/exe/cactus_A.

 3) After a few days (or restarts) of simulations A1 and A2, we come up
 with a better idea/fix/new parameter that requires recompiling
 configuration A. Note that we don't want to build a new configuration from
 scratch, since Cactus configurations take a lot of both time and space
 to build. So we rebuild configuration A, and its executable cactus_A
 is updated.

 4) Now we submit the updated configuration with the same parameter file 2
 in order to test the new idea/fix/parameter and compare it with simulation
 A2, which is still running and has a few more restarts to go before
 completion. Call this simulation A2_updated. Simfactory then copies the
 updated executable from Cactus/exe/cactus_A to the simulation directory
 /scratch/simulations/A2_updated/SIMFACTORY/exe/cactus_A *and* updates the
 CACHE symlink to point to that new simulation directory, i.e.:

 $ cd /scratch/simulations/CACHE/exe
 $ ls -l cactus_A
 cactus_A ->
 ../../../../scratch/simulations/A2_updated/SIMFACTORY/exe/cactus_A

 5) The problem: the remaining restarts of simulation A2 are now
 compromised, because they will run the new executable. Remember that the
 executable of simulation A2 is actually a symlink to the one in the CACHE
 directory, which has just been repointed.
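
 The sequence in steps 1-5 can be reproduced in isolation with a small
 sketch. This is not simfactory code: the helper names `submit` and
 `reproduce_cache_aliasing`, the temporary paths, and the rule "reuse the
 cache when the built executable matches it" are all assumptions made only
 to mimic the copy-then-relink behaviour described above. Short text files
 stand in for the executables. It assumes a POSIX system where symlinks
 can be created.

```python
import os
import shutil
import tempfile


def _same_bytes(path_a, path_b):
    """Return True if both paths (symlinks are followed) hold identical bytes."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return fa.read() == fb.read()


def submit(sim_root, sim_name, built_exe):
    """Hypothetical model of the behaviour in steps 1-5; NOT simfactory code."""
    exe_dir = os.path.join(sim_root, sim_name, "SIMFACTORY", "exe")
    cache_dir = os.path.join(sim_root, "CACHE", "exe")
    os.makedirs(exe_dir)
    os.makedirs(cache_dir, exist_ok=True)
    sim_exe = os.path.join(exe_dir, "cactus_A")
    cache_link = os.path.join(cache_dir, "cactus_A")

    # Assumption: the cache is reused when the built executable matches it.
    if os.path.exists(cache_link) and _same_bytes(built_exe, cache_link):
        # Step 2: the simulation executable is a symlink to the CACHE entry.
        os.symlink(cache_link, sim_exe)
    else:
        # Steps 1 and 4: real copy, then the CACHE entry is repointed to it.
        shutil.copy(built_exe, sim_exe)
        if os.path.lexists(cache_link):
            os.remove(cache_link)
        os.symlink(sim_exe, cache_link)
    return sim_exe


def reproduce_cache_aliasing():
    """Run steps 1-5 in a temp dir; return what each simulation would execute."""
    with tempfile.TemporaryDirectory() as tmp:
        sim_root = os.path.join(tmp, "simulations")
        built_exe = os.path.join(tmp, "cactus_A")  # stands in for Cactus/exe/cactus_A

        with open(built_exe, "w") as f:
            f.write("old build")
        exes = {name: submit(sim_root, name, built_exe) for name in ("A1", "A2")}

        # Step 3: configuration A is rebuilt in place.
        with open(built_exe, "w") as f:
            f.write("new build")
        exes["A2_updated"] = submit(sim_root, "A2_updated", built_exe)

        contents = {}
        for name, path in exes.items():
            with open(path) as f:  # follows symlinks, like a job restart would
                contents[name] = f.read()
        return contents


if __name__ == "__main__":
    print(reproduce_cache_aliasing())
```

 In this model A1 keeps its own copy of the old build, while A2, which only
 holds a symlink into CACHE, silently picks up the new build after
 A2_updated is submitted.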

 I think this intermediate cache directory introduces complexity that the
 user has to track for no real benefit, and in my opinion it is not a good
 design choice. I would vote to eliminate it from simfactory completely as
 soon as possible, ideally even for this release. Just make one copy of the
 executable from Cactus/exe to simulation/SIMFACTORY/exe and that's it.
 That is all we need for that simulation and future ones to run
 consistently with the same executable.
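
 For comparison, here is a minimal sketch of the behaviour proposed here:
 every simulation receives its own real copy of the executable and no CACHE
 directory is involved. As before, this is only an illustration under
 assumed names (`submit_with_private_copy`, `demo_private_copies`) and
 temporary paths, not a patch against simfactory.

```python
import os
import shutil
import tempfile


def submit_with_private_copy(sim_root, sim_name, built_exe):
    """Hypothetical alternative: copy the executable, never symlink it."""
    exe_dir = os.path.join(sim_root, sim_name, "SIMFACTORY", "exe")
    os.makedirs(exe_dir)
    sim_exe = os.path.join(exe_dir, os.path.basename(built_exe))
    shutil.copy2(built_exe, sim_exe)  # independent file; keeps mode and mtime
    return sim_exe


def demo_private_copies():
    """Submit A2, rebuild, submit A2_updated; return what each would execute."""
    with tempfile.TemporaryDirectory() as tmp:
        sim_root = os.path.join(tmp, "simulations")
        built_exe = os.path.join(tmp, "cactus_A")  # stands in for Cactus/exe/cactus_A

        with open(built_exe, "w") as f:
            f.write("old build")
        a2 = submit_with_private_copy(sim_root, "A2", built_exe)

        # Rebuild configuration A in place, then submit the comparison run.
        with open(built_exe, "w") as f:
            f.write("new build")
        a2_updated = submit_with_private_copy(sim_root, "A2_updated", built_exe)

        contents = {}
        for name, path in (("A2", a2), ("A2_updated", a2_updated)):
            assert not os.path.islink(path)  # each simulation owns a real file
            with open(path) as f:
                contents[name] = f.read()
        return contents


if __name__ == "__main__":
    print(demo_private_copies())
```

 The cost of this choice is one extra copy of the executable per
 simulation, which at the sizes mentioned above (at most 400 MB) is small
 next to the simulation output itself.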

 Thanks!


 PS: I actually noticed this issue on the Herschel release (there is no
 option for the Herschel release on trac). I am working on tests for the
 development version to confirm it, but given the simfactory commits I
 believe it is still there.

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1772>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit