[ET Trac] [Einstein Toolkit] #1772: Simfactory: potentially serious problem with CACHE directory in the simulations directory

Einstein Toolkit trac-noreply at einsteintoolkit.org
Tue May 5 18:06:13 CDT 2015


#1772: Simfactory: potentially serious problem with CACHE directory in the
simulations directory
-------------------------+--------------------------------------------------
  Reporter:  bmundim     |       Owner:                     
      Type:  defect      |      Status:  new                
  Priority:  critical    |   Milestone:  ET_2015_05         
 Component:  SimFactory  |     Version:  development version
Resolution:              |    Keywords:  CACHE              
-------------------------+--------------------------------------------------

Comment (by bmundim):

 Hi Erik,

 thanks a lot for your clarification! I should have tested on other
 clusters before filing this ticket. Note however that what I described
 earlier does happen on Loewe! Please see my comments below:


 Replying to [comment:1 eschnett]:
 > Yes, the CACHE directory is necessary if you are running many similar
 simulations. This can happen e.g. during benchmarking. While most HPC
 systems can handle a large number of executables, there are some that
 cannot, and where one runs out of quota.
 >

 Ok, point taken! The size of each executable might not be a problem, but
 the total number of them might push you over your quota on some systems.


 > The cache works slightly differently than you describe. First, it uses
 hard links, not soft (symbolic) links.

 That doesn't seem to happen on Loewe. I might be completely confused, but
 as far as I can tell, all the links simfactory created for executables,
 both in the CACHE directory and in the simulation directories, are
 symbolic links. For example:

 {{{
 $ pwd
 /scratch/astro/mundim/simulations/ET_2014_11_herschel/bns_thc/SIMFACTORY/exe
 $ stat cactus_thc_i15_O2
   File: `cactus_thc_i15_O2' -> `../../../../../../../astro/mundim/simulations/ET_2014_11_herschel/CACHE/exe/cactus_thc_i15_O2'
   Size: 93              Blocks: 1          IO Block: 524288 symbolic link
 Device: 19h/25d Inode: 9962542217151410378  Links: 1
 Access: (0777/lrwxrwxrwx)  Uid: (58311/  mundim)   Gid: (58057/   astro)
 Access: 2015-04-21 22:59:48.000000000 +0200
 Modify: 2015-04-21 22:59:48.000000000 +0200
 Change: 2015-04-21 22:59:48.000000000 +0200

 $ cd ../../../../../../../astro/mundim/simulations/ET_2014_11_herschel/CACHE/exe
 $ stat cactus_thc_i15_O2
   File: `cactus_thc_i15_O2' -> `../../../../../../astro/mundim/simulations/ET_2014_11_herschel/bns_thc_new/SIMFACTORY/exe/cactus_thc_i15_O2'
   Size: 119             Blocks: 1          IO Block: 524288 symbolic link
 Device: 19h/25d Inode: 4055656628534771390  Links: 1
 Access: (0777/lrwxrwxrwx)  Uid: (58311/  mundim)   Gid: (58057/   astro)
 Access: 2015-05-05 15:49:48.000000000 +0200
 Modify: 2015-05-05 15:49:48.000000000 +0200
 Change: 2015-05-05 15:49:48.000000000 +0200

 }}}

 > The cache is just this, a cache -- the actual executables are safely
 stored in the simulation directories, and are never modified. Here is what
 actually happens when a simulation is created:
 >
 > 1. Check cache whether it has the right executable. If not, ignore
 cache.

 If not, then create the cache, no? And by "right executable" you mean that
 the file currently in the cache and, for example, Cactus/exe/cactus_sim
 are the same executable, right?

 > 2. If cache is good, create a hard link from cache to simulation
 directory.

 What if it silently creates a symbolic link instead, as seems to happen on
 Loewe? Is there a way to test whether a hard link was actually created?
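
 A minimal sketch of one way to test this from Python, assuming a POSIX
 file system. The helper names `link_kind` and `same_inode` are
 hypothetical and are not part of simfactory. The idea is to inspect the
 directory entry itself with `os.lstat` (which does not follow symbolic
 links) and to compare device and inode numbers:

```python
import os
import stat


def link_kind(path):
    """Classify `path` itself: 'symlink', 'hardlink', 'plain', or 'other'."""
    st = os.lstat(path)  # lstat inspects the entry without following symlinks
    if stat.S_ISLNK(st.st_mode):
        return "symlink"
    if not stat.S_ISREG(st.st_mode):
        return "other"
    # A regular file with a link count above 1 has more than one name,
    # i.e. at least one additional hard link exists somewhere.
    return "hardlink" if st.st_nlink > 1 else "plain"


def same_inode(path_a, path_b):
    """True if both names are directory entries for the very same file."""
    a, b = os.lstat(path_a), os.lstat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

 In the `stat` output above, the entry type is "symbolic link" and
 "Links: 1", so a check like this would report `symlink` for both files.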

 > 3. Check simulation directory if it now has the right executable, since
 the cache may have changed in the mean time. If no, delete hard link
 again.
 > 4. If simulation directory does not have an executable, copy it from the
 Cactus build directory.

 Unfortunately this is not happening on Loewe. The cache has changed, but
 the restart of my old simulation still has a symbolic link to the cache
 directory, which now symlinks to the newer simulation with an updated copy
 of the Cactus executable.
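
 To make the chain visible, here is a small sketch, again assuming a
 POSIX file system. The helpers `symlink_chain` and `resolves_inside` are
 hypothetical names, not simfactory functions. The first follows symbolic
 links one hop at a time; the second checks whether an executable, once
 fully resolved, still lives inside its own simulation directory:

```python
import os


def symlink_chain(path, max_hops=32):
    """Return the paths visited while following symlinks one hop at a time."""
    chain = [path]
    current = path
    for _ in range(max_hops):  # bound the loop in case of a symlink cycle
        if not os.path.islink(current):
            break
        target = os.readlink(current)
        # A relative target is interpreted relative to the link's directory.
        # normpath collapses ".." lexically, which is only an approximation
        # if intermediate directories are themselves symlinks.
        current = os.path.normpath(
            os.path.join(os.path.dirname(current), target))
        chain.append(current)
    return chain


def resolves_inside(path, directory):
    """True if `path`, with all symlinks resolved, lies inside `directory`."""
    real = os.path.realpath(path)
    root = os.path.realpath(directory)
    return os.path.commonpath([real, root]) == root
```

 For the situation described above, `symlink_chain` on the old
 simulation's executable would list the CACHE entry and then the newer
 simulation's file, and `resolves_inside` would return False for the old
 simulation directory.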

 > 5. The simulation directory now has the correct executable.

 I agree with your explanation, but something is going wrong on Loewe (or
 in my head, for not noticing something obvious you stated).

 > 6. If we are not using a hard link from the cache, then delete the cache
 file, and create a new hard link from the simulation directory to the
 cache.
 >
 > This guarantees that simulations always use the correct executable, and
 that a simulation's executable is never overwritten. Also, there are
 fallbacks in place in case any of the operations fails (e.g. creating a
 hard link).
 >

 Ok, what is the fallback when a hard link cannot be created?
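
 I do not know what simfactory actually does here. Purely as an
 illustration of what I would expect from a safe fallback, here is a
 sketch with a hypothetical helper `link_or_copy` (not a simfactory
 function): try a hard link first, fall back to a real copy if linking
 fails, and never fall back to a symbolic link or overwrite an existing
 file:

```python
import os
import shutil


def link_or_copy(src, dst):
    """Hard-link `src` to `dst`, or copy the file if linking is not possible.

    Returns 'hardlink' or 'copy' so the caller can log what happened.
    """
    try:
        os.link(src, dst)
        return "hardlink"
    except FileExistsError:
        # Refuse to touch an executable that is already in place.
        raise
    except OSError:
        # For example a cross-device link, or a file system that does not
        # support hard links: fall back to an independent copy.
        shutil.copy2(src, dst)
        return "copy"
```

 With a copy as the fallback, each simulation directory would still hold
 its own executable, at the cost of extra disk space.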

 > What is actually a problem currently is that some external libraries
 that are built are dynamic libraries, which can change or go away when one
 rebuilds. These are not copied into the simulation directory. We try to
 enforce using static libraries for this reason, but I'm not sure whether
 this is the case for all external libraries.

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1772#comment:3>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit

