[ET Trac] [Einstein Toolkit] #1772: Simfactory: potentially serious problem with CACHE directory in the simulations directory
Einstein Toolkit
trac-noreply at einsteintoolkit.org
Tue May 5 18:06:13 CDT 2015
#1772: Simfactory: potentially serious problem with CACHE directory in the
simulations directory
-------------------------+--------------------------------------------------
Reporter: bmundim | Owner:
Type: defect | Status: new
Priority: critical | Milestone: ET_2015_05
Component: SimFactory | Version: development version
Resolution: | Keywords: CACHE
-------------------------+--------------------------------------------------
Comment (by bmundim):
Hi Erik,
Thanks a lot for your clarification! I should have tested on other
clusters before filing this ticket. Note, however, that what I described
earlier does happen on Loewe! Please see my comments below:
Replying to [comment:1 eschnett]:
> Yes, the CACHE directory is necessary if you are running many similar
> simulations. This can happen e.g. during benchmarking. While most HPC
> systems can handle a large number of executables, there are some that
> cannot, and where one runs out of quota.
>
OK, point taken! The size of each executable might not be a problem, but
the total number of them might push you over your quota on some systems.
> The cache works slightly differently than you describe. First, it uses
> hard links, not soft (symbolic) links.
That doesn't seem to be what happens on Loewe. I might be completely
confused, but as far as I can tell, all the links simfactory created for
executables, both in the CACHE directory and in the simulation
directories, are symbolic links. For example:
{{{
$ pwd
/scratch/astro/mundim/simulations/ET_2014_11_herschel/bns_thc/SIMFACTORY/exe
$ stat cactus_thc_i15_O2
File: `cactus_thc_i15_O2' -> `../../../../../../../astro/mundim/simulations/ET_2014_11_herschel/CACHE/exe/cactus_thc_i15_O2'
Size: 93 Blocks: 1 IO Block: 524288 symbolic link
Device: 19h/25d Inode: 9962542217151410378 Links: 1
Access: (0777/lrwxrwxrwx) Uid: (58311/ mundim) Gid: (58057/ astro)
Access: 2015-04-21 22:59:48.000000000 +0200
Modify: 2015-04-21 22:59:48.000000000 +0200
Change: 2015-04-21 22:59:48.000000000 +0200
$ cd ../../../../../../../astro/mundim/simulations/ET_2014_11_herschel/CACHE/exe
$ stat cactus_thc_i15_O2
File: `cactus_thc_i15_O2' -> `../../../../../../astro/mundim/simulations/ET_2014_11_herschel/bns_thc_new/SIMFACTORY/exe/cactus_thc_i15_O2'
Size: 119 Blocks: 1 IO Block: 524288 symbolic link
Device: 19h/25d Inode: 4055656628534771390 Links: 1
Access: (0777/lrwxrwxrwx) Uid: (58311/ mundim) Gid: (58057/ astro)
Access: 2015-05-05 15:49:48.000000000 +0200
Modify: 2015-05-05 15:49:48.000000000 +0200
Change: 2015-05-05 15:49:48.000000000 +0200
}}}
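To make the chain of links in the `stat` output above explicit, here is a minimal sketch that follows a path hop by hop. The function name `symlink_chain` is my own, not part of simfactory, and the sketch resolves `..` lexically, so it assumes the intermediate directories are not themselves symlinks:

```python
import os


def symlink_chain(path, max_hops=40):
    """Return every path visited while following symlinks from `path`.

    The first entry is `path` itself (made absolute); the last entry is the
    first path that is not a symbolic link (or the hop limit was reached).
    """
    chain = [os.path.abspath(path)]
    while os.path.islink(chain[-1]) and len(chain) <= max_hops:
        target = os.readlink(chain[-1])
        # A relative target is interpreted relative to the link's directory.
        # normpath collapses ".." lexically (assumption: no symlinked parents).
        nxt = os.path.normpath(os.path.join(os.path.dirname(chain[-1]), target))
        chain.append(nxt)
    return chain
```

Calling `symlink_chain("cactus_thc_i15_O2")` in the old simulation's `SIMFACTORY/exe` directory would list the CACHE entry and then whatever that entry points to.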
> The cache is just this, a cache -- the actual executables are safely
> stored in the simulation directories, and are never modified. Here is what
> actually happens when a simulation is created:
>
> 1. Check cache whether it has the right executable. If not, ignore
> cache.
If not, then create the cache, no? By "right executable" you mean that the
current cache and Cactus/exe/cactus_sim, for example, are the same
executable, right?
> 2. If cache is good, create a hard link from cache to simulation
> directory.
What if it silently creates a symbolic link instead, as seems to happen on
Loewe? Is there a way to test whether a hard link was actually created?
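One way to test this, as a minimal sketch: a symbolic link is its own file type (which is what `stat` reports above), whereas two hard links to the same file share a device and inode number. The name `link_kind` is hypothetical and not part of simfactory:

```python
import os


def link_kind(path, cache_path):
    """Classify how `path` relates to `cache_path`.

    Returns "symlink" if `path` is a symbolic link, "hardlink" if both names
    refer to the same inode on the same device, and "separate" otherwise.
    """
    # Check for a symlink first: os.stat() follows symlinks, so a symlink to
    # the cache entry would otherwise look identical to a hard link.
    if os.path.islink(path):
        return "symlink"
    st_a = os.stat(path)
    st_b = os.stat(cache_path)
    if (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino):
        return "hardlink"
    return "separate"
```

For a real hard link, `os.stat(path).st_nlink` would also be at least 2, which matches the "Links:" field printed by `stat`.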
> 3. Check simulation directory if it now has the right executable, since
> the cache may have changed in the mean time. If no, delete hard link
> again.
> 4. If simulation directory does not have an executable, copy it from the
> Cactus build directory.
Unfortunately, this is not happening on Loewe. The cache has changed, but
the restart of my old simulation still has a symbolic link to the cache
directory, which now symlinks to the newer simulation with an updated copy
of the Cactus executable.
> 5. The simulation directory now has the correct executable.
I agree with your explanation, but something is going wrong on Loewe (or
in my head, for not noticing something obvious you stated).
> 6. If we are not using a hard link from the cache, then delete the cache
> file, and create a new hard link from the simulation directory to the
> cache.
>
> This guarantees that simulations always use the correct executable, and
> that a simulation's executable is never overwritten. Also, there are
> fallbacks in place in case any of the operations fails (e.g. creating a
> hard link).
>
OK, what is the fallback when a hard link cannot be created?
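To check my understanding of steps 1 to 6, here is a rough sketch of the procedure as I read it. This is not simfactory's actual code: the names `same_content` and `install_executable` are hypothetical, "right executable" is assumed to mean byte-identical content, the fallback shown (copying when a hard link fails) is only my guess, and a symlinked cache entry is treated as unusable. It also assumes the target directories exist and the simulation has no executable yet:

```python
import filecmp
import os
import shutil


def same_content(a, b):
    """Assumption: 'the right executable' means byte-identical files."""
    return (os.path.isfile(a) and os.path.isfile(b)
            and filecmp.cmp(a, b, shallow=False))


def install_executable(build_exe, cache_exe, sim_exe):
    """Put build_exe into the simulation, sharing storage with the cache.

    Returns True if the simulation's executable is a hard link to the cache.
    """
    used_cache = False
    # Steps 1-2: if the cache holds the right executable, hard-link it.
    # (Assumption: a symlinked cache entry is ignored rather than trusted.)
    if not os.path.islink(cache_exe) and same_content(cache_exe, build_exe):
        try:
            os.link(cache_exe, sim_exe)
            used_cache = True
        except OSError:
            pass  # guessed fallback: fall through to a plain copy
    # Step 3: re-check, since the cache may have changed in the meantime.
    if used_cache and not same_content(sim_exe, build_exe):
        os.remove(sim_exe)
        used_cache = False
    # Step 4: no executable in the simulation directory yet, so copy it.
    if not os.path.lexists(sim_exe):
        shutil.copy2(build_exe, sim_exe)
    # Step 6: replace the cache entry with a hard link to the simulation's copy.
    if not used_cache:
        try:
            if os.path.lexists(cache_exe):
                os.remove(cache_exe)
            os.link(sim_exe, cache_exe)
        except OSError:
            pass  # the cache is optional; the simulation copy is in place
    return used_cache
```

In this sketch an older simulation's executable is never touched: replacing the cache entry only removes one name of the old file, so earlier hard links to it keep their content. That property would not hold if the links were symbolic, which is the behaviour I am seeing on Loewe.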
> What is actually a problem currently is that some external libraries
> that are built are dynamic libraries, which can change or go away when one
> rebuilds. These are not copied into the simulation directory. We try to
> enforce using static libraries for this reason, but I'm not sure whether
> this is the case for all external libraries.
--
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1772#comment:3>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit