[ET Trac] [Einstein Toolkit] #349: pyc files when syncing

Sun Jun 19 06:13:32 CDT 2011

#349: pyc files when syncing
----------------------------+-----------------------------------------------
  Reporter:  barry.wardell  |       Owner:  mthomas
      Type:  defect         |      Status:  new    
  Priority:  major          |   Milestone:         
 Component:  SimFactory     |     Version:         
Resolution:                 |    Keywords:         
----------------------------+-----------------------------------------------

Comment (by barry.wardell):

 Replying to [comment:12 eschnett]:
 > You are introducing a new "paths" syntax to the sync command. I thought
 > previously that one could just list multiple machines, and simfactory
 > would sync to all of them -- apparently that isn't the case, that was lost
 > in translation.

 Yes, sorry for the confusion. This patch introduces three changes
 (switching to the filter rules system, changing the behavior of sim sync
 with multiple arguments, and removing the --sync-parfiles and
 --sync-sourcetree options), which should ideally be split into separate
 issues for consideration. I didn't do so because the three changes are
 closely intertwined in the code. The last two can be restored to their
 original behavior if desired. However, I actually prefer the new behavior
 because:

 * I am much more likely to want to sync specific paths than to sync to
 multiple machines at once.
 * The paths system provides more flexibility and control than the
 --sync-parfiles and --sync-sourcetree options did, and makes them somewhat
 unnecessary. This flexibility is particularly useful on machines with
 slow filesystems, where syncing only a specific path can save a lot of
 time.

 What are other people's opinions on this?

 > However, it seems that filter.cactus.rules contains only a list of
 > top-level paths, and isn't supposed to contain any actual rules -- if it
 > did contain rules, then the result would be confusing, because "sim sync"
 > and "sim sync paths" would copy and/or delete different sets of files.
 > Also, people may want to change this default list of paths, so there
 > should probably also be a filter.cactus.local.rules... Should this list
 > of paths instead be stored in an ini file, where there is already a
 > mechanism to configure settings, and where simfactory could check that
 > these are actually only path names and not accidentally patterns?

 The idea is that if specific paths are not given, then the file
 filter.cactus.rules is read in and gives a default list of paths to be
 included. I agree that this should not contain any actual rules for the
 reason you give and have added a comment to this effect to the top of
 filter.cactus.rules. Any filter rules to be applied to all paths should be
 put in filter.rules.
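 As an illustration of keeping filter.cactus.rules free of actual rules,
 here is a hypothetical check (not part of simfactory) that parses such a
 file, assuming it holds one plain top-level path per line, and rejects
 anything that looks like a filter rule, a wildcard pattern or a nested
 path:

```python
import re

# Hypothetical check for a "default paths" file such as filter.cactus.rules,
# ASSUMING it lists one plain path per line (no rsync filter rules).
WILDCARDS = set("*?[")
# Short-form rsync rule prefixes ("+ ", "- ", ". ", ": ") or a lone "!".
RULE_SYNTAX = re.compile(r"^(?:[-+.:] |!$)")


def parse_default_paths(text):
    """Return the top-level paths listed in `text`, one per line.

    Blank lines and lines starting with '#' or ';' (rsync's comment
    markers in filter files) are skipped. Rules, wildcard patterns and
    nested paths raise ValueError, so that "sim sync" and "sim sync <paths>"
    cannot end up selecting different sets of files.
    """
    paths = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line[0] in "#;":
            continue
        if RULE_SYNTAX.match(line):
            raise ValueError(f"line {lineno}: filter rule, not a path: {line!r}")
        if WILDCARDS & set(line):
            raise ValueError(f"line {lineno}: pattern, not a path: {line!r}")
        name = line.strip("/")
        if "/" in name:
            raise ValueError(f"line {lineno}: not a top-level path: {line!r}")
        paths.append(name)
    return paths
```

 Failing loudly here is the point of the design: a stray pattern in the
 defaults would otherwise make the no-argument sync differ silently from
 an explicit one.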

 If the user wants to modify this, they can add a .rsync.rules file in
 their Cactus base directory which is read in first and so will override
 anything in filter.cactus.rules. I don't really like storing these in ini
 files because that would be moving away from using rsync's filter rules
 system (unless simfactory parsed the ini file and generated the
 appropriate .rsync.rules file).
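 The override works because rsync acts on the first filter rule that
 matches a name, so rules read earlier take precedence. A simplified model
 of that ordering follows; the rule lists are made up for illustration and
 are not the actual contents of filter.cactus.rules or .rsync.rules, and
 Python's fnmatchcase only approximates rsync's pattern matching:

```python
from fnmatch import fnmatchcase


def first_match(name, rules, default="+"):
    """Return "+" (include) or "-" (exclude) for `name`.

    Simplified model of rsync's rule evaluation: rules are checked in
    order and the first one whose pattern matches decides. fnmatchcase
    stands in for rsync's own wildcard matching, which differs in details
    (anchoring with a leading "/", "**", trailing "/" for directories).
    Names matching no rule are included, as in rsync.
    """
    for action, pattern in rules:
        if fnmatchcase(name, pattern):
            return action
    return default


# Hypothetical rule lists, for illustration only.
default_rules = [("+", "arrangements"), ("+", "simfactory"), ("-", "*")]
user_rules = [("+", "par")]  # e.g. from $CACTUSDIR/.rsync.rules

# User rules are read first, so they take precedence over the defaults.
combined = user_rules + default_rules
```

 With only the defaults, first_match("par", default_rules) falls through to
 the final "- *" rule; prepending the user's rules lets "par" through
 without editing the shared defaults file.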

 > Shouldn't "_darcs" always be excluded, similar to CVS .svn .git .hg etc?

 Yes, "_darcs" is also excluded by a rule in the filter.rules file. I have
 also added .hg in the latest patch.

 > What does "C" do? Does it read a .cvsignore file? If so, shouldn't this
 > file be transferred as well, and be documented for simfactory? This would
 > be one more configuration file for people to understand; can we ignore
 > this file instead? cvs is not important any more these days.

 "C" is designed to exclude many common paths which you often don't want to
 transfer. These are:
 "RCS  SCCS  CVS  CVS.adm  RCSLOG  cvslog.*  tags TAGS .make.state
 .nse_depinfo *~ #* .#* ,* _$* *$ *.old *.bak *.BAK *.orig *.rej .del-* *.a
 *.olb *.o *.obj *.so *.exe *.Z *.elc *.ln core .svn/ .git/ .bzr/"
 It also appends any patterns listed in $HOME/.cvsignore and in
 directory-local .cvsignore files. I don't think we should worry too much
 about .cvsignore files, though; I doubt many people have them any more.
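 To see which names the default list above would skip, here is a rough
 approximation using Python's fnmatchcase on the last path component
 (rsync's own matcher differs in some details; patterns ending in "/"
 apply only to directories). Note that the list as quoted does not contain
 *.pyc, so excluding compiled Python files needs a separate rule:

```python
import posixpath
from fnmatch import fnmatchcase

# The CVS-style exclude patterns quoted above, copied verbatim.
CVS_PATTERNS = """RCS SCCS CVS CVS.adm RCSLOG cvslog.* tags TAGS .make.state
.nse_depinfo *~ #* .#* ,* _$* *$ *.old *.bak *.BAK *.orig *.rej .del-* *.a
*.olb *.o *.obj *.so *.exe *.Z *.elc *.ln core .svn/ .git/ .bzr/""".split()


def cvs_excluded(path, is_dir=False):
    """Approximate whether `path` would be skipped by the patterns above.

    Patterns without a "/" are matched against the final path component;
    a trailing "/" restricts a pattern to directories.
    """
    name = posixpath.basename(path.rstrip("/"))
    for pattern in CVS_PATTERNS:
        if pattern.endswith("/"):
            if is_dir and fnmatchcase(name, pattern[:-1]):
                return True
        elif fnmatchcase(name, pattern):
            return True
    return False
```

 For example, cvs_excluded("src/foo.o") and cvs_excluded(".git", is_dir=True)
 are true, while cvs_excluded("simfactory/lib/util.pyc") is false.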

 > How do you expect people to use the "paths" mechanism? Can one give just
 > top-level paths, or also directly paths deep into the hierarchy? Would you
 > expect to do this regularly? If so, why? I find this somewhat dangerous,
 > because people may miss transferring an updated file. Instead of telling
 > simfactory what to do, the user currently tells simfactory his/her intent,
 > e.g. "copy source files" or "copy parameter files", which are
 > prerequisites to either building or submitting. Simfactory then deals with
 > the details, ensuring things are done in a safe way. Would you find it
 > inconvenient if you had to use an option to specify a pathname, e.g.
 > "sim sync damiana -p par"?

 The idea is that there are three modes of operation:
 * Without any paths specified, we sync all paths given in
 filter.cactus.rules (together with any modifications in
 $CACTUSDIR/.rsync.rules). This is essentially the same as what happened
 before.
 * With a list of paths given, only those paths are synchronized. Both
 filter.cactus.rules and $CACTUSDIR/.rsync.rules are ignored (but any
 .rsync.rules files in the specified paths are read). For consistency, only
 top-level paths are accepted in this mode.
 * With a single path given, only this path is synchronized. Both
 filter.cactus.rules and $CACTUSDIR/.rsync.rules are ignored (but any
 .rsync.rules files in the specified path are read). In this case,
 non-top-level paths are allowed and handled appropriately.
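 The mode selection above could be sketched as follows. The function and
 field names are hypothetical, and this is a sketch of the described
 behaviour, not the code in the patch:

```python
import posixpath
from dataclasses import dataclass
from typing import List


@dataclass
class SyncPlan:
    paths: List[str]         # paths handed to rsync
    use_default_list: bool   # read filter.cactus.rules?
    use_user_rules: bool     # read $CACTUSDIR/.rsync.rules?


def plan_sync(cli_paths, default_paths):
    """Choose what to sync (hypothetical helper).

    `cli_paths` are the paths given on the command line and
    `default_paths` the list read from filter.cactus.rules. Per-directory
    .rsync.rules files inside the chosen paths are assumed to be honoured
    by rsync itself in every mode.
    """
    if not cli_paths:
        # Mode 1: the default list, plus the user's overrides.
        return SyncPlan(list(default_paths), True, True)
    paths = [posixpath.normpath(p) for p in cli_paths]
    if len(paths) > 1:
        # Mode 2: several paths, all of which must be top-level.
        nested = [p for p in paths if "/" in p]
        if nested:
            raise ValueError(f"only top-level paths allowed here: {nested}")
    # Modes 2 and 3: exactly the given paths; a single path may be nested.
    return SyncPlan(paths, False, False)
```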

 The main case where I would expect to use this regularly is when syncing
 to a machine with a slow filesystem (e.g. Kraken), where simply checking
 which files need to be synced can sometimes take a long time. In fact,
 until now I often ran rsync manually instead of 'sim sync' when syncing
 frequent small changes (e.g. when debugging a problem, setting up a new
 parameter file, etc.). I quite like how things work with this patch
 applied. We could add the --sync-sourcetree and --sync-parfiles
 convenience options back, although I'm not sure I would personally use
 them.
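 For a sense of what syncing a single path might hand to rsync, here is a
 hypothetical command builder (the host, remote directory and rules-file
 path are placeholders, not simfactory settings). It uses rsync's
 --relative option so that a nested path is recreated at the same place on
 the remote side, and the long forms of the merge (".") and dir-merge (":")
 filter rules:

```python
def rsync_command(paths, remote_host, remote_dir, global_rules):
    """Build (but do not run) an rsync argument list for a path-restricted sync.

    Hypothetical sketch: meant to be run from the Cactus directory, so
    that --relative recreates each path (even a nested one) at the same
    place below `remote_dir`.
    """
    return [
        "rsync",
        "--archive",
        "--relative",
        # Rules that apply everywhere (e.g. excluding _darcs, .hg).
        f"--filter=merge {global_rules}",
        # Honour any .rsync.rules files found inside the synced paths.
        "--filter=dir-merge .rsync.rules",
        *paths,
        f"{remote_host}:{remote_dir.rstrip('/')}/",
    ]
```

 The resulting list could then be run with subprocess.run(cmd,
 cwd=cactus_dir, check=True), where cactus_dir is again a placeholder.
 Only the named paths are scanned, which is where the time saving on a slow
 filesystem would come from.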

 What is the advantage of using an option to specify a pathname?

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/349#comment:13>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit