[ET Trac] [Einstein Toolkit] #1832: Rewrite git handling in Formaline

Thu Nov 26 02:09:04 CST 2015

#1832: Rewrite git handling in Formaline
------------------------------------+---------------------------------------
  Reporter:  eschnett               |       Owner:                     
      Type:  defect                 |      Status:  closed             
  Priority:  minor                  |   Milestone:                     
 Component:  EinsteinToolkit thorn  |     Version:  development version
Resolution:  fixed                  |    Keywords:                     
------------------------------------+---------------------------------------

Comment (by hinder):

 Replying to [comment:4 anonymous]:

 > - Yes, it is necessary to do such a size check. "git gc" will perform a
 garbage collection every time you run it. It also isn't run automatically.

 See the documentation for the --auto flag at https://git-scm.com/docs/git-
 gc, as well as the gc.auto configuration variable.  My point is that git
 has facilities for determining whether gc is necessary or not.  I think it
 would be good to use these facilities, unless they are found to have poor
 performance for the usage pattern here, in which case we should implement
 it ourselves.

 > - The individual steps are separated by blank lines and described via a
 brief comment. Each step uses almost all of the "configuration variables"
 that are passed in to the script, namely ($git_cmd, $git_repo,
 $git_master_repo, $git_local_repo, $git_central_repo, $git_root,
 $build_id, $config_id). Creating subroutines and passing these variables
 probably wouldn't help with clarity.

 The main reason I dislike such a pattern is that it is very hard to reason
 about locality of effect; if I set a variable in such a block, it might
 have an effect several pages down.  This can be helped by introducing
 local variable blocks, but typically this is not how this style is used.
 Variables that are "common" to each chunk usually are declared at the
 start of the function and then "reused".  Since the interface to the block
 of code is not enforced (e.g. "this block uses ($git_cmd, $git_repo,
 $git_master_repo, $git_local_repo, $git_central_repo, $git_root,
 $build_id, $config_id).", it is not clear immediately to the reader that
 this is the case.  And it is likely to lead to some exceptions.  Also, I
 think this style encourages copy and paste syndrome.  Since a block may be
 slightly coupled to its environment, and not in a clear way via a well-
 defined interface, it is hard to then extract such a block of code for re-
 use elsewhere, so the code is copied and pasted to the new location.  It
 is also impossible to see at a glance whether any of these variables are
 modified by any of these chunks; one has to read through the entire
 function.

 In this case, I think a sequence of function calls that passed in a dict
 (or hash, in perl?) of the common arguments would be a lot clearer.

 Looking back at the code now (of course after writing the above!), I see
 that in this case, the main function is only a couple of pages long, so
 it's maybe borderline.  It's still a bit long for my taste; I prefer a
 function to fit "on a screen", so I can see the input, the output, and
 what is done, all at the same time.  I just wanted to point out the above
 reasons for my preference for modular code.

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1832#comment:6>
Einstein Toolkit <http://einsteintoolkit.org>
The Einstein Toolkit