[ET Trac] [Einstein Toolkit] #1743: Reduce number of output files per directory

Fri Feb 13 10:44:55 CST 2015

#1743: Reduce number of output files per directory
-----------------------+----------------------------------------------------
  Reporter:  eschnett  |       Owner:                     
      Type:  defect    |      Status:  review             
  Priority:  unset     |   Milestone:                     
 Component:  Other     |     Version:  development version
Resolution:            |    Keywords:                     
-----------------------+----------------------------------------------------

Comment (by eschnett):

 Replying to [comment:14 knarf]:
 > Replying to [comment:13 eschnett]:
 > > Replying to [comment:12 knarf]:
 > > > > This is a special case, since {{{ilog(0)}}} is not defined.
 > > >
 > > > We could define it. The "Number of processes that can access the
 same directory" being 1 doesn't seem to be unreasonable to me.
 > >
 > > It's "log of the number of processes minus one", and "log of zero" is
 undefined. Running on one process is truly a special case.
 >
 > I meant that we can (re)define what ilog(0) is, separately from the
 definition for values >0. If I read the comment for the parameter
 correctly, restricting the number of processes that can access the same
 directory to 1 isn't something so special. Each process would write into
 its own directory. That is different from running on one process.
 >
 > > The fact that you cannot have just one file per directory is
 different. The subdirectories form a tree; if each tree branch has just 1
 child, then the tree would be infinite. That's a limit on the use of the
 parameter {{{processes_per_directory}}}.
 >
 > I am afraid I don't understand the meaning of the
 parameter. I assumed: For a value of '1' I would expect each process to
 write into a different directory. For a value of '2' I would expect 2
 processes to share one directory, and so on. A value of '0' would be
 special, as all processes could write to the same directory in that case.
 Isn't that how I should interpret the parameter?

 We do not create only one level of subdirectories; we create a tree of
 subdirectories, ensuring that each directory holds no more than N entries.
 That's why we need the log, rather than just dividing nprocs by the max
 number of files.
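
 As a rough illustration of why a logarithm shows up, here is a minimal
 sketch. It is not the code in the patch: the names {{{ilog_base}}},
 {{{tree_digits}}}, and {{{max_entries}}} are made up for this example, and
 the exact off-by-one conventions in the real code may differ. The idea is
 that the depth of the tree follows from how many base-N digits are needed
 to write the largest process rank, {{{nprocs - 1}}}.

```c
#include <assert.h>

/* Floor of log_base(n), defined only for n >= 1 and base >= 2.
   n == 0 has no logarithm, and base == 1 would never terminate,
   which mirrors "one entry per directory" giving an infinite tree. */
int ilog_base(int n, int base)
{
  assert(n >= 1);
  assert(base >= 2);
  int result = 0;
  while (n >= base) {
    n /= base;
    ++result;
  }
  return result;
}

/* Number of base-max_entries digits needed to write the largest rank,
   nprocs - 1. (Hypothetical helper, for illustration only.) */
int tree_digits(int nprocs, int max_entries)
{
  assert(nprocs >= 1);
  if (nprocs == 1)
    return 1; /* special case: ilog_base(0, ...) is undefined */
  return ilog_base(nprocs - 1, max_entries) + 1;
}
```

 With at most 10 entries per directory, 10 processes need one digit, 11
 processes need two, and 1001 processes need four. The single-process case
 has to be handled separately because the argument of the log would be zero.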

 You cannot have a tree with just one entry per branch. A directory that
 holds a single entry never fans out, so no finite depth could accommodate
 more than one process. It's not possible to have just 1 entry per
 directory.

 If you let each process write to its own directory, then you need
 {{{nprocs}}} directories, instead of the {{{nprocs}}} files that you would
 normally have. You don't gain anything from this: the parent directory
 still holds too many entries, and Lustre will still crash.
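
 To contrast this flat layout with a tree, here is a hedged sketch of how a
 rank could be mapped to a nested subdirectory. The function
 {{{rank_subdir}}} and its parameters are hypothetical and do not come from
 the patch. It assumes the rank is split into {{{ndigits}}} digits in base
 {{{max_entries}}}; all digits except the last become directory levels, so
 no directory holds more than {{{max_entries}}} entries. It also assumes
 {{{ndigits}}} is small enough that the divisor does not overflow an int.

```c
#include <stddef.h>
#include <stdio.h>

/* Write the subdirectory for `rank` into buf, e.g. "1/2/" for rank 123
   with ndigits = 3 and max_entries = 10. The last digit is left for the
   file name itself. Returns 0 on success, or -1 on invalid arguments or
   if the path does not fit into buf. */
int rank_subdir(char *buf, size_t size, int rank, int ndigits,
                int max_entries)
{
  if (size == 0 || rank < 0 || ndigits < 1 || max_entries < 2)
    return -1;
  buf[0] = '\0';

  /* Divisor that selects the most significant digit. */
  int divisor = 1;
  for (int i = 1; i < ndigits; ++i)
    divisor *= max_entries;

  size_t used = 0;
  for (int level = 0; level + 1 < ndigits; ++level) {
    int digit = (rank / divisor) % max_entries;
    /* snprintf never writes more than the remaining space. */
    int n = snprintf(buf + used, size - used, "%d/", digit);
    if (n < 0 || (size_t)n >= size - used)
      return -1; /* output was truncated */
    used += (size_t)n;
    divisor /= max_entries;
  }
  return 0;
}
```

 With {{{max_entries}}} set to 10 and three digits, rank 123 ends up under
 {{{1/2/}}}, next to ranks 120 to 129, and the top-level directory holds at
 most ten subdirectories. With a single digit the path is empty and all
 files stay in the top-level directory.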

 > > > I do understand that a wrong usage of both strncat and strncpy can
 lead to problems much like the usage of strcat and strcpy. I don't
 understand why that should prevent us from using them correctly. Right
 now, with asserts possibly doing nothing, both strcat and strcpy could
 write into memory they shouldn't touch.
 > >
 > > I refuse to do that. I'll convert the file to C++ instead.
 >
 > That would be fine too, of course. String handling in C is painful even
 if done correctly.
 >
 > > Sure. I'm actually with you on this -- I don't like the style, and I'm
 serious about changing the style with an automated tool.
 >
 > I don't have enough experience with tools like this to have an informed
 opinion. I didn't quite like what a quick try using clang-format did to my
 source, but that is more my lack of configuration than anything else.

 I assume you didn't like it because you have a very firm opinion on how
 source code "should" be formatted. The problem is that, in a
 collaboration, everybody has his or her own firm opinion. What
 clang-format does is a reasonable compromise, decided essentially by the
 LLVM developers, who are expert software engineers. I stopped caring about the
 details of where braces go and how lines are broken, as long as I can read
 the code, and as long as the style is consistent within a group of source
 files. clang-format does something very reasonable, although it looks
 different from how I would format the code. But then, the way you write
 code looks different from how I would format it. So it doesn't matter.
 Trust clang-format, let it do its job, give it the benefit of the doubt
 for a few days, and then enjoy your newfound freedom from worrying about
 formatting.

 I stopped even hitting "enter" when I make a small change to code, let
 alone worrying about spaces or indentation. I just write the essential
 characters, and a few seconds later the code is nicely formatted.

-- 
Ticket URL: <https://trac.einsteintoolkit.org/ticket/1743#comment:15>
Einstein Toolkit <http://einsteintoolkit.org>