[Users] ExternalLibraries without tarballs

Frank Loeffler knarf at cct.lsu.edu
Sun Aug 11 20:10:05 CDT 2013


Hi,

On Sun, Aug 11, 2013 at 04:59:47PM +0200, Ian Hinder wrote:
> Taking a set of source trees which are very similar and compressing
> each of them into a tarball makes it impossible to efficiently
> delta-compress them.

That's not completely true. 'tar' itself isn't really the problem; the
compression we apply to each tarball is. Uncompressed tars of similar
source trees still share most of their bytes and delta-compress well;
gzipping each archive separately is what destroys that similarity.
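
To make that concrete, here is a toy sketch in Python (my example, not
SVN's actual delta algorithm; the file names and the crude
block-matching metric are made up for illustration). Two uncompressed
tars of nearly identical trees share almost all of their bytes, while
gzipping each archive separately makes the byte streams diverge almost
completely:

import io, os, tarfile

def make_tar(files, compress=False):
    """Pack {name: bytes} into an in-memory tar, optionally gzipped."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz" if compress else "w") as tf:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name)       # deterministic header (mtime 0)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def delta_bytes(old, new, block=512):
    """Crude rsync-style delta estimate: bytes of `new` whose 512-byte
    block does not occur anywhere in `old`."""
    have = {old[i:i+block] for i in range(0, len(old), block)}
    return sum(block for i in range(0, len(new), block)
               if new[i:i+block] not in have)

# Two "releases" of a library that differ in a single source file.
v1 = {"lib/src/file%02d.c" % i: os.urandom(4096) for i in range(10)}
v2 = dict(v1, **{"lib/src/file00.c": os.urandom(4096)})

for label, gz in (("plain tar", False), ("tar.gz", True)):
    a, b = make_tar(v1, gz), make_tar(v2, gz)
    print("%-9s: %6d bytes, delta ~%6d bytes"
          % (label, len(b), delta_bytes(a, b)))

For the plain tars only the changed file's data blocks show up as new;
for the gzipped pair essentially the whole archive does. Storing
uncompressed tars (or the plain source trees) would let the repository
delta-compress them just fine.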

> This is what SVN does in its repository.  While most users probably
> don't care about what happens hidden inside the server-hosted SVN
> repository, the mere fact that there is this duplication suggests that
> it is an inelegant, and hence probably wrong, solution to the problem.

We knew from the beginning that it is not the most elegant solution.
However, it is the most convenient one for users.

> Note that users will have to check out the whole new version of the
> library each time it changes, rather than just downloading the
> differences. This also applies to syncing to remote machines.  So
> users will see a performance hit with the current SVN approach anyway.

Yes. But we did argue back then that the external libraries aren't
going to change that often - and when an update does arrive, you are
probably connected to a high-speed network at work anyway.

> This is a valid concern.  Erik, with your testing of boost, could you
> measure the number of files in the working tree, and the number of
> files in the built tree, for just the boost library?  I imagine that
> the number of files in the built tree is a factor of a few larger than
> the number in the source tree.  However, as Frank said, if the user is
> not building this library, then they do have a much larger inode cost
> if we store the source tree.  We discussed skipping syncing of certain
> external libraries before; maybe that is a better solution here.

Also, the files in the build tree are deleted once the library has been
built, which keeps the overhead much lower, especially with many
configurations that might each need differently built versions of the
libraries.

> Note that I am already opposed to using a large library such as Boost
> in the ET.  It requires a GB of space to uncompress, and more to
> build, for little benefit that I can see (though I have not looked).

I didn't look either, but I hear time and again how bad a decision some
people think it was for certain other projects to rely so heavily on
Boost.

> I assume Erik planned to keep the original version on a branch, and
> have an ET branch with our changes.

That would limit the problem somewhat. But still - as a user I might be
interested in seeing which changes are necessary to make library X work
with the ET, without digging into the documentation of a VCS to find
the patches and their descriptions - especially if they evolved over
time and I am not at all interested in that evolution, just in the
actual patch and the reason for it.

> A developer would not have to work with patches.  You would commit
> your changes on the ET branch of the library, and use the version
> control tool to see differences, merge in the new version, etc.  This
> is much better than messing about with patches, which are just a hack
> used in the absence of proper version control.

When tracking externally developed software I would always like to
minimize the changes I carry - dropping patches over time once they are
no longer necessary. I would like to keep patches with distinct
purposes separate. I am usually not very interested in how these
patches evolve over time. All I really care about is a minimal set of
patches, each with a distinct, stated reason, for each upstream
version.

Correct me if I am wrong, but what you describe is something different,
isn't it?

> * Updating an external library requires the whole library to be downloaded, rather than just what has changed
> * Syncing to a remote cluster requires the whole library to be sent, rather than what has changed (often on a residential internet connection with limited upstream bandwidth)

These two are really one and the same issue. Also - don't sync while
you are on a low-bandwidth connection. The same would apply to an
initial clone of a giant git repository, including history you don't
care about.

Frank
