[Users] ExternalLibraries without tarballs

Ian Hinder ian.hinder at aei.mpg.de
Sun Aug 11 09:59:47 CDT 2013


On 9 Aug 2013, at 02:46, Frank Loeffler <knarf at cct.lsu.edu> wrote:

> Hi,
> 
> On Thu, Aug 08, 2013 at 08:34:09PM -0400, Erik Schnetter wrote:
>> ExternalLibraries thorns containing tarballs can be very large, in
>> particular if the history of the thorn grows and contains multiple
>> tarballs.
> 
> It only really grows (for the user) if a revision control system is used
> that stores all of the history on the client side, something a user
> usually wouldn't be interested in for external libraries. git is one of
> these. A counterexample is Subversion, which wouldn't have that
> problem, and all external libraries are currently stored in Subversion.

Taking a set of source trees which are very similar and compressing each of them into a tarball makes it impossible to efficiently delta-compress them.  This is what SVN does in its repository.  While most users probably don't care about what happens hidden inside the server-hosted SVN repository, the mere fact that there is this duplication suggests that it is an inelegant, and hence probably wrong, solution to the problem.  Note that users will have to check out the whole new version of the library each time it changes, rather than just downloading the differences. This also applies to syncing to remote machines.  So users will see a performance hit with the current SVN approach anyway.
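A rough way to see the effect described above is to use Python's `zlib` as a stand-in for a version-control system's delta compression. This is an assumption: SVN's and git's real delta algorithms differ, and the "releases" below are hypothetical random byte strings, the second being the first with a short patch inserted. Compressed together, the two releases cost about as much as one. Compressed separately, as happens when every release is stored as its own tarball, they cost about twice as much.

```python
import random
import zlib
from typing import Tuple


def make_versions(size: int = 20_000, seed: int = 42) -> Tuple[bytes, bytes]:
    """Two hypothetical 'releases': v2 is v1 with a small patch inserted in the middle."""
    rng = random.Random(seed)
    v1 = bytes(rng.getrandbits(8) for _ in range(size))
    mid = size // 2
    v2 = v1[:mid] + b"patched" + v1[mid:]
    return v1, v2


def separate_cost(v1: bytes, v2: bytes) -> int:
    """Bytes stored when each release is compressed on its own (one tarball per release)."""
    return len(zlib.compress(v1)) + len(zlib.compress(v2))


def joint_cost(v1: bytes, v2: bytes) -> int:
    """Bytes stored when the compressor sees both releases and can reuse shared content."""
    return len(zlib.compress(v1 + v2))


if __name__ == "__main__":
    v1, v2 = make_versions()
    print("separate:", separate_cost(v1, v2))
    print("joint:   ", joint_cost(v1, v2))
```

Each release is kept well under zlib's 32 KB window so that the compressor can actually refer back from the second release to the first. A real delta algorithm has no such limit, but the same principle applies.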

>> Other advantages of storing source trees are:
>> - this is what we do for every thorn, so why not for ExternalLibraries (d'oh)
> 
> External libraries are not like other thorns. Even when present in a
> thornlist, the source of external libraries might not be used in the
> end if a system version is detected and usable, but it still needs to be
> present. Decompressing all external libraries would lead to even more
> small files, which could get you into trouble, especially on clusters.
> Of course you need to have the inodes when you compile it anyway, but
> again: you might not need to (compile it), and most of these are deleted
> again after the build.

This is a valid concern.  Erik, with your testing of boost, could you measure the number of files in the working tree, and the number of files in the built tree, for just the boost library?  I imagine that the built tree contains several times as many files as the source tree.  However, as Frank said, if the user is not building this library, then they do have a much larger inode cost if we store the source tree.  We discussed skipping the syncing of certain external libraries before; maybe that is a better solution here.
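To get the numbers asked for above, a small script along these lines could count the entries in a source tree and in a build tree. It is a hypothetical helper, not part of Cactus or the ExternalLibraries thorns. It counts each file, symlink, or directory as one inode, which is only an approximation (hard links share an inode).

```python
import os
import sys


def count_inodes(root: str) -> int:
    """Count entries (files, symlinks and subdirectories) below root.

    Each entry uses roughly one inode on typical cluster file systems,
    so this approximates the inode cost of an unpacked or built tree.
    Symlinked directories are counted but not descended into.
    """
    total = 0
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total


if __name__ == "__main__":
    # Usage: python count_inodes.py <dir> [<dir> ...]
    for path in sys.argv[1:]:
        print(f"{path}: {count_inodes(path)} entries")
```

For example, it could be run once on the unpacked Boost source directory and once on the Boost build directory; the actual paths depend on the configuration and machine.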

Note that I am already opposed to using a large library such as Boost in the ET.  It requires about a GB of space to uncompress, and more to build, for little benefit that I can see (though I have not looked).  If Boost is licensed appropriately and written in a modular-enough fashion, maybe it is possible to extract just the bits that people find useful? I'm sure we are pulling in a huge amount of code that we don't need by including Boost.

>> - one can easily look at the source code of an external library
>> - no need to untar and later delete the source tree while building
> 
> For a regular user this happens automagically, but I agree that this
> would be a plus for the maintainer of that thorn.
> 
>> - no need to apply patches -- the changes can be made directly in the source tree
> 
> I don't think this would be good. I would like to see which changes the
> Cactus thorn did compared to the vanilla version.

I assume Erik planned to keep the original version on a branch, and have an ET branch with our changes.

> Also, from a
> developer's standpoint maintaining patches is far easier for upgrading
> the library than recreating and applying patches every time for an
> upgrade by hand. I strongly urge everyone to keep the vanilla version
> clearly separated from Cactus-specific changes.

A developer would not have to work with patches.  You would commit your changes on the ET branch of the library, and use the version control tool to see differences, merge in the new version, etc.  This is much better than messing about with patches, which are just a hack used in the absence of proper version control.

> 
>> In my opinion, this is the way to go. I am currently setting up a new
>> repository of ExternalLibraries/Boost in this way. (Obviously, it is
>> not possible to keep the current repository, since its history is
>> already at 406 MB.)
> 
> I don't really see a problem with the current setup. Even the boost svn
> checkout shouldn't be that large. The problem you describe really only
> applies to a git repository. We don't need to use git. We don't even
> use git for the external libraries.


There are problems with the current approach when using SVN:

* Updating an external library requires the whole library to be downloaded, rather than just what has changed
* Syncing to a remote cluster requires the whole library to be sent, rather than what has changed (often on a residential internet connection with limited upstream bandwidth)

Most of the active ET developers prefer to work with git anyway, and since git stores all the history locally, the problem is worse for us.  Note that the boost external library thorn is in git, which is why Erik noticed this problem in the first place.

-- 
Ian Hinder
http://numrel.aei.mpg.de/people/hinder
