fake commit parent

Devoted some time today to figuring out how to add a "fake" parent to a commit in git. It's quite easy once you discover there is something called grafts in git.

A graft is created by adding a line to .git/info/grafts. Each line is simply a space-separated list of commit SHAs: the first identifies the commit to modify, and the rest are what the parents of that commit should be. The following adds a second parent, <parent2>, to the commit <commit>, which currently has <parent1> as its single parent.

echo "<commit> <parent1> <parent2>" >> .git/info/grafts

After doing this, commands like git log and git blame will behave as if <parent2> were a parent of <commit>. However, this change is only local and will not follow along on a git push.
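To make the file format concrete, here is a small Python sketch that reads a grafts file into a mapping from each commit to the parents git should pretend it has. It assumes one record per line with SHAs separated by whitespace; skipping blank lines and lines starting with '#' is an assumption of this sketch, and parse_grafts is a hypothetical helper, not part of git or dulwich.

```python
def parse_grafts(text):
    """Map each grafted commit SHA to the list of parents git should pretend it has.

    Hypothetical helper: one record per line, the commit first and its fake
    parents after it, separated by whitespace. Skipping blank lines and '#'
    lines is an assumption, not something taken from git's documentation.
    """
    grafts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        commit, *parents = line.split()
        grafts[commit] = parents
    return grafts


# Abbreviated, made-up SHAs; a real grafts file uses full 40-character ones.
print(parse_grafts("c0ffee1 aaaa111 bbbb222\n"))
# → {'c0ffee1': ['aaaa111', 'bbbb222']}
```

A line holding only a commit SHA comes out with an empty parent list.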

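filter-branch to the rescue: it honors the grafts file, so rewriting the commits in the range turns the fake parents into real ones that a push will carry along. The --tag-name-filter cat part keeps tags pointing at the rewritten commits.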

git filter-branch --tag-name-filter cat -- origin/master..

dulwich again

A small update on the earlier dulwich post. Jelmer pushed a fix for the redundant parsing issue, and with this version the time consumed is roughly cut in half. Was really hoping for more, as the tree should now be parsed once instead of 3 times. I still have to rule out that the repository has simply grown even more since the last post.
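A back-of-the-envelope reading of why half and not a third, assuming the fix removes two of the three parses and leaves the rest of the run untouched. Let p be the fraction of the old run time T that went to parsing:

```latex
T_{\text{new}} = (1 - p)\,T + \frac{p}{3}\,T = \left(1 - \frac{2p}{3}\right) T,
\qquad
T_{\text{new}} = \frac{T}{2} \;\Longrightarrow\; p = \frac{3}{4}
```

So halving is what you would expect if parsing was about three quarters of the time; getting down to a third would require parsing to be the entire run (p = 1).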


With regard to the tags of the last post: https://www.youtube.com/watch?v=asUyK6JWt9U

lstree -r in dulwich

Was writing some code that walked a tree in dulwich, but it was terribly slow. A plain git ls-tree -r on the same tree object, however, finished in under 100 ms.

The first awful lstree for dulwich

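from posixpath import join

from dulwich.objects import Blob, Tree


def lstree(repo, tree):
    # Paths in dulwich trees are bytes, so the walk starts from b''.
    queue = [(b'', tree)]

    while queue:
        base, tree = queue.pop()

        for entry in tree.iteritems():
            # Reads every object, blobs included, only to check its type.
            obj = repo[entry.sha]
            if isinstance(obj, Tree):
                queue.append((join(base, entry.path), obj))
            elif isinstance(obj, Blob):
                print(entry.sha, join(base, entry.path))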

This version ran for just short of 30 s with a warm fs cache. Two takeaways: something is seriously wrong, and my git repo is quite the beast.

One thing the above code does is waste time reading objects that are never used, like the blobs (well, except for the isinstance check). A second version that reads nothing but trees, and tells the entries apart by their mode instead, cuts it down to 2-3 seconds.

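import stat
from posixpath import join


def lstree(repo, sha):
    # Only tree objects are read; an entry's mode says what it points at.
    queue = [(b'', sha)]

    while queue:
        base, sha = queue.pop()
        tree = repo[sha]

        for entry in tree.iteritems():
            if stat.S_ISDIR(entry.mode):
                # Subtree: queue its sha and load it on a later pass.
                queue.append((join(base, entry.path), entry.sha))
            elif stat.S_ISREG(entry.mode) or stat.S_ISLNK(entry.mode):
                # Blob (plain file, executable or symlink), printed without reading it.
                print(entry.sha, join(base, entry.path))
            # Anything else, i.e. a submodule entry (mode 0o160000), is skipped.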
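The trick in the second version is that a tree entry's mode already tells you what kind of object it points at, so nothing but trees has to be loaded. A stdlib-only sketch of that mapping, using the modes git writes into tree objects (040000 for trees, 100644 and 100755 for files, 120000 for symlinks, 160000 for submodule commits); entry_kind is a hypothetical name, not a dulwich function.

```python
import stat


def entry_kind(mode):
    """Return the type of object a git tree entry points at, judged only by its mode."""
    if stat.S_ISDIR(mode):                          # 040000: a subtree
        return 'tree'
    if stat.S_ISREG(mode) or stat.S_ISLNK(mode):    # 100644, 100755, 120000
        return 'blob'   # file contents or symlink target, both stored as blobs
    if stat.S_IFMT(mode) == 0o160000:               # gitlink: a submodule commit
        return 'commit'
    raise ValueError('unexpected tree entry mode %o' % mode)


for mode in (0o40000, 0o100644, 0o100755, 0o120000, 0o160000):
    print('%06o' % mode, entry_kind(mode))
```

Checking the file type bits rather than one exact mode is what keeps executables and symlinks from silently dropping out of the listing.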

With the glaring mistake out of the way, it gets tricky to cut the time down further. Using tree._entries.iteritems() rather than tree.iteritems() shaves off another 100 ms, but nothing exciting.

One tree stands out, taking about 70 ms on its own to process: a folder with close to 20k subfolders. Narrowing it down with some code that reads only this tree gives about the same measurement: 67 ms to load the tree and 34 ms more to iterate over it with a loop body of just pass. A check with the profiler reveals that Tree._deserialize accounts for most of the time, followed by parse_tree, which it calls directly, and oddly enough both have a call count of 3.

I will have to investigate further.

Chasing consistency

Stumbled upon statebox while looking for an answer on how to manage a set of members (keys in other buckets) given Riak's eventual consistency. I like the approach because it solves conflict resolution for the generic case rather than doing something custom for each possible conflict in the application. It does this by restricting which operations you may perform on the data; most set-like operations are cool.

One of the core concepts is that every operation is idempotent, and this is where it breaks down for me, as I explicitly don't want some operations to be repeatable. In my use case, removing a member from the set implies it will be added to a set stored at another key. If I could remove it twice, it could end up in two different sets, and that can't happen.
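A toy Python version of the idea as I understand it, not statebox's actual code or API: each sibling carries a log of timestamped set operations, and resolving siblings means taking the union of the logs and replaying it in timestamp order on a common starting value (that starting value is an assumption of this sketch, and resolve is a made-up name). Because add and remove are idempotent here, applying a remove a second time is silently fine for the set, which is exactly why a remove can't be tied to a one-time side effect.

```python
def resolve(start, sibling_logs):
    """Merge sibling op logs and replay them on a common starting set.

    Each log is a list of (timestamp, op, member) tuples with op 'add' or 'remove'.
    Ops are idempotent: applying one twice leaves the set as applying it once.
    """
    merged = sorted(set().union(*sibling_logs))  # union drops duplicate ops
    result = set(start)
    for _timestamp, op, member in merged:
        if op == 'add':
            result.add(member)
        elif op == 'remove':
            result.discard(member)  # removing a missing member is not an error
        else:
            raise ValueError('unknown op %r' % (op,))
    return result


left = [(1, 'add', 'a'), (2, 'add', 'b')]
right = [(1, 'add', 'a'), (3, 'remove', 'a')]
print(resolve(set(), [left, right]))  # {'b'}
```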

I'm currently looking into a variation that uses only reversible operations instead of only repeatable ones. For example, you can't add something to a set that already contains it, as that operation could not be reversed without already knowing the previous state.
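One way to sketch that rule in Python, purely as an illustration of the idea (ReversibleSet and its methods are made-up names): add refuses a member that is already present and remove refuses one that is missing, so every op that does get applied has an exact inverse, and a second remove of the same member fails loudly instead of silently succeeding.

```python
class ReversibleSet:
    """Set with strict add/remove: every op that succeeds can be undone exactly.

    Illustrative sketch only; the class and its methods are made-up names.
    """

    def __init__(self):
        self.members = set()
        self.log = []  # successfully applied (op, member) pairs, oldest first

    def apply(self, op, member):
        if op == 'add':
            if member in self.members:
                raise ValueError('%r already present, add would not be reversible' % (member,))
            self.members.add(member)
        elif op == 'remove':
            if member not in self.members:
                raise ValueError('%r not present, remove would not be reversible' % (member,))
            self.members.remove(member)
        else:
            raise ValueError('unknown op %r' % (op,))
        self.log.append((op, member))

    def undo(self):
        """Reverse the most recent op by applying its inverse."""
        op, member = self.log.pop()
        if op == 'add':
            self.members.remove(member)
        else:
            self.members.add(member)


s = ReversibleSet()
s.apply('add', 'key-1')
s.apply('remove', 'key-1')
try:
    s.apply('remove', 'key-1')  # a second remove is refused
except ValueError as err:
    print(err)
s.undo()                        # reverts the remove
print(s.members)                # {'key-1'}
```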

Debug JavaScript on iPhone

Was figuring out why a JavaScript widget was not working properly on iPhones at work today. The widget is a kinda silly information box that displays some content in an iframe, and the issue turned out to be that you can't set the height of an iframe in Mobile Safari. It was solved by adding overflow: auto and -webkit-overflow-scrolling: touch; all in all, not very interesting.

What I did enjoy was the absolutely horrid hack I used to get my console logs out of the iPhone.

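window.onerror = function (msg, url, line) {
    // jQuery sends GET data as a query string, so it lands in the server's access log.
    $.ajax('http://my-host:8000/error', {
        data: {
            'message': msg,
            'line': line,
            'url': url
        }
    });
};

console.log = function () {
    var message = Array.prototype.slice.call(arguments).join('');
    if (!message) {
        return;  // nothing worth a request
    }
    // The path does not need to exist: a 404 is still logged with the full URL.
    $.ajax('http://my-host:8000/log', {
        data: {
            'message': message
        }
    });
};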

And then I could see my console.log output and any errors that occurred in the access log of the python3 -m http.server I was already using to serve the page. There are so many cool ways this could be improved, and someone has probably already created a full library to do this. But I have a feeling I will leave this as a one-off hack, with the only trace being this here post.
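Since jQuery sends the data of these GET requests as a query string, the access log lines come out URL-encoded. A small Python sketch for turning one back into readable fields; it assumes http.server's default log format, where the request line appears in double quotes, and the sample line is made up. decode_log_line is a hypothetical name.

```python
import re
from urllib.parse import parse_qs, urlsplit

# http.server logs the request line in double quotes, e.g. "GET /log?... HTTP/1.1".
REQUEST_LINE = re.compile(r'"(\S+) (\S+) [^"]*"')


def decode_log_line(line):
    """Return (path, params) for one access log line, or None if no request is found."""
    match = REQUEST_LINE.search(line)
    if match is None:
        return None
    target = urlsplit(match.group(2))
    # parse_qs undoes the percent-encoding and turns '+' back into spaces.
    params = {key: values[0] for key, values in parse_qs(target.query).items()}
    return target.path, params


# A made-up log line of the kind the console.log hook produces.
sample = ('10.0.0.7 - - [12/Mar/2014 10:15:02] '
          '"GET /log?message=widget+height%3A+0 HTTP/1.1" 404 -')
print(decode_log_line(sample))  # ('/log', {'message': 'widget height: 0'})
```

Piping the server's log output through something like this would be one of the many possible improvements, without touching the page at all.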

instead of escaping find -exec

Found myself setting up quite a few symlinks in one of our repos at work today. It all followed a simple pattern: a "current" link pointing to a folder with a particular name, set up in multiple places in the folder structure.

So I quickly started typing out a find -exec command, but the escaping soon got really messy. After some tinkering, I ended up piping the find output into a while read loop that did the work, and I did not have to worry about escaping at all.

find . -name needle | while IFS= read -r dir; do
  ( cd "$(dirname "$dir")" && ln -s needle current-special )
done
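For comparison, the same job done without a shell at all, so escaping never comes into it: a Python sketch using os.walk and os.symlink. The function and parameter names are hypothetical. Like the loop, it creates a relative link next to every directory named needle; unlike ln -s in the loop, which just prints an error and moves on, os.symlink raises FileExistsError if a link is already there.

```python
import os


def link_needles(root, name='needle', link_name='current-special'):
    """Next to every directory called `name` under root, create a relative
    symlink `link_name` pointing at it, like `ln -s needle current-special`."""
    created = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if name in dirnames:
            link = os.path.join(dirpath, link_name)
            os.symlink(name, link)  # relative target, resolved from dirpath
            created.append(link)
    return created
```

Because os.walk lists a directory before yielding it, a link created during the walk is not visited again.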

code snippets

Added a page for collecting small snippets of shell, code and similar.