Independent code in performance optimizations

Reading time ~1 minute

This will be a fast errata to the previous one. This time I will expand the oldest performance mantra:

The fastest code is the one that doesn’t execute. Second to that is the one that executes once

Last time I’ve forgot to mention one very important optimization. It was one of two steps that allowed me to go from 1530 to 484 seconds in the sample run.

Let’s look once more at the code:

public float Similarity(IDictionary<int,float> otherVector)
{
    var num = this
        .Sum(ingredient =>
        {
            float value = 0;
            if (otherVector.TryGetValue(ingredient.Key, out value))
                return ingredient.Value*value;
            return 0;
        });

    var denom = Lengh()*otherVector.Length();
    return num/denom;
}

and Length looking like this:

public float Length(Vector a){    
    return (float)Math.Sqrt(a.Sum(value => value*value));
}

Similarity will be called for every pair of recipes and that means a lot:

(182184*182184)/2 = 16 595 504 928 times 

But there is one fragment that doesn’t change depending on input parameters, so it means i can calculate it less often.
Can You spot it?

It’s Length.

Changing it’s code to this:

public float Length()
{
    if (_wasIngredientWeightsChanged)
    {
        _len = (float) Math.Sqrt(_ingredientWeightsInternal.Values.Sum(value => value*value));
        _wasIngredientWeightsChanged = false;
    }
    return _len;
}

Allowed me to go from 968 seconds to 745 seconds for the sample. Scaling it to all recipes gives:

(745 / 2199) * 182184 ~ 17 hours (starting from 34 hours)

Conslusion

“Can I make it faster” is not the only question to ask when optimizing. Sometimes asking “does it have to execute this often?” is also a very valid question.

Hi, I'm Szymon Warda, I write this blog, tweet and speak at conferences. If You want to know more go here, or follow me:

What is the problem with key-value databases and how wide column stores solve it.

Key-value databases are very powerful, but there are use cases where its value as a blob gets in the way. Continue reading