This article is Part 3 in a 7-Part Series.
- Part 1 - How I calculate similarities in cookit?
- Part 2 - How to calculate 17 billion similarities
- Part 3 - This Article
- Part 4 - Using bit masks for high-performance calculations
- Part 5 - Making bits faster
- Part 6 - Dividing a bit in two for performance
- Part 7 - Understanding OutOfMemoryException
This will be a quick erratum to the previous post. This time I will expand on the oldest performance mantra:
> The fastest code is the one that doesn’t execute. Second to that is the one that executes once.
Last time I forgot to mention one very important optimization. It was one of the two steps that allowed me to go from 1530 to 484 seconds on the sample run.
Let’s look once more at the code:
```csharp
public float Similarity(IDictionary<int, float> otherVector)
{
    // Dot product: sum the products of weights for ingredients present in both vectors.
    var num = this
        .Sum(ingredient =>
        {
            float value;
            if (otherVector.TryGetValue(ingredient.Key, out value))
                return ingredient.Value * value;
            return 0;
        });
    var denom = Length() * otherVector.Length();
    return num / denom;
}
```
with Length looking like this:
```csharp
public float Length()
{
    return (float)Math.Sqrt(_ingredientWeightsInternal.Values.Sum(value => value * value));
}
```
Similarity will be called for every pair of recipes, and that means a lot of calls:
(182184*182184)/2 = 16 595 504 928 times
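To see where that number comes from, here is a minimal sketch of the pairwise loop. This is a hypothetical driver, not cookit's actual code, and it assumes the vector type implements IDictionary<int, float> so an instance can be passed straight into Similarity:

```csharp
// Hypothetical driver loop, not cookit's actual code. The inner loop starts
// at i + 1 so each pair is visited once, giving roughly n * n / 2 calls
// to Similarity for n recipes.
static void ComputeAllSimilarities(IList<Vector> vectors)
{
    for (int i = 0; i < vectors.Count; i++)
    {
        for (int j = i + 1; j < vectors.Count; j++)
        {
            float similarity = vectors[i].Similarity(vectors[j]);
        }
    }
}
```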
But there is one fragment whose result doesn't change between calls on the same vector, which means I can calculate it far less often.
Can you spot it?
It's Length.
Changing its code to this:
```csharp
public float Length()
{
    // Recompute the length only when the weights have changed since the
    // last call; otherwise return the cached value.
    if (_wasIngredientWeightsChanged)
    {
        _len = (float)Math.Sqrt(_ingredientWeightsInternal.Values.Sum(value => value * value));
        _wasIngredientWeightsChanged = false;
    }
    return _len;
}
```
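One thing to keep in mind with this kind of caching: every write to the ingredient weights has to set the flag back to true, or Length will return a stale value. A minimal sketch of such a setter (the method name is my assumption, not cookit's actual API):

```csharp
public void SetIngredientWeight(int ingredientId, float weight)
{
    _ingredientWeightsInternal[ingredientId] = weight;
    // Invalidate the cached length; it will be recomputed on the next Length() call.
    _wasIngredientWeightsChanged = true;
}
```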
This change alone took the sample run from 968 seconds down to 745 seconds. Scaling that to all recipes gives:

(745 / 2199) * 182184 ≈ 61 720 seconds ≈ 17 hours (down from 34 hours)
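As a side note: if the vectors no longer change once the similarity phase starts, the same caching could be expressed with .NET's built-in Lazy<float>, which computes the value once on first access. A sketch under that immutability assumption, not the code cookit uses:

```csharp
private readonly Lazy<float> _len;

public Vector()
{
    // The length is computed once, on the first call to Length(), and the
    // result is reused afterwards. This is only correct if the weights are
    // not modified after construction.
    _len = new Lazy<float>(
        () => (float)Math.Sqrt(_ingredientWeightsInternal.Values.Sum(v => v * v)));
}

public float Length() => _len.Value;
```

The dirty-flag version is more flexible, though, since it also handles vectors that keep changing.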
Conclusion
“Can I make it faster?” is not the only question to ask when optimizing. Sometimes asking “does it have to execute this often?” is just as valid.