In my ongoing series on calculating similarities, one angle always seemed worth trying and was pointed out many times on Reddit: use C++ and matrix manipulations. Similarity calculation fits very nicely into a matrix representation, and there are algorithms targeting sparse matrix manipulation. So why did I delay it for so long? Because I had other angles I wanted to try and, from the looks of it, this one required significant changes to the existing code base. But since the last optimizations didn’t bring the time cuts I expected, the time has come. Brace yourself.

# Dividing a bit in two for performance

This post is an analysis of a very interesting optimization proposed by Nicholas Frechette in the comments under the previous post and by t0rakka on Reddit. They proposed using one of the oldest tricks in the performance cookbook: divide and conquer. Well, it did not turn out as I expected.

# Making bits faster

This post was inspired by a discussion on Reddit that followed my previous post.

In this post, I will cover a suggestion by BelowAverageITGuy that cut down the total execution time by almost one hour.

# Using bit masks for high-performance calculations

Last time I showed how I went from 34 hours to 11. This time we go faster. To go faster, I have to do less.

The current implementation of `Similarity` iterates over one vector and checks if that ingredient exists in the second one. Since those vectors are sparse, the chance of a miss is big. This means that I am wasting computational power on iterating and calling `TryGetValue`.

How to iterate only over the mutually owned ones and do it fast?
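One way to picture the idea is the minimal Python sketch below. It is not the code from this series: the dictionary-based vectors, the helper names (`build_mask`, `shared_ids`, `dot_naive`, `dot_masked`) and the use of a plain dot product as a stand-in for `Similarity` are all assumptions made for illustration. Each vector gets an integer bit mask with one bit per ingredient id, and a single bitwise AND leaves only the ids present in both vectors, so the loop never visits a miss.

```python
def build_mask(vector):
    """Pack the ingredient ids of a sparse vector into one integer bit mask."""
    mask = 0
    for ingredient_id in vector:
        mask |= 1 << ingredient_id
    return mask


def shared_ids(mask_a, mask_b):
    """Yield only the ids whose bit is set in both masks, lowest id first."""
    common = mask_a & mask_b
    while common:
        lowest = common & -common        # isolate the lowest set bit
        yield lowest.bit_length() - 1    # convert that bit back to an id
        common ^= lowest                 # clear it and continue


def dot_naive(a, b):
    """Baseline: walk all of `a` and probe `b` for every id (hit or miss)."""
    total = 0.0
    for ingredient_id, weight in a.items():
        other = b.get(ingredient_id)     # rough analogue of TryGetValue
        if other is not None:
            total += weight * other
    return total


def dot_masked(a, mask_a, b, mask_b):
    """Visit only the ids both vectors own, found with one bitwise AND."""
    return sum(a[i] * b[i] for i in shared_ids(mask_a, mask_b))


# Hypothetical sample data: ingredient id -> weight.
first = {1: 2.0, 5: 3.0, 9: 1.0}
second = {5: 4.0, 9: 0.5, 12: 7.0}
first_mask, second_mask = build_mask(first), build_mask(second)
```

With this sample data the two vectors share only ids 5 and 9, so `dot_masked` performs two multiplications and no failed lookups. The masks are built once per vector and reused for every pair, which is where the saving would come from.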

# Independent code in performance optimizations

This will be a quick erratum to the previous post. This time I will expand on the oldest performance mantra:

> The fastest code is the one that doesn’t execute. Second to that is the one that executes once.

Last time I forgot to mention one very important optimization. It was one of two steps that allowed me to go from 1530 to 484 seconds in the sample run.
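As a generic illustration of the "executes once" half of the mantra (not necessarily the optimization this post describes), here is a small Python sketch. It assumes a cosine-style similarity over dictionary-based sparse vectors; the function names and the call counter are hypothetical. A vector's norm does not depend on which partner it is compared with, so it can be computed once per vector instead of once per pair.

```python
import math


def norm(vector, counter):
    """Euclidean norm; `counter[0]` records how many times it was computed."""
    counter[0] += 1
    return math.sqrt(sum(w * w for w in vector.values()))


def dot(a, b):
    """Dot product over the ids present in both sparse vectors."""
    return sum(w * b[k] for k, w in a.items() if k in b)


def pairwise_naive(vectors, counter):
    """Recomputes both norms inside the inner loop, for every pair."""
    out = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            denominator = norm(vectors[i], counter) * norm(vectors[j], counter)
            out.append(dot(vectors[i], vectors[j]) / denominator)
    return out


def pairwise_hoisted(vectors, counter):
    """Computes each norm once, before the loops, and reuses it."""
    norms = [norm(v, counter) for v in vectors]
    out = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            out.append(dot(vectors[i], vectors[j]) / (norms[i] * norms[j]))
    return out


# Hypothetical sample data: three small sparse vectors.
sample = [{1: 1.0, 2: 2.0}, {2: 3.0, 3: 4.0}, {1: 5.0, 3: 6.0}]
naive_calls, hoisted_calls = [0], [0]
naive_result = pairwise_naive(sample, naive_calls)
hoisted_result = pairwise_hoisted(sample, hoisted_calls)
```

For n vectors the naive version computes n(n-1) norms while the hoisted one computes n. Both produce the same similarities, so the saving comes entirely from not repeating work that is independent of the inner loop.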
