While working on the next angle for speeding up similarity calculations, I started investigating how to get better telemetry from cookit. Getting telemetry is easy - making sense of it is the hard part. This also brought up another pain point of the current setup - logging and monitoring. Since cookit is my pet, nonprofit project, it was time to do something.

Continue reading...

In my ongoing series on calculating similarities, one angle always seemed worth trying and was pointed out many times on Reddit - use C++ and matrix manipulations. Similarity calculation fits very nicely into matrix representation, and there are algorithms targeting sparse matrix manipulation. So why did I delay it for so long? Because I had other angles I wanted to try first, and from the looks of it, this one required significant changes to the existing code base. But since the last optimizations didn’t bring the time cuts I expected, the time has come. Brace yourself.
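To make the matrix angle concrete, here is a minimal sketch of the idea - my own illustration, not code from the series: if every recipe is a sparse row of ingredient counts in a matrix A, then all pairwise similarities (here, plain dot products) are the entries of A·Aᵀ, which is exactly the operation sparse matrix libraries are built to do fast.

```csharp
using System;
using System.Collections.Generic;

class MatrixAngleSketch
{
    // A sparse row of A: ingredient id -> count.
    static double Dot(Dictionary<int, double> a, Dictionary<int, double> b)
    {
        double sum = 0;
        foreach (var kv in a)
        {
            double other;
            if (b.TryGetValue(kv.Key, out other))
                sum += kv.Value * other;
        }
        return sum;
    }

    static void Main()
    {
        // Three hypothetical recipes as sparse ingredient vectors (rows of A).
        var recipes = new[]
        {
            new Dictionary<int, double> { { 1, 2 }, { 5, 1 } },
            new Dictionary<int, double> { { 1, 1 }, { 7, 3 } },
            new Dictionary<int, double> { { 5, 2 }, { 7, 1 }, { 9, 4 } }
        };

        // The full similarity table is A * A^T: entry (i, j) = dot(row i, row j).
        int n = recipes.Length;
        var similarities = new double[n, n];
        for (int i = 0; i < n; i++)
            for (int j = i; j < n; j++)
                similarities[i, j] = similarities[j, i] = Dot(recipes[i], recipes[j]);

        Console.WriteLine(similarities[0, 1]); // 2 - recipes 0 and 1 share ingredient 1
    }
}
```

A dedicated sparse matrix library replaces the nested loop with a single optimized multiplication; the sketch only shows why the problem maps onto one.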

Continue reading...

This post is an analysis of a very interesting optimization proposed by Nicholas Frechette in the comments under the previous post and by t0rakka on Reddit. They proposed using one of the oldest tricks in the performance cookbook - divide and conquer. Well, it did not turn out as I expected.

Continue reading...

This post was inspired by a discussion on Reddit that followed my previous post.

In this post, I will cover a suggestion by BelowAverageITGuy that cut down the total execution time by almost one hour.

Continue reading...

Last time I showed how I went from 34 hours to 11. This time we go faster. To go faster, I have to do less.

The current implementation of Similarity iterates over one vector and checks whether each ingredient exists in the second one. Since those vectors are sparse, the chance of a miss is high. This means I am wasting computational power on iterating and calling TryGetValue.
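Roughly, the lookup-based approach looks like this - a simplified sketch with hypothetical names, not the actual Similarity code:

```csharp
using System.Collections.Generic;

static class SimilaritySketch
{
    // Iterate the first sparse vector and probe the second with TryGetValue.
    // On sparse data most probes miss, so most of the loop is wasted work.
    public static double DotByLookup(
        Dictionary<int, double> first, Dictionary<int, double> second)
    {
        double sum = 0;
        foreach (var kv in first)
        {
            double value;
            if (second.TryGetValue(kv.Key, out value)) // miss is the common case
                sum += kv.Value * value;
        }
        return sum;
    }
}
```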

How do I iterate only over the ingredients both vectors share, and do it fast?
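One classic answer - my guess at the direction, not necessarily what the post lands on - is a sorted merge join: store each vector's ingredient ids in sorted arrays and walk both with two indexes, so every step advances at least one side and only matching ids cost a multiplication.

```csharp
static class MergeJoinSketch
{
    // Sketch of a merge join over two sparse vectors stored as parallel
    // arrays with ids sorted ascending (an assumed representation).
    public static double DotByMerge(int[] idsA, double[] valsA, int[] idsB, double[] valsB)
    {
        double sum = 0;
        int i = 0, j = 0;
        while (i < idsA.Length && j < idsB.Length)
        {
            if (idsA[i] < idsB[j]) i++;
            else if (idsA[i] > idsB[j]) j++;
            else
            {
                sum += valsA[i] * valsB[j]; // a mutually owned ingredient
                i++; j++;
            }
        }
        return sum;
    }
}
```

This touches each entry at most once - O(|a| + |b|) - with sequential array access instead of hash probes.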

Continue reading...