This post was inspired by a discussion on Reddit that followed my previous post.

In this post, I will cover a suggestion by BelowAverageITGuy that cut down the total execution time by almost one hour.

Continue reading...

Last time I showed how I went from 34 hours to 11. This time we go faster, and to go faster I have to do less.

The current implementation of Similarity iterates over one vector and checks whether each ingredient also exists in the second one. Since those vectors are sparse, the chance of a miss is high. This means I am wasting computation on iterating and calling TryGetValue for ingredients the two recipes don't share.
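A minimal sketch of the lookup-based approach described above, written in Python rather than the original code. It assumes each recipe is a sparse vector stored as a dict mapping ingredient id to weight; `dot_by_lookup` is a hypothetical name, and `dict.get` stands in for `TryGetValue`. The miss counter makes the wasted work visible: every entry of the first vector costs a lookup, hit or not.

```python
from typing import Dict, Tuple

# Hypothetical representation: a sparse vector maps ingredient id -> weight.
SparseVector = Dict[int, float]


def dot_by_lookup(a: SparseVector, b: SparseVector) -> Tuple[float, int]:
    """Iterate over `a` and probe `b` for each ingredient.

    Returns the dot product and the number of lookups that missed,
    i.e. work spent on ingredients the two recipes do not share.
    """
    total = 0.0
    misses = 0
    for ingredient, weight in a.items():
        other = b.get(ingredient)  # plays the role of TryGetValue
        if other is None:
            misses += 1
            continue
        total += weight * other
    return total, misses


if __name__ == "__main__":
    pancakes = {1: 2.0, 2: 1.0, 3: 0.5, 4: 1.0}
    omelette = {2: 3.0, 5: 1.0}
    print(dot_by_lookup(pancakes, omelette))
```

For these toy vectors it prints `(3.0, 3)`: three of the four lookups miss, and only one contributes to the result.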

How can I iterate only over the ingredients both recipes share, and do it fast?
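One common way to visit only the shared entries, not necessarily the one this post settles on, is to store each sparse vector as parallel arrays sorted by ingredient id and walk both at once, like the merge step of merge sort. The function name and array layout below are assumptions for illustration.

```python
from typing import List


def dot_by_merge(
    a_ids: List[int], a_vals: List[float],
    b_ids: List[int], b_vals: List[float],
) -> float:
    """Dot product of two sparse vectors stored as parallel arrays.

    Both id arrays must be sorted ascending with no duplicates. Each step
    advances whichever cursor points at the smaller id, so no hash lookups
    are made and the loop runs at most len(a_ids) + len(b_ids) times.
    """
    i = j = 0
    total = 0.0
    while i < len(a_ids) and j < len(b_ids):
        if a_ids[i] == b_ids[j]:
            # Shared ingredient: the only case that contributes to the sum.
            total += a_vals[i] * b_vals[j]
            i += 1
            j += 1
        elif a_ids[i] < b_ids[j]:
            i += 1
        else:
            j += 1
    return total


if __name__ == "__main__":
    print(dot_by_merge([1, 2, 3, 4], [2.0, 1.0, 0.5, 1.0], [2, 5], [3.0, 1.0]))
```

The loop still steps past ids the recipes don't share, but each step is a cheap integer comparison on contiguous arrays instead of a hash probe, and it stops as soon as either vector is exhausted.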

Continue reading...

This will be a quick erratum to the previous post. This time I will expand on the oldest performance mantra:

The fastest code is the one that doesn’t execute. Second to that is the one that executes once.
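To make the second half of the mantra concrete, here is a hedged sketch that assumes a cosine-style similarity (this excerpt does not say which measure cookit uses). A vector's norm depends on one recipe only, so it can be computed once per recipe rather than once per pair. `pairwise_cosine` is a hypothetical name, and this is a generic illustration, not the optimization the post goes on to describe.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

SparseVector = Dict[int, float]


def norm(v: SparseVector) -> float:
    """Euclidean length of a sparse vector."""
    return math.sqrt(sum(w * w for w in v.values()))


def pairwise_cosine(recipes: List[SparseVector]) -> Dict[Tuple[int, int], float]:
    """Cosine similarity for every unordered pair of recipes.

    Norms are computed once per recipe up front (n computations) instead of
    inside the pair loop (two per pair, n * (n - 1) in total).
    """
    norms = [norm(v) for v in recipes]
    result: Dict[Tuple[int, int], float] = {}
    for i, j in combinations(range(len(recipes)), 2):
        a, b = recipes[i], recipes[j]
        dot = sum(w * b[k] for k, w in a.items() if k in b)
        denom = norms[i] * norms[j]
        result[(i, j)] = dot / denom if denom else 0.0
    return result
```

For example, `pairwise_cosine([{1: 3.0, 2: 4.0}, {1: 3.0, 2: 4.0}, {3: 1.0}])` gives `1.0` for the identical pair `(0, 1)` and `0.0` for `(0, 2)`, which share no ingredients.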

Last time I forgot to mention one very important optimization. It was one of the two steps that took the sample run from 1530 down to 484 seconds.

Continue reading...

The previous post described the methodology I used to calculate similarities between recipes in cookit. If you haven’t read it, I’ll give you 4 minutes, because it will make this post easier to understand. Go on, I’ll wait.

It ended on a happy note, and everything seemed to be downhill from there. It was, until I tried to run it. It took long. Very long. How long? I don’t know, because I canceled it after about an hour. Going with a famous quote (often attributed to Einstein, though the attribution is doubtful):

Doing the same thing over and over again and expecting different results

I decided, once again, to use math to estimate how long the calculation would take.
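A back-of-the-envelope version of such an estimate, assuming every unordered pair of recipes is compared exactly once. The per-pair cost is a parameter you would measure yourself, not a figure from this post, and the recipe count (182 184) is the one given in the post that introduced the similarity calculation.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n recipes: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def estimated_hours(n: int, seconds_per_pair: float) -> float:
    """Total runtime in hours if every pair costs `seconds_per_pair`."""
    return pair_count(n) * seconds_per_pair / 3600


if __name__ == "__main__":
    n = 182_184
    print(pair_count(n))               # 16595413836
    print(estimated_hours(n, 1e-6))    # about 4.6 hours at 1 microsecond per pair
```

That is roughly 16.6 billion pairs, so even at one microsecond per comparison the run takes about 4.6 hours; the quadratic pair count dominates everything else.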

Continue reading...

Warning: this post contains some math. Better still, it shows how to use math to solve real-life problems.

This post describes how I calculate similarity between recipes in my pet project cookit.pl. For those not familiar with it, cookit is a search engine for recipes. It crawls websites to extract recipes, then parses them and tries to build a precise ingredient list, complete with amounts and units.

At the time of writing, it had:

  • 182 184 recipes
  • 2936 ingredients

This scale may not seem huge, but trust me: it’s enough to bring a slew of problems to light. The fact that cookit runs on a crappy server, partly by choice, makes things all the more complicated.
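Some quick arithmetic shows how fast these numbers add up. Assuming, hypothetically, that the recipe-ingredient data were held as a dense matrix of 8-byte floats, the matrix alone would need over 4 GB; a sparse layout pays only for the ingredients each recipe actually lists. `dense_matrix_bytes` is an illustrative helper, not part of cookit.

```python
def dense_matrix_bytes(rows: int, cols: int, bytes_per_cell: int = 8) -> int:
    """Memory for a dense rows x cols matrix (8 bytes per float64 cell by default)."""
    return rows * cols * bytes_per_cell


if __name__ == "__main__":
    recipes, ingredients = 182_184, 2_936
    size = dense_matrix_bytes(recipes, ingredients)
    print(f"{size / 1e9:.2f} GB")  # 4.28 GB
```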

Continue reading...