Last time I’ve shown how I’ve gone from 34 hours to 11. This time we go faster. To go faster I have to do less.

The current implementation of Similarity iterates over one vector and checks if that ingredient exists in the second one. Since those vectors are sparse the chance of a miss is big. This means that I am losing computational power on iterating and calling TryGetValue.

How to iterate only over the mutually owned ones and do it fast?

Continue reading...

This will be a fast errata to the previous one. This time I will expand the oldest performance mantra:

The fastest code is the one that doesn’t execute. Second to that is the one that executes once

Last time I’ve forgot to mention one very important optimization. It was one of two steps that allowed me to go from 1530 to 484 seconds in the sample run.

Continue reading...

The previous post described the methodology I’ve used to calculate similarities between recipes in cookit. If you haven’t read it I’ll give it 4 minutes because it will make understanding this post easier. Go one, I’ll wait.

It ended on a happy note and everything seemed to be downhill from there on. It was until I tried to run it. It took long. Very long. How long? I don’t know because I’ve canceled it after about one hour. Going with a famous quote (probably from Einstein, but there are some ambiguities in this subject)

Doing the same thing over and over again and expecting different results

I’ve decided to, once again, use math to assess how long the calculation will take.

Continue reading...

Warning this post contains some math. Better still, it shows how to use it to solve real-life problems.

This post describes how I calculate similarity between recipes in my pet project cookit.pl. For those not familiar with it, cookit is a search engine for recipes. It crawls websites extracting recipes, then parses them and tries to create a precise ingredient list replete with amounts and units.

By the time of writing it had:

  • 182 184 recipes
  • 2936 ingredients

This scale may not seem huge, but trust me - It’s enough to bring a slew of problems to light. And that cookit runs on a crappy server, partly by choice, can make things all the more complicated.

Continue reading...

This post is covering a subset of what I am talking in my talk How I stopped worrying and learned to love parallel processing (currently only in polish).

This will cover on how, in terms of performance, AsParallel can kick you in a place where it hurts a lot, simultaneously being a blessing in terms of… performance. How is that? Let’s look at some

History

AsParallel was introduced as an extension to LINQ with TPL in .NET 4.0. In theory, it’s God’s sent. The promise was that it will:

  • parallelize the LINQ query.
  • take care of all thread management and synchronization.
  • adjust the number of Tasks automatically.
  • not require any additional code changes except for .AsParallel()

And in the vast majority of cases, this promise was kept! For example look at this code:

Continue reading...