# How to calculate 17 billion similarities

The previous post described the methodology I’ve used to calculate similarities between recipes in cookit. If you haven’t read it, I’d give it 4 minutes, because it will make understanding this post easier. Go on, I’ll wait.

It ended on a happy note and everything seemed to be downhill from there on. It was, until I tried to run it. It took long. Very long. How long? I don’t know, because I canceled it after about one hour. Going with a famous quote (probably from Einstein, but there are some ambiguities on this subject):

Doing the same thing over and over again and expecting different results

I’ve decided to, once again, use math to assess how long the calculation will take.
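A minimal sketch of that kind of estimate, assuming one similarity per unordered pair of recipes and taking the 182 184 recipes quoted for cookit. The function names and the throughput figure are illustrative assumptions, not values from the actual run. With these inputs the pair count lands at about 16.6 billion, the same order as the 17 billion in the title.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def estimated_hours(pairs: int, per_second: float) -> float:
    """Hours needed to process `pairs` items at `per_second` items per second."""
    return pairs / per_second / 3600


# Recipe count quoted for cookit; the count when the job actually ran may differ.
RECIPES = 182_184

# Hypothetical throughput, chosen only to show the arithmetic. Measure your own.
SIMILARITIES_PER_SECOND = 1_000_000

pairs = pair_count(RECIPES)
print(f"{pairs:,} similarities")  # 16,595,413,836 similarities
print(f"{estimated_hours(pairs, SIMILARITIES_PER_SECOND):.1f} hours at the assumed rate")
```

The point of the sketch is that the work grows quadratically: doubling the number of recipes roughly quadruples the number of pairs, so the throughput per pair matters a lot.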

# How I calculate similarities in cookit?

Warning: this post contains some math. Better still, it shows how to use it to solve real-life problems.

This post describes how I calculate similarity between recipes in my pet project cookit.pl. For those not familiar with it, cookit is a search engine for recipes. It crawls websites extracting recipes, then parses them and tries to create a precise ingredient list replete with amounts and units.

At the time of writing it had:

• 182 184 recipes
• 2936 ingredients

This scale may not seem huge, but trust me, it’s enough to bring a slew of problems to light. And the fact that cookit runs on a crappy server, partly by choice, can make things all the more complicated.

# Problems with AsParallel

This post covers a subset of what I talk about in my talk How I stopped worrying and learned to love parallel processing (currently only in Polish).

It covers how, in terms of performance, AsParallel can kick you where it hurts a lot, while simultaneously being a blessing in terms of… performance. How is that? Let’s look at some

## History

AsParallel was introduced as an extension to LINQ, together with TPL, in .NET 4.0. In theory, it’s a godsend. The promise was that it would:

• parallelize the LINQ query.
• take care of all thread management and synchronization.
• not require any additional code changes except for .AsParallel()

And in the vast majority of cases, this promise was kept! For example, look at this code:

# Debugging high memory usage. Part 2 - .NET Memory Profiler

Diagnosing high memory usage can be tricky. Here is the second part of how I found what was hogging too much memory in our system. In the previous post I wrote about how to create a memory dump and how many options ProcDump has for catching just the right moment for it. When trying to analyze memory leaks, or high memory usage (not necessarily meaning a leak), we have a few ways to approach it:

# Debugging high memory usage. Part 1 - ProcDump

I’m taking a short break from the Hangfire series, but I will get back to it.

This time - Where did my memory go? Or to be more exact: Why is this using so much memory?

The story starts with one IIS application pool using around 6 gigabytes of memory on one of our test environments. That was several times more than we expected it to use, so we decided to investigate.

Without much thinking we fired up Visual Studio installed on the test server, and attached to the process. Since the application was built in Debug mode, we had all the pdb files in the website folder.

Do I have your attention now? The above paragraph is of course a joke and a bunch of anti-patterns. Don’t do any of them!