This is a correction to the previous post about the cost of garbage collection. If you haven't read it, give it a try and see if you can spot the bug.

As Konrad pointed out in his comment, not all objects were in generation 0 as I assumed. This is partly because .NET, seeing the rapid demand for memory in Setup, triggers garbage collection on its own, but I made matters worse by calling GC.Collect at the end, which ensured that objects created during Setup ended up in generation 2 (in my defence, it was a leftover from a slightly different take on this problem).
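The corrected C# code lives in the repo and isn't reproduced here. As a language-neutral sketch of the mechanism, CPython's garbage collector is also generational (generations 0–2) and shows the same effect: objects that survive a forced full collection get promoted to an older generation. All names below are illustrative, not the post's actual benchmark code.

```python
import gc

def generation_of(obj):
    # Scan each generation's object list for obj (illustration only; O(heap)).
    # gc.get_objects(generation=...) is available since CPython 3.8.
    for gen in range(3):
        if any(o is obj for o in gc.get_objects(generation=gen)):
            return gen
    return None

# Long-lived "setup" allocations start out in generation 0...
survivors = [object() for _ in range(10_000)]

# ...but a forced full collection (the analogue of GC.Collect) promotes
# the survivors to an older generation, just as described above.
gc.collect()
print(generation_of(survivors))  # no longer 0
```

This is exactly why a benchmark that calls a full collection after setup ends up measuring generation 2 behaviour, whatever it intended to measure.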

This is the corrected code (it has also been committed to the GitHub repo):

Continue reading...

Update: There is an error in the first version of this code; the tests in fact always operate in a situation where almost all objects are in generation 2. The answer is in the next post.

In the previous post I promised to write about how to vary the number of workers in Hangfire, but in the comments Michał asked an interesting question: “Is a generation 1 collection more expensive than a generation 0 collection?” And going further, how does a generation 2 collection fit into this?
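The post's measurements are for the .NET GC; as a rough, hedged analogy (CPython's collector is generational too), one can time per-generation collections directly. A full (gen 2) collection must walk the whole tracked heap, while a gen 0 collection only scans recent allocations, so with a large promoted object graph the full collection is visibly more expensive. This is an illustrative sketch, not the post's benchmark:

```python
import gc
import time

# Build a large, long-lived object graph and promote it out of gen 0,
# so that full collections have real work to do.
graph = [{"id": i, "payload": list(range(5))} for i in range(200_000)]
gc.collect()

def time_collect(generation, repeats=5):
    # Average wall-clock time of collecting only the given generation.
    total = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        gc.collect(generation)
        total += time.perf_counter() - start
    return total / repeats

gen0 = time_collect(0)
gen2 = time_collect(2)
print(f"gen 0: {gen0 * 1e3:.3f} ms, gen 2 (full): {gen2 * 1e3:.3f} ms")
```

The absolute numbers say nothing about .NET, but the shape of the result — older generations cost more to collect because they cover more of the heap — is the same question the post investigates.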

Continue reading...

One of the steps in cookit is calculating similar recipes. This is what you can see on the left of a recipe page like this.

For clarity and manageability, it's scheduled as separate Hangfire jobs, and because cookit runs 5 workers, similarities are calculated for 5 websites concurrently.
The process uses cosine similarity, so it allocates a huge list at the start and then crunches through the calculations, a very CPU-heavy operation.
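The actual implementation is C# and not shown here, but the idea of cosine similarity is easy to sketch. A minimal Python version over sparse term-weight vectors (the recipe names and weights below are made up for illustration):

```python
import math

def cosine_similarity(a, b):
    # a, b: sparse vectors as {token: weight} dicts.
    # Similarity = dot(a, b) / (|a| * |b|), i.e. the cosine of the angle
    # between the two vectors; 1.0 means identical direction.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

recipe_a = {"flour": 2.0, "sugar": 1.0, "egg": 1.0}
recipe_b = {"flour": 1.0, "sugar": 1.0, "butter": 1.0}
print(round(cosine_similarity(recipe_a, recipe_b), 3))  # prints 0.707
```

Computing this for every pair of recipes is what makes the job allocate a lot up front and then spend its time purely on CPU.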

Continue reading...

This post is written more as a warning.

Anyone who has done performance-oriented HackerRank problems has seen this phrase in the assignment: “watch out for slow IO”.
We are used to thinking of files, databases and the like as potentially slow IO, but the console?
Yes, and you will be amazed by how slow it can be.
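The post's measurements concern the .NET Console, but the usual fix is language-neutral: don't issue one write per line, buffer the output and write it in one go. A hedged Python sketch of the two approaches (timed against an in-memory sink here; against a real console the gap is normally far larger, which is the point):

```python
import io
import time

N = 100_000

def write_per_line(out):
    # One write call per line: each call may hit the underlying stream.
    for i in range(N):
        out.write(f"{i}\n")

def write_buffered(out):
    # Build everything in memory, then issue a single write.
    out.write("\n".join(str(i) for i in range(N)) + "\n")

for writer in (write_per_line, write_buffered):
    sink = io.StringIO()
    start = time.perf_counter()
    writer(sink)
    elapsed = time.perf_counter() - start
    print(f"{writer.__name__}: {elapsed * 1e3:.1f} ms")
```

In .NET the equivalent trick is wrapping the console stream in a buffered writer instead of calling Console.WriteLine in a tight loop.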

Continue reading...

One of the main processes in cookit is extracting recipe information from raw HTML. I know it isn't the most elegant solution, but it is the only universal one.

But to the point.

Every web page goes through a process involving HTML parsing, stemming, and n-gram token matching. Then it's saved to SQL Server and, after transformation, to Solr. So: a lot of string manipulation, some math, and, from time to time, mostly gen-0 GC.
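The pipeline itself is C# and isn't shown in the teaser, but the n-gram token matching step is simple to sketch. A hypothetical Python version that scores how much of a known phrase (say, an ingredient name) appears in a page's token stream via bigram overlap:

```python
def ngrams(tokens, n):
    # All contiguous n-token windows from a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def match_score(text_tokens, phrase_tokens, n=2):
    # Fraction of the phrase's n-grams (bigrams by default) found in the text.
    phrase_grams = set(ngrams(phrase_tokens, n))
    if not phrase_grams:
        return 0.0
    text_grams = set(ngrams(text_tokens, n))
    return len(phrase_grams & text_grams) / len(phrase_grams)

text = "add two cups of plain flour and mix well".split()
phrase = "cups of plain flour".split()
print(match_score(text, phrase))  # prints 1.0
```

In the real pipeline the tokens would be stemmed first, so that “flours” and “flour” produce the same n-grams; running this over every page is where the string-heavy, allocation-heavy workload comes from.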

Continue reading...