Warning: this post contains some math. Better still, it shows how to use it to solve real-life problems.
This post describes how I calculate similarity between recipes in my pet project, cookit.pl. For those not familiar with it, cookit is a search engine for recipes. It crawls websites to extract recipes, then parses them and tries to build a precise ingredient list, complete with amounts and units.
At the time of writing, it had:
- 182,184 recipes
- 2936 ingredients
This scale may not seem huge, but trust me: it’s enough to bring a slew of problems to light. And the fact that cookit runs on a crappy server, partly by choice, makes things all the more complicated.