This article is Part 3 in a 3-Part Series.

Part 1 - What is the simplest database?
Part 2 - Want unlimited scale and performance? This is where to start
Part 3 - This Article

Previously I’ve wrote about key-value databases. They are awesome - ultra-fast, simple, can scale almost linear with the number of nodes. So why bother with complicating them?

Well, they have some issues.

Problems with key-value databases

The central concept of a key-value database is that the database doesn’t care what the is value. It may have some assumptions, like Redis, but the structure of the data is not of its interest. This leads to some limitations that can be problematic in some scenarios.

1. Can’t filter on value fields

Quite evident since from the database point of view value is a blob.

2. The whole value is returned

This may not seem as a problem, but remember, key-value databases are chosen for speed. When looking at a flow of retrieving data from a database:

Most of the steps performance is dependent on the size of the transmitted data, not the data used (this is why SELECT * is a sign of someone not giving a f**k about performance).

3. The value can be updated only as a whole

It is a problem because we have to:

get the complete data to the client (see point above)
operate on the entire data
send the whole data back to the database

Sounds not that bad, even good. We want to have the whole object when updating it, right?

Think about those cases:

update users last login date
append the element to the list (like online checkout basket)
change the prices on certain products because of a promotion

So how to solve those problems and not loose all that speed?

Wide column databases

The idea behind it is simple:

Let's structure data (that is the value part) again into key-value pairs.

This is hat we have in key-value databases:

This is how wide column databases represent data:

Having columns allows defining a subset of data we want to return to the client or subset of data that should be updated.

How is it different from a regular table in a relational database?

In most wide column databases, columns are defined on the single item level meaning there is no database-wide schema, which leads us to some interesting features of wide column databases:

Benefits from wide column databases

While wide column databases keep most of the perks of key-value databases mentioned previously they have some additional ones (They don’t have to exist in every implementation of a wide-column database, but most of them have it):

partial operations (add column value, update column value).
data compression. When dealing with sparse data we don’t have to store empty/null values. This way we can save space normally reserved because schema defined them.
WHERE clause. This feature is not that common in this type of databases, but filtering on data starts to appear.

In the next post a sample domain modeling using a wide-column database.

Szymon Warda

Hi, I'm Szymon Warda. I write code, design IT systems, write this blog, tweet and speak at conferences. If You want to know more go here, or follow me:

IndexOutOfRange

What is the problem with key-value databases and how wide column stores solve it.

April 10, 2017