Common database paradigms

From the simplest to the more advanced, here is a list of different database paradigms with an introduction and some use cases for each one.

Use this index if you want to navigate directly to a specific one.

1. Key-Value

Redis, Memchached

Data is saved as a PHP associative array or a Python dictionary. Every value saved has a unique key used to access (or write) to the database. Data is stored in RAM, meaning that you can read and write in fractions of a millisecond.
This type is commonly used to cache data, manage sessions, and queues. The structure does not allow common operations such as joins but allows queries to be extremely fast.

Redis, as an example, has two commands to read and write data: SET and GET.

Concept example of Memcached

function get_foo(foo_id)
    foo = memcached_get("foo:" . foo_id)
    return foo if defined foo

    foo = fetch_foo_from_database(foo_id)
    memcached_set("foo:" . foo_id, foo)
    return foo
end

2. Wide Column

Cassandra, HBASE

IIn wide column databases, the idea of a key-value store is extended as if we added a new dimension, in which each row has its own set of columns with its data. This allows correlated data to be stored together, but unlike relational databases, it does not have a defined schema, meaning that the database structure can be defined afterward.

Netflix uses this paradigm to store the history of viewed content by its users. [1][2]

These databases use a language called CQL, which is very similar to SQL, but joins are not possible.

Some of the main advantages include the ease of replication of data between nodes and the horizontal scalability.

A common use is for storing time series or history, or data, in general, that is often written and less often read.

Example of CQL vs SQL (source)

SELECT Name, Phone.Number FROM Contact WHERE Address.State = 'NY'

SELECT contact.name, phone.number FROM contact
JOIN address ON (address.contact_id=contact.id)
LEFT JOIN phone ON (phone.contact_id=contact.id)
WHERE address.state = 'NY'

3. Document

MongoDB, Firestore, DynamoDB, CouchDB

In document-oriented databases, data is represented as a document, stored through the key-value paradigm. It does not need a schema.

Documents can be grouped by collections, where each field in a collection can be indexed and collections can be organized according to a hierarchy. It has a structure similar to relational databases but does not support joins.

This database tries to store data in documents, rather than normalizing it in pre-structured columns in a table. This translates into an ease of querying data by front-end applications, at the expense of complexity during write.

From a developer’s perspective, this database is very easy to use because it does not require accurate knowledge of the data he/she is going to work with, allowing the creation of the schema at the time of storing.

This database is not very suitable for data that has lots of relationships.

Q: why can't SQL and NoSQL Developers date one other?

A: because they don't agree on relationships.
— Amelia Warner (@facetimeJS) November 21, 2020

Examples from MongoDB (source)

Inserting different documents in the inventory collection using the Mongo Shell.

db.inventory.insertMany([
   { item: "journal", qty: 25, size: { h: 14, w: 21, uom: "cm" }, status: "A" },
   { item: "notebook", qty: 50, size: { h: 8.5, w: 11, uom: "in" }, status: "A" },
   { item: "paper", qty: 100, size: { h: 8.5, w: 11, uom: "in" }, status: "D" },
   { item: "planner", qty: 75, size: { h: 22.85, w: 30, uom: "cm" }, status: "D" },
   { item: "postcard", qty: 45, size: { h: 10, w: 15.25, uom: "cm" }, status: "A" }
]);

Querying data with conditions.

db.inventory.find( { status: "D" } )

Which is the SQL equivalent of:

SELECT * FROM inventory WHERE status = "D"

4. Relational

MySQL, PostgreSQL, SQL Server

These were originally created by an IBM scientist named Ted Codd [3], who had dedicated his studies to relational models. These studies led to the creation of a query language called SQL (i.e. structured query language), to interface with relational databases.

To describe relational databases, we can make use of an example.

Consider a factory that builds cars, and each car has a different supervisor. We store information about cars and supervisors in tables respectively called car and supervisor. Now, each entry in each table (i.e. a row) has a property that uniquely identifies that entry. This property is called primary key. Now, if we want to know which car has which supervisor, we just have to save the supervisor row primary key to the car row in the car table. A primary key that is saved in another table is called foreign key, and allows to query the database by relations.

SELECT car.name, supervisor.name
FROM car
INNER JOIN supervisor ON supervisor.id = car.supervisor_id;

Queries that use relations, use a concept called JOIN, in which tables are joined together to visualize relationships. [4]

This type of database needs to have a schema defined before storing data, and a schema not only defines the internal structure of data but also enforces the store of data in its most primitive form.

This means that we have to know in advance if we are going to store a piece of text, an integer or a floating-point number, etc… [5]

SQL databases conform to a standard called ACID, which describes how actions (called transactions) should be performed to achieve data validity and integrity, even if a network error or a crash occurs.

This is the most popular database type. [6]

5. GRAPH

Neo4j

In this type of database, data is represented by graphs, where the data itself is a node, and relations are represented by vertices.

In many-to-many relations, relational databases need to have a join table to store the foreign keys of the related tables, but in graph databases, this table is not present because relations are defined directly by the connection of nodes.

When dealing with data that is highly correlated, this type of database is very handy, allowing getting data easier than having to write multiple join queries.

These databases are used mostly for suggestions, or to create knowledge bases.

6. SEARCH

Elastic, Solr (based on Apache Lucene technology) or Algolia

From a developer standpoint, these are very similar to document-based databases. Data is added in the form of text, that is analyzed to create an index, like the one at the end of books. A user performing a search will get the results fast because the lookup will use the index.

This kind of search can be used to generate suggestions and to correct typos.

Practical examples can be:

suggest products when a user searches something in an online shop
return ids of rows of a relational database faster than searching through conditions
understand that “alice in wornderlasd” means “Alice in Wonderland”

Thanks for reading, consider following me on Twitter 🦃

22/11/2020

In Databases, Development

#algolia #cassandra #database paradigms #elasticsearch #memcached #mongodb #mysql #neo4j #postgresql #redis #solr