So Which Database Should I Use?

At Inverted Software we work closely with our customers to design systems that fit their needs.

A part of every system is its data store, also referred to as Data Mart or Data Warehouse.

There are five main database categories:

SQL Databases

SQL (Structured Query Language) databases are designed for managing data held in a relational database management system (RDBMS).

RDBMS systems use a structure of tables and relationships in order to store data in a way that guaranties accuracy.

SQL Databases are often used as the Single source of truth (SSOT), where every data element is stored exactly once.

Due to their ACID (Atomicity, Consistency, Isolation, Durability) features, they are most commonly used as the backbone of systems that require data integrity.

Types of SQL Databases include: PostgreSQL, SQLite, MySQL, MSSQL, Sybase and more.

NoSQL Databases

If you work with applications that create massive volumes of new, rapidly changing data types — structured, semi-structured, unstructured and polymorphic data, you might consider a NoSQL Database.

NoSQL databases provide a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases.

NoSQL data can be modeled as objects, documents key value pairs and more.

NoSQL Databases distribute data across clusters of servers, making big data storage possible.

In fact, you can hold billions of records on a cluster made from small cheap machines.

Such databases include: Cassandra, HBase, MongoDB and more.

NoSQL Databases are fast!

Since NoSQL databases do not include transactions, referential integrity or even guarantied inserts, they can avoid the bottleneck of SQL Databases making them more than ten times faster when inserting data.

Inserts are usually written into memory and from there to disk on one of the cluster’s machines and although retrieving data still requires the use of indexes in some cases, they can still be over three times faster than a traditional SQL Databases.

NewSQL Databases

NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional database system.

NewSQL systems are best used in cases where a system has a large number of transactions that are short-lived and touch a small subset of data using index lookups.

NewSQL Database include: Clustrix, VoltDB, MemSQL and more.

Big Data distributed storage and processing

Databases of all kinds excel at storing and retrieving data, however, as your unstructured data grows, processing and reporting in requires a distributed processing engine.

Distributed data processing engines use a MapReduce algorithm to process your data in small parallel chunks.

The leading Big Data processing engine is Apache Hadoop.

The core of Apache Hadoop consists of a storage part, known as Hadoop Distributed File System (HDFS), and a processing engine that uses the MapReduce algorithm.

Hadoop splits files into large blocks and distributes them across nodes in a cluster.

To process data, Hadoop transfers packaged code to its nodes to process in parallel based on the data to be processed.

In Memory Databases

In Memory Databases hold data in the cache. They are used in order to load the data your application most commonly uses from its permanent storage database into memory, as a technique of performance optimization.

They can also be used if your application generates temporary data that will not be stored.
For example: Session data.

In Memory Databases can utilize SQL and be ACID-compliant, or use NoSQL and store objects or key value pairs.

Some In Memory have long term disk based storage mechanisms as well as in memory store and can persist your data.

Notable In memory database include: Redis, VoltDB,Memcached.

So Which Database Should I Use?

Choosing the right technology for the right system can be a difficult task.

An enterprise system can utilize multiple databases.

A SQL Database for long term storage of the core data, a NewSQL Database for caching and a Hadoop engine for processing.

Need an evaluation of your system?

Contact us and let us know how we can help.

Inverted Software

Phone: 818.262.8552
Email: contact@invertedsoftware.com

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s