There are quite a few NoSQL databases to choose from today. Below is a list of pros and cons for Aerospike, CouchDB, Cassandra and OrientDB. The goal is to pick out the database that does best in each domain: read-heavy, write-heavy, balanced, and graph-oriented workloads.
- Aerospike: designed for speed, with SSD-optimized storage
- Use Case
- Any read-heavy application that requires low-latency data access, high concurrency, and high availability
- Very fast read access with minimal latency
- Automatic failover and automatic rebalancing of data when nodes are added to or removed from the cluster
- Cluster management with Web GUI
- Supports complex data types (lists and maps) as well as simple ones (integer, string, blob)
- Aggregation query model
- Data can be set to expire with a time-to-live (TTL)
- A commercial license is required for XDR (cross-datacenter replication)
- All keys must be held in memory, regardless of whether the namespace is in-memory or SSD-backed, so RAM-optimized instances are required
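To get a feel for what "all keys in memory" means for instance sizing, here is a rough back-of-the-envelope sketch. It assumes 64 bytes of primary index per record copy, which is the figure Aerospike's capacity planning material uses; the function names, record count, replication factor and node count are made up for illustration.

```python
# Rough RAM estimate for keeping every key's index entry in memory.
# Assumption: 64 bytes of primary index per record copy.
INDEX_BYTES_PER_RECORD = 64


def primary_index_ram_bytes(records: int, replication_factor: int = 2) -> int:
    """Cluster-wide RAM needed just to hold the primary index."""
    return records * replication_factor * INDEX_BYTES_PER_RECORD


def per_node_gib(records: int, replication_factor: int, nodes: int) -> float:
    """Index RAM per node in GiB, assuming data is spread evenly."""
    total = primary_index_ram_bytes(records, replication_factor)
    return total / nodes / 2**30


# Hypothetical cluster: 1 billion records, 2 copies of each, 8 nodes.
total_bytes = primary_index_ram_bytes(1_000_000_000, 2)
print(total_bytes)                                    # 128000000000
print(round(per_node_gib(1_000_000_000, 2, 8), 1))    # 14.9
```

So a billion records would need roughly 15 GiB of RAM per node for the index alone, before any data is held in memory.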
- CouchDB: designed for database consistency and ease of use
- Use Case
- For accumulating, occasionally changing data, on which pre-defined queries are to be run.
- Master-master replication, allowing easy multi-site deployments.
- Server-side document validation possible
- Crash-only (reliable) design
- MVCC - write operations do not block reads
- Needs compacting from time to time
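The MVCC point is easier to see with a toy model. The sketch below is not CouchDB's API; it is a hypothetical in-memory store (`ToyStore` and `Conflict` are invented names) that imitates the idea: every write creates a new revision, readers keep whatever revision they already fetched, and a write that carries a stale revision is rejected instead of blocking anyone.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when a write carries a stale revision."""


class ToyStore:
    """Hypothetical in-memory stand-in for a revisioned document store."""

    def __init__(self):
        self._docs = {}  # doc id -> (revision, body)

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        # Return a copy so a reader's snapshot is never mutated by writers.
        return {"_id": doc_id, "_rev": rev, **body}

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise Conflict(f"expected rev {current_rev}, got {rev}")
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()[:8]
        new_rev = f"{generation}-{digest}"
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = ToyStore()
rev1 = store.put("user:1", {"name": "Ann"})
snapshot = store.get("user:1")                 # a reader holds revision 1
rev2 = store.put("user:1", {"name": "Ann B."}, rev=rev1)
try:
    store.put("user:1", {"name": "stale"}, rev=rev1)   # outdated revision
    stale_write_rejected = False
except Conflict:
    stale_write_rejected = True
```

The old revisions that pile up behind this scheme are also why the database needs compacting from time to time.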
- Cassandra: designed to store huge datasets in “almost” SQL
- Use Case
- For write-heavy applications that need a highly redundant, fault-tolerant database
- Web analytics, transaction logging, and data collection from huge sensor arrays
- Very good and reliable cross-datacenter replication
- Writes can be much faster than reads (when reads are disk-bound)
- Data can be given an expiration (a TTL set on INSERT); once expired, it is treated as a tombstone
- CQL3 is very similar to SQL, but with some limitations that come from the scalability requirements (most notably: no JOINs, no aggregate functions)
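In CQL the expiration is set with `USING TTL <seconds>` on the INSERT. What happens afterwards can be sketched with a toy model. This is not Cassandra code; `ToyTable` is a hypothetical in-memory class, and the clock is passed in explicitly so the behaviour is easy to follow.

```python
class ToyTable:
    """Hypothetical in-memory table imitating TTL expiry and tombstones."""

    def __init__(self):
        self._rows = {}          # key -> (value, expires_at or None)
        self.tombstones = set()  # keys whose data has expired

    def insert(self, key, value, now, ttl=None):
        """Store a value; with a ttl it expires at now + ttl seconds."""
        expires_at = now + ttl if ttl is not None else None
        self._rows[key] = (value, expires_at)
        self.tombstones.discard(key)

    def select(self, key, now):
        """Return the value, or None if it is missing or has expired."""
        row = self._rows.get(key)
        if row is None:
            return None
        value, expires_at = row
        if expires_at is not None and now >= expires_at:
            # Expired data is no longer readable and leaves a tombstone.
            del self._rows[key]
            self.tombstones.add(key)
            return None
        return value


table = ToyTable()
table.insert("session:42", "logged-in", now=0, ttl=60)
before = table.select("session:42", now=59)   # still readable
after = table.select("session:42", now=60)    # expired
```

In the real database the tombstones themselves are only cleaned up later, during compaction, which this sketch leaves out.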
- OrientDB: designed as a document-based graph database
- Use Case
- For graph-style, rich or complex, interconnected data.
- For searching routes in social relations, public transport links, road maps, or network topologies
- Advanced path-finding with multiple algorithms and Gremlin traversal language
- Sharding can be accomplished in combination with Hazelcast
- SQL-like query language (Note: no JOIN, but there are pointers)
- Web-based GUI (self-contained)
- Multi-master architecture
- Has transactions, full ACID conformity
- Can be used both as a document and as a graph database (vertices with properties)
- Advanced monitoring and online backups require a commercial license
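To make the "searching routes in social relations" use case concrete, here is a minimal sketch of the kind of query a graph database answers natively. It is plain Python with a breadth-first search over an adjacency list, not OrientDB's API or Gremlin; the `friends` graph and the `shortest_path` function are made up for illustration.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search: fewest-hops path from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}      # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue          # already visited
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back from the goal to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical social graph: who is directly connected to whom.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
}
print(shortest_path(friends, "ann", "eve"))  # ['ann', 'bob', 'dan', 'eve']
```

In a document or key-value store this traversal means one round trip per hop from application code, whereas a graph database follows the stored pointers between vertices itself.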
I generally don’t like to post benchmarks unless I have researched the methods used or run them myself, but I figured these findings were worth posting. The graph below shows how Aerospike clearly dominates the read-heavy application world. You can read more about this benchmark, which was done by Aerospike, here:
Deciding which NoSQL database to use is very application specific. For example, in my current situation, we require a fault-tolerant database that can handle a read-heavy application. The obvious candidates are CouchDB and Aerospike. However, with Aerospike at scale you will need to buy a license to take advantage of key features such as XDR (cross-datacenter replication) and AMC (Aerospike Management Console). Although it is not well documented, there also appears to be an artificial limit on the number of nodes you can have in an Aerospike cluster (127). But looking at some benchmarks, you might not need that many nodes to handle hundreds of millions of transactions per second (given memory-optimized instances).
With CouchDB, on the other hand, you don’t have these license restrictions. It’s an Apache project and is fully open source, but it lacks some useful features that Aerospike provides, such as automatic failover and high read performance with low latency.