NoSQL Distilled

The term “NoSQL” is very ill-defined. It’s generally applied to a number of recent nonrelational databases such as Cassandra, Mongo, Neo4J, and Riak. They embrace schemaless data, run on clusters, and have the ability to trade off traditional consistency for other useful properties. Advocates of NoSQL databases claim that they can build systems that are more performant, scale much better, and are easier to program with. — NoSQL Distilled, by Martin Fowler

RethinkDB looks promising

RethinkDB is out – an open-source distributed database

RethinkDB is built to store JSON documents, and scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and is easy to set up and learn.

An open-source distributed database with an intuitive query language, parallelized queries, and much more! It’s called RethinkDB and you can fork it on GitHub. What’s not to like?

JavaScript filesystem database

I couldn’t resist: after many conversations with Mário Valente, I challenged myself to build a complete filesystem-based JavaScript database from scratch. In short, something like CouchDB, but written in JavaScript and storing documents directly on the filesystem.

Some characteristics of this new database system:

  • fully written in JavaScript: Node.js is my choice because of its architecture and portability;
  • every document must contain only JSON data;
  • every document can be manipulated directly on the filesystem: the idea is to let someone edit a document using vim, for instance, without breaking anything. It also lets you easily back up, move, or replicate your data without putting a heavy load on the database;
  • the underlying filesystem can be changed: changing the underlying filesystem offers numerous possibilities, like easy distribution (using OpenAFS, XtreemFS, or others), replication (using finefs or others), and even attaching different backends (using FUSE, for example);
  • the system must be available through an HTTP REST interface: this makes it easy to integrate the system into any existing application right away. Bonus points if the API is compatible with CouchDB;
  • queries must be performed using a MapReduce approach: this should make it easy to run queries over a large dataset, since the map step can process each document independently.
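The list above boils down to two mechanisms: documents stored as plain JSON files, and queries run as map/reduce over those files. Here is a minimal Node.js sketch of both, assuming one `<id>.json` file per document in a single directory. The function names (`putDoc`, `getDoc`, `mapReduce`), the id pattern, and the write-to-temp-then-rename strategy are hypothetical, chosen for illustration rather than taken from the actual backend, and the `emit`/`reduce` signatures are only loosely modeled on CouchDB views.

```javascript
// Hypothetical sketch: one JSON file per document, atomic writes,
// and a simple map/reduce query over every document in the directory.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Restrict ids to a safe character set so an id can never escape the
// data directory (e.g. "../etc/passwd" is rejected).
const ID_PATTERN = /^[A-Za-z0-9_-]+$/;

function docPath(dir, id) {
  if (!ID_PATTERN.test(id)) throw new Error(`invalid document id: ${id}`);
  return path.join(dir, `${id}.json`);
}

// Pretty-print the JSON so the file stays pleasant to edit by hand, write it
// to a temporary file, then rename it over the target. On POSIX filesystems
// the rename is atomic, so readers see the old or the new document, never
// a half-written one.
function putDoc(dir, id, doc) {
  const target = docPath(dir, id);
  const tmp = `${target}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(doc, null, 2) + '\n');
  fs.renameSync(tmp, target);
}

// JSON.parse throws if someone saved a file that is no longer valid JSON.
function getDoc(dir, id) {
  return JSON.parse(fs.readFileSync(docPath(dir, id), 'utf8'));
}

// Map every stored document, group emitted values by key, then reduce
// each group. Returns a plain object of key -> reduced value.
function mapReduce(dir, map, reduce) {
  const groups = new Map();
  for (const file of fs.readdirSync(dir)) {
    if (!file.endsWith('.json')) continue; // skip temp files and strays
    const doc = JSON.parse(fs.readFileSync(path.join(dir, file), 'utf8'));
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values);
  return result;
}

// Usage: store three documents in a throwaway directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'jsdb-'));
putDoc(dir, 'alice', { type: 'user', city: 'Lisbon' });
putDoc(dir, 'bob', { type: 'user', city: 'Porto' });
putDoc(dir, 'carol', { type: 'user', city: 'Lisbon' });

// Count users per city: map emits (city, 1) and reduce sums the ones.
const counts = mapReduce(
  dir,
  (doc, emit) => { if (doc.type === 'user') emit(doc.city, 1); },
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(counts.Lisbon, counts.Porto); // 2 1

fs.rmSync(dir, { recursive: true, force: true });
```

Because each document is mapped independently, the map step could be split across processes or machines; this sketch simply runs it in a single process.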

So, right now I’m halfway through it. I’ve written the backend and I’m now working on the HTTP REST interface. I’ll create a GitHub repo once I have something functional that can be easily built and tested.
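For the HTTP REST interface, here is a minimal sketch of what CouchDB-style document endpoints could look like using only Node's built-in `http` module. The routes (`GET` and `PUT` on `/{db}/{docid}`), the `handle` and `startServer` names, and the in-memory `Map` standing in for the filesystem backend are all assumptions for illustration. The response bodies only loosely imitate CouchDB's, and revision (`_rev`) handling, which CouchDB requires when updating an existing document, is deliberately left out.

```javascript
// Hypothetical sketch of a CouchDB-flavoured REST layer. Documents live in
// an in-memory Map here; a real backend would read and write JSON files.
const http = require('http');

// Pure request handler with no I/O, so it can be tested without a socket.
// Supports GET and PUT on /{db}/{docid}; other paths and methods are rejected.
function handle(store, method, url, bodyText) {
  const parts = url.split('?')[0].split('/').filter(Boolean);
  if (parts.length !== 2) return { status: 404, body: { error: 'not_found' } };
  const [db, id] = parts;
  const key = `${db}/${id}`; // parts never contain '/', so keys are unique

  if (method === 'GET') {
    if (!store.has(key)) return { status: 404, body: { error: 'not_found' } };
    return { status: 200, body: { _id: id, ...store.get(key) } };
  }

  if (method === 'PUT') {
    let doc;
    try {
      doc = JSON.parse(bodyText);
    } catch {
      return { status: 400, body: { error: 'bad_request', reason: 'invalid JSON' } };
    }
    // Only JSON objects are accepted as documents.
    if (doc === null || typeof doc !== 'object' || Array.isArray(doc)) {
      return { status: 400, body: { error: 'bad_request', reason: 'document must be an object' } };
    }
    store.set(key, doc);
    return { status: 201, body: { ok: true, id } };
  }

  return { status: 405, body: { error: 'method_not_allowed' } };
}

// Thin HTTP wrapper: buffer the request body, delegate to handle(),
// and send the result back as JSON.
function startServer(port) {
  const store = new Map();
  const server = http.createServer((req, res) => {
    let body = '';
    req.setEncoding('utf8');
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      const { status, body: out } = handle(store, req.method, req.url, body);
      res.writeHead(status, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify(out));
    });
  });
  server.listen(port);
  return server;
}
```

Running `startServer(8080)`, then `curl -X PUT http://localhost:8080/mydb/doc1 -d '{"title":"hello"}'` followed by `curl http://localhost:8080/mydb/doc1`, should return the stored document with its `_id`. Keeping `handle` free of I/O means the routing can be tested, and later pointed at the filesystem backend, without touching the server code.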

What do you think of a system like this? Any features you’d like to add or change?