A recent article on ReadWriteWeb by Tony Bain has gotten a lot of attention for asking, “Is the Relational Database Doomed?” It describes the advantages of key/value databases, including cloud-based ones like Amazon’s SimpleDB, compared to traditional relational database management systems (RDBMS’s) like Oracle or MySQL. Unfortunately, Bain grossly exaggerates the benefits of a key/value database.
When discussing the suitability of different types of database platforms, the user type matters. The user could be:
A) a business person who is not a programmer
B) a programmer building a custom application
C) a programmer building a custom, multi-tenant, very large SaaS application
TrackVia was designed squarely for user type A, and to a lesser extent B. Bain mostly discusses user type C, which is a miniscule subset of database users. Nevertheless, I’ll save my comments on A and B for a future post, and address C head on. (Non-nerds, feel free to bow out now.)
To start with, any hosted database is inappropriate for building a multi-tenant SaaS application. To be clear, hosted SaaS apps are great; hosted SaaS apps built on hosted databases are not. SaaS apps require data to flow from the end user’s browser to the SaaS server and back. If the SaaS app uses a third-party hosted database, data has to flow from the user’s browser to the SaaS server, then to the third party database server, back to the SaaS server, and back to the user. That extra round trip across the public Internet adds a significant overhead to every user request. AJAX-style user interfaces often require multiple HTTP requests per screen. It’s hard enough to make a SaaS product fast and snappy for end users (trust me on this). With a non-local database, I suspect it’s impossible. (Again, I’m talking about building a multi-tenant SaaS app on a remote database, not building an ordinary business app on a SaaS platform like TrackVia – which works great, by the way.)
Second, as Bain makes clear, the current hosted key/value databases have limitations. They’re show-stoppers for SaaS companies. One of them limits programmers to fetching 1,000 records at a time. That makes it impossible, or performance-prohibitive, to allow a customer to search 50,000 sales orders for ones matching non-trivial criteria. Another hosted key/value database limits the total data store to 1 terabyte. If a multi-tenant SaaS application is commercially successful, it will blow through that in a hurry.
Third, even local key/value databases are unable to process large volumes of data quickly. Take the example of searching those 50,000 sales orders. With a key/value database, all 50,000 records have to be pulled from the database server into the application server and then evaluated individually. Web scripting languages like PHP or Ruby don’t do these things particularly quickly; RDBMS’s do. There are several reasons for this: RDBMS’s use indexes, they’re good at caching large amounts of recently-used data in memory, they’re written in C and have highly optimized analytical functions. RDBMS’s are also much faster at grouping, aggregating, and other data manipulations than app code is.
So, local or hosted, key/value databases have important limitations. What about the limitations of the RDBMS? Bain discusses two, and I think they’re over-stated too.
First, Bain describes the extra “plumbing” required for object-to-relational mapping when using an RDBMS. But modern web languages like Ruby on Rails and PHP do this automatically (see this list). In older object-oriented languages like Java or Perl, a single data access base class can be sub-classed to support a new database table with a few lines of code. Even if all object-to-relational code is redone by hand each time a table is added to the schema (anyone writing web apps in C these days?), I can’t imagine SaaS companies are spending more than a few programmer-hours per month doing this.
Second, Bain describes the lack of scalability of RDBMS systems. It’s true that beyond a dozen or so servers, a single logical database (i.e. a multi-server cluster) begins to fall down. Bain is right that this is the key driver forcing SaaS architects to rethink the database. Today, most big SaaS companies solve this by partitioning customer data: the first thousand customers go into database instance 1, the next thousand into instance 2, and so on. No cluster is required, and it’s essentially arbitrarily scalable. Data partitioning is pretty easy if anticipated in the architecture early on. Even retro-fitting a SaaS code base to support partitions is a matter of weeks of coding, not months (trust me on this too).
So, in summary – if you’re a programmer building what you expect will be a very large, multi-tenant SaaS application, do yourself a favor and stick with a tried-and-true RDBMS, run locally. If you’re a business user or an IT professional looking for a better database for a business application, congratulations on reading this far. I recommend a good online database.