Different DB Platforms for Different Jobs

Dean Compher

26 February 2016

 

 

In my work as a sales engineer I run into IT people with all sorts of views about developing applications and the types of service their data stores need to provide.  On the one hand there are the traditional enterprise systems developers where absolute consistency of data and highly polished applications are paramount.  On the other hand there are developers of applications where speed of development is of utmost importance and a few rough edges don’t hurt anything.  Frequently, people get so immersed in their side of the game that they just can’t fathom anyone doing it differently.  The key, though is to remember that there are all sorts of application requirements and it is important to use the right tools to meet the most important goals of the project.  In the database world all sorts of technologies are available to meet these varying needs and in this article I will discuss which ones I work with that can help meet the needs of the projects you have. 

 

There are a number of applications where errors caused by introducing changes are so devastating that a rigorous development and testing process are still needed.  Think of financial applications that control withdrawal, deposit and transfer of money are involved.  These are worth investing a great deal of effort into getting the application written and deployed.  Delays in getting changes into production pale in comparison to the costs of bugs in the production code or of inconsistent data.  Further, these applications require absolute database consistency (ACID compliance), backup and logging with point in time restores, and well defined schemas.  The fact that these facets of the database may slow development is of secondary importance.  Since the application only changes occasionally, having new database features delivered frequently is of little value.  Plus outages needed for databases fixes and upgrades impact service level agreements and can introduce unnecessary risk. 

 

However, there are many types of decentralized applications where getting new functions to market quickly is more important than having perfect software or data consistency.  Think about smart phone applications or internet sites where users are interested in going to the apps or sites with the coolest technology or features and are not all that concerned about the rough edges.  For these types of applications, users will leave you even if your stuff works flawlessly, but isn’t the latest and greatest.  For example, I like the app on my phone I used to scan business cards and translate them into my contacts list.  I don’t care if it just dies once in a while and have to touch it again.  However, if someone else comes out with a cool new feature, I’ll abandon the first app in a heartbeat.  Also if the store where this app keeps images of cards has a problem with a few of the ones I scan it is no big deal.  However, since it is undergoing frequent updates, having new features available in the database can be quite useful since that might allow the developer to quickly add additional features to the app.

 

I’ll now discuss some databases available for these types of needs.

 

JSON and NoSQL Stores

JSON (Java Script Object Notation) is a record format that generally contains elements of the record in a hierarchy and defines each element textually.  Since the “schema” is defined individually in each record (or document) then it is said to be schema-less.  However, records used by a set of applications will tend to have a similar format.  I like this description of JSON.  In many relational database applications, a lot of development work is done translating data between the internal record format and the format (schema) used to store the data in a database.  JSON databases allow the record to be written and read by the internal format needed by the application.  Further, since each field is tagged, you just add or stop using fields as desired when maintaining an existing applications.  Other applications using that JSON records that don’t reference that field don’t care about the change.  This is a lot different in the relation world where a change to the schema frequently causes lots of rework to existing applications and/or a significant outage to the database. 

 

The point is that JSON makes applications that exchange data or store data easy to write, so it is used quite extensively.  There are a number of JSON databases that make it easy to store and query JSON records.  The IBM Cloudant database is an excellent example of a JSON database.  Most implementations are in the cloud and fully managed by IBM.  This means that the developers of applications can just begin using the database and don’t have to worry about configuring and managing a database.  This is often a very important to new applications that you need to get going quickly.  Cloudant is a clustered database technology that can start small and can grow very quickly as needed.  This is also important in many new applications as you need to contain costs initially before you know if it will be used much, but where you may need to expand the capacity of the database very quickly if it gains wide acceptance.  However, like most pure JSON stores, it does not have transaction control and would allow things like two different users to update the same record at about the same time.  This does not really matter for my card scanning application and many others.  Again for these types of applications, ease of use and fast development times are the most important things. 

 

You should note that if you want to store and query JSON documents, but need ACID compliance or other features of traditional databases you still have tools available.  IBM DB2 and Informix allow you to store JSON documents and query them through the MongoDB API.  These give you all the transactional control, backup, recovery and other features important to many enterprise applications.  And speaking of NoSQL, they also have great XLM features as well.  The JSON and XML features of both databases are available in the on-premises software and DB2 has the features available in the cloud as well. 

 

Fully Managed Relational Cloud Databases

There is a relatively new class of databases in the cloud that are managed by the cloud vendor.  The hardware, database software, memory and CPU configurations, patches and version upgrades are handled by the vendor, with the vendor providing credentials to the user of the database.  For its fully managed databases, IBM monitors the availability of the database and fixes problems if they occur.  Further, most fully managed cloud databases are being developed in an agile methodology, which means that fixes and features are being added to the database frequently.  The user typically cannot log into the OS using SSH or other tools, but can only configure the database through database command line connections or GUI tools like IBM Data Studio or Toad. 

 

One example of an IBM fully managed database is dashDB.  It is based on DB2/BLU technology, but has many analytical features of IBM’s Netezza (PDA) technology.  It is a fully managed data warehouse and fixes and new features are delivered about every 4 weeks.  This is not a great fit for many relative static traditional purchased or complex in-house developed applications.  Since they rarely change, they can’t take advantage of the new features, but every change to the database comes with risk. 

 

However, this is great for other classes of applications.  For example, if you want to quickly get a lot of data into a data warehouse and query it to see if there is even any value in the data, then using dashDB is a great alternative.  This is because you can deploy the database quickly and without purchasing hardware, installing and configuring software, or configuring lots of complex objects like indexes and materialized views.  You just deploy the server(s), load your data and begin querying it.  If it does not provide any value, you can stop using it and stop paying.  In this case “failing fast” before you squander a lot of resources is the way to go.  Further, many data warehouses do not have complex applications written for them.  Instead they use a BI tool like Cognos to create queries against the database.  Since these tools tend to use the database in a more generic way than complex applications, they are less prone to peculiarities of a particular database.  Also since users tend to create new queries all the time, having new features show up frequently is quite useful.

 

In addition to dashDB on the IBM Bluemix cloud there are other fully managed relational databases including SQL Database and Time Series Database.  Both of these are for transactional workloads.  There is also a fully managed Hadoop systems on the Bluemix cloud. 

As of the writing of this article, all fully managed databases are backed up for you once a day and can be restored if you have a problem.  However, restores are only to the point in time that the backup was taken.  This could change in the future. 

 

IBM also has offerings for fully managed open source databases.  They are offered through IBM Compose and Bluemix, and include MongoDB, Redis, PostgreSQL, RethinkDB, etcd, RabbitMQ and Elasticsearch.  These services are great for organizations who use these databases as their primary data stores and want a very scalable environment managed professionally by someone else.  It is also great for the enterprise who primarily uses more traditional databases, but need to support the occasional open source database and don’t have administration skills in-house for them. 

 

Self-managed Relational Cloud Database

DB2 on Cloud is a platform as a service database.  If you want the ease of deployment of a cloud database but need to manage the implementation of fixes and features yourself and want to control the frequency and types of backups, then this is a better service for you.  When you deploy a DB2 on Cloud system, you get a server (virtual or bare metal) with the most recent version of DB2/LUW installed with a recent fix pack.  You also get an administrator id with SUDO root access and the ability to SSH into the server and start doing what you like with the server and database.  You can create as many instances and databases as you like and configure them anyway like, making them transactional or warehouse as you see fit.  You apply maintenance and upgrades to the OS and database as benefits your application and you do all monitoring and troubleshooting. 

 

This platform as a service DB2 on Cloud system is great for more traditional types of applications that need to have fixes and upgrades applied when the application needs it rather than on a vendor schedule.  It also allows for any sort of backup and recovery process you desire.  However, it has the advantages of a cloud system where you can get a fast deployment and you do not need to buy and install hardware. 

 

On-Premises databases

Cloudant, DB2 and Informix can all be deployed in your own data center on your own hardware.  They can be in a traditional deployment or as an on-premises private cloud.  Since it is on your premises, you have full control of security, configurations and backup types.  Even if you prefer a public cloud solution, one reason for running an on-premises database is when you need to exchange extremely large amounts between your new systems and legacy systems on a frequent basis.  While improvements are being made for moving large amounts of data across the internet, those speeds still can’t match intra-datacenter transfers. 

 

***

 

So as you encounter database deployment models that don’t fit what you are doing at the moment, just remember that there are a variety of database deployments because there are a number of different priorities when it comes to developing applications that use those databases.  Please post any additional thoughts you have on this topic to my Facebook Page or db2Dean and Friends Community

 

HOME | Search