
Db2 Event Store

Dean Compher

28 June 2018

If you need to ingest large volumes of data into a database quickly while running well-performing analytic queries across all rows, including ones that have just been inserted, then the Db2 Event Store is for you. Db2 Event Store can insert up to 1 million rows per second per node. It is shipped in a Docker image and is easy to install and use. Further, since the data is stored in open source Apache Parquet files, you are not locked into our database. You can run a different application on these files to store and retrieve your data at any time and get rid of Db2. Also, if you need a traditional SQL interface to this data from your regular BI tools like Cognos, you can add our BigSQL feature, which uses BLU technology to query the data.

At the core of Db2 Event Store processing is Apache Spark, a framework that allows fast processing across a cluster of servers. The Event Store processes that perform inserts and queries at top speed run on top of the Spark engine. You interface with these processes through the APIs provided by Event Store. These APIs provide fast ingest and extend SparkSQL to interface directly with the Event Store processes.

Unlike traditional relational databases, the primary focus of this database is to insert very large amounts of data as a stream and allow queries to access rows the instant the database receives them. Therefore, it is primarily designed to be called through non-SQL APIs from external programs. Scala, Python and Java APIs are provided in all editions, and REST APIs are provided for the Enterprise and Standard Editions. To use these APIs, the Db2 Event Store OLTP context or Db2 Event Store SQL context must be installed on the external servers that call them. Further, data loading and analytics are available through Data Science eXperience (DSX), which makes it easy to use Jupyter notebooks, and several examples are provided. Currently only insert and select operations are allowed, with delete and update on the horizon.
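To make the current insert-and-select-only restriction concrete, here is a minimal Python sketch of a table handle that exposes nothing but those two operations. The class `InsertSelectTable` and its methods are hypothetical illustrations, not the Event Store Scala, Python, or Java API; a real application would go through the OLTP or SQL context instead.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class InsertSelectTable:
    """Hypothetical table handle exposing only insert and select."""

    name: str
    rows: List[Row] = field(default_factory=list)

    def insert(self, row: Row) -> None:
        # Appending is the only way to change the table.
        self.rows.append(dict(row))

    def insert_batch(self, batch: List[Row]) -> None:
        for row in batch:
            self.insert(row)

    def select(self, predicate: Callable[[Row], bool] = lambda r: True) -> List[Row]:
        # Every row inserted so far is visible to the next query.
        return [dict(r) for r in self.rows if predicate(r)]

    # Deliberately no update() or delete() methods.


readings = InsertSelectTable("meter_readings")
readings.insert_batch([
    {"meter_id": 1, "kwh": 0.42},
    {"meter_id": 2, "kwh": 1.10},
])
readings.insert({"meter_id": 1, "kwh": 0.47})  # a later reading from meter 1
print(len(readings.select(lambda r: r["meter_id"] == 1)))  # 2
print(len(readings.select()))  # 3
```

In this toy, the later reading for meter 1 simply becomes a second row; the table never changes a row once it has been written.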

The APIs provided by Db2 Event Store allow applications to query data as soon as it arrives at the database, but only through SparkSQL. However, you can optionally add IBM’s BigSQL component, which allows traditional Db2 clients and tools like Cognos to query the database with standard SQL through the IBM Data Server clients and drivers, as is normally done. The BigSQL interface gives you excellent query performance because it has components of BLU Acceleration and years of query optimization experience built in, but the newest data available to it is likely to be several seconds old, since it only reads data that has been written to the Parquet files.

Db2 Event Store is a great data store for streaming applications like IBM InfoSphere Streams and open source Apache Kafka because it is designed to keep up with the most demanding input streams of data while being easy to install and use. Getting data into this data store may be the main focus of the application, or you may just need a convenient place to keep the data for later processing. In either case, Db2 Event Store provides a relational database that can keep up with a huge amount of incoming data while allowing you to query it immediately. Examples of applications that may need this are utility smart meter systems, large website click stream capture, and hospital systems that capture all data from many medical devices. There is an InfoSphere Streams operator in development that you can download from GitHub and use.

There are currently two editions of Db2 Event Store: Developer Edition and Enterprise Edition. Developer Edition runs on one node and can be put on your workstation. It is available on Mac, Windows or Linux. Enterprise Edition is free for non-production environments and runs on a 3-node Linux cluster. It has more features than the Developer Edition. The editions summary page shows a nice list of features for each edition.

Event Store Processing

One of the reasons the Db2 Event Store is built on Spark is its in-memory processing. As data arrives at a Db2 Event Store engine process, it is cached in a log that is backed by SSD storage. Every node in the cluster processes inserts. When an insert happens, the node processing it replicates the insert to the other nodes, which provides high availability and prevents data loss. At a set interval, the background Groomer/Roller process wakes up, reformats the log data into compressed Parquet format, and writes it to permanent storage. By default this happens every second, but the interval can be changed. At another interval, re-grooming optimizes the Parquet files by combining small files into larger ones, improving compression and making other improvements. By default, the permanent storage is a GlusterFS file system, but CleverSafe can also be used.
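The write path just described can be modeled in a few lines of Python. This is a toy sketch under stated assumptions, not Event Store code: one shard's log is a Python list, each replica is a list standing in for another node's copy, a groomed "file" is simply a list of rows (no Parquet encoding or compression), and the groom and re-groom steps are called by hand rather than on a timer. Clearing the replica copies after a groom is an assumption made only to keep the model small; `ShardSketch` and its methods are hypothetical names.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class ShardSketch:
    """Toy model of one shard: a log, replica copies, and groomed files."""

    def __init__(self, replicas: int = 2) -> None:
        self.log: List[Row] = []  # stands in for the SSD-backed log
        self.replica_logs: List[List[Row]] = [[] for _ in range(replicas)]
        self.files: List[List[Row]] = []  # stands in for Parquet files

    def insert(self, row: Row) -> None:
        # Log the row locally and copy it to the other nodes' logs.
        self.log.append(row)
        for replica in self.replica_logs:
            replica.append(row)

    def groom(self) -> None:
        # Periodic step: write the current log out as one new file.
        if not self.log:
            return
        self.files.append(list(self.log))
        self.log.clear()
        # Assumption for this toy: once groomed, replica copies are dropped.
        for replica in self.replica_logs:
            replica.clear()

    def regroom(self, min_rows: int) -> None:
        # Combine every file smaller than min_rows into one larger file.
        small = [f for f in self.files if len(f) < min_rows]
        if len(small) < 2:
            return
        large = [f for f in self.files if len(f) >= min_rows]
        merged = [row for f in small for row in f]
        self.files = large + [merged]


shard = ShardSketch(replicas=2)
for i in range(3):
    shard.insert({"id": i})
    shard.groom()  # one groom per insert leaves three tiny files
print(len(shard.files))  # 3
shard.regroom(min_rows=10)
print(len(shard.files))  # 1
```

Running a groom after every single insert is deliberately wasteful here; it produces the many small files that the re-grooming step exists to consolidate.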

You use the Db2 Event Store APIs to query the data with SparkSQL. These APIs talk to one of the Event Store engine processes, which gathers the data to satisfy the query from both the cached logs and the compressed Parquet files. This means that a query sees all data that has arrived up to the instant the query is received. Other query engines, such as BigSQL, read only data that has already been written to the Parquet files.
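The difference in freshness between the Event Store APIs and a Parquet-only reader such as BigSQL follows from which storage each one scans. The Python sketch below is a simplified model, not Event Store or BigSQL code; `StoreSketch`, `engine_query`, and `parquet_only_query` are hypothetical names, and a "file" is just a list of rows.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class StoreSketch:
    def __init__(self) -> None:
        self.log: List[Row] = []  # recent rows not yet groomed
        self.files: List[List[Row]] = []  # stands in for Parquet files

    def insert(self, row: Row) -> None:
        self.log.append(row)

    def groom(self) -> None:
        # Move everything in the log into a new file.
        if self.log:
            self.files.append(list(self.log))
            self.log.clear()

    def engine_query(self, pred: Callable[[Row], bool]) -> List[Row]:
        """Event Store API path: scans groomed files and the log."""
        groomed = [r for f in self.files for r in f]
        return [r for r in groomed + self.log if pred(r)]

    def parquet_only_query(self, pred: Callable[[Row], bool]) -> List[Row]:
        """BigSQL-style path: sees only rows already written to files."""
        return [r for f in self.files for r in f if pred(r)]


store = StoreSketch()
store.insert({"sensor": "a", "value": 1})
store.groom()
store.insert({"sensor": "a", "value": 2})  # arrives after the last groom

every = lambda r: True
print(len(store.engine_query(every)))  # 2
print(len(store.parquet_only_query(every)))  # 1
```

The second row becomes visible to the Parquet-only path only after the next groom, which is why that path lags by roughly the grooming interval.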

Db2 Event Store does not enforce referential integrity or other constraints. However, primary keys can be indexed for fast lookups. Indexes are built asynchronously so that index maintenance does not add latency to inserts.
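One general way to maintain a primary-key index asynchronously is to let inserts return immediately and hand new row positions to a background thread that updates the index, while lookups fall back to a full scan for rows the index has not reached yet. The Python sketch below illustrates that technique under those assumptions; it is not how Event Store implements its indexes, and `AsyncPkIndexSketch` and its methods are hypothetical. Because Event Store does not enforce constraints, the sketch does not reject duplicate keys either.

```python
import queue
import threading
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


class AsyncPkIndexSketch:
    """Toy table whose primary-key index is maintained by a background thread."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.rows: List[Row] = []
        self.index: Dict[Any, int] = {}  # key value -> position in self.rows
        self._pending: "queue.Queue[Optional[int]]" = queue.Queue()
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._build, daemon=True)
        self._worker.start()

    def insert(self, row: Row) -> None:
        # Store the row and return; indexing is deferred to the worker thread.
        with self._lock:
            self.rows.append(row)
            pos = len(self.rows) - 1
        self._pending.put(pos)

    def _build(self) -> None:
        # Background loop: index each queued row position; None means stop.
        while True:
            pos = self._pending.get()
            try:
                if pos is None:
                    return
                with self._lock:
                    self.index[self.rows[pos][self.key]] = pos
            finally:
                self._pending.task_done()

    def lookup(self, value: Any) -> Optional[Row]:
        with self._lock:
            pos = self.index.get(value)
            if pos is not None:
                return self.rows[pos]  # fast path: the index has caught up
            for row in reversed(self.rows):  # slow path: full scan, newest first
                if row[self.key] == value:
                    return row
        return None

    def wait_for_index(self) -> None:
        # Block until every queued row position has been indexed.
        self._pending.join()

    def close(self) -> None:
        # Stop the worker thread and wait for it to exit.
        self._pending.put(None)
        self._worker.join()


orders = AsyncPkIndexSketch(key="order_id")
for i in range(1000):
    orders.insert({"order_id": i, "amount": i * 2})
print(orders.lookup(500)["amount"])  # 1000, via the index or the fallback scan
orders.wait_for_index()
print(len(orders.index))  # 1000
orders.close()
```

The trade-off is visible in `lookup`: a lookup issued right after an insert may pay for a scan, but the insert itself never waits on index maintenance.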

Components and Installation

Db2 Event Store comes with several components. They are all conveniently packaged into a Docker image whose containers are configured to run across a cluster of servers. There is a free Developer Edition for a single workstation install that is deployed in one container. Kubernetes is also used for automating deployment, scaling and management of the containers. Etcd is used to persist the cluster state and store metadata about cluster services deployment and health.

While several components are included, they are prepackaged in the Docker images and ready to use. You don’t have to install them, configure them, or get them working together; that is already done for you. IBM-provided components include Data Science eXperience (DSX), which makes it easy to create tables, insert rows and query, plus IBM Data Platform Manager, which provides a monitoring and management dashboard. IBM provides free Jupyter notebooks with examples of table creation, API calls and queries that you can access as soon as you start your Event Store container. Open source components include Spark, ELK, Alluxio, Zookeeper and Prometheus. Storage components include GlusterFS, Cloudant, Swift ObjectStore and Elasticsearch. Again, they are all packaged and ready to use.

The Developer Edition install page contains prerequisite information and install instructions. It is in the Knowledge Center along with the Enterprise Edition instructions. Download the Developer Edition here. Once you download the Db2 Event Store Docker image, you can explore it through your web browser. Instructions for connecting are in the Db2 Event Store Knowledge Center; make sure to open the Table of Contents link under the title. Db2 Event Store gives you access to the Data Science eXperience and other aspects of the database. Please see the short demo by Jacques Roy.

***

This is quite a new and interesting technology. If you try it, please share your experience by posting on my Facebook Page or my db2Dean and Friends Community along with any other comments you may have.
