BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage Articles How to Provide SQL Access to NoSQL Type Data using Multi-Record Type

How to Provide SQL Access to NoSQL Type Data using Multi-Record Type

Introduction

The database market is innovating rapidly with advancements in web access, mobile devices, reporting and analytics packages, and more. Yet, a surprising number of systems are still unable to fully participate in these achievements because their data is not organized in a relational manner. Without an efficient way to access data through relational APIs (e.g., SQL, ODBC, JDBC, PHP, ADO.NET, etc.), many systems cannot efficiently employ these new innovations.

This article will present a solution to one particular challenge: systems that mix different record layouts in the same file and then use a field to tell the application how to interpret each record. If you were trained in modern relational database techniques, this may sound strange to you. You were taught to follow a rigid schema, normalizing data by dividing it into separate tables. Business software has not always been written that way. There was a point in the not too distant past when the number of available file handles was limited at the system level. For one example, recall the days of DOS when one was limited to 20 file handles, and stdin, stderr, stdout, etc., consumed some of them.

While the mixed record layouts may have been appropriate for non-relational or NoSQL APIs where a schema definition was not required, developers are often confronted with needs for leveraging technologies that require a schema. A consistent schema definition is necessary for any software that uses SQL (or ODBC, etc.), such as a variety of third-party tools for reporting and analytics, including Crystal Reports and Excel, and many web and mobile device applications.

When a business is confronted with a requirement for SQL access to this type of data, they often resort to tricks like running batch programs to extract the data, transform it into a relational format, and load it into SQL database tables. These batch programs may have to run overnight, so the exported data is rarely up-to-date. Another option is to add a SQL API which requires breaking out each record layout into its own physical file, which leads to risky changes to the existing application code. Another alternative is to rewrite the entire application. Often times, this code has been working for years or even decades without change, so the less invasive the code change, the better.

To avoid these problems, and to enable developers to map existing proprietary, non-relational data to relational data, the FairCom c-treeACE® database introduced support for Multi-Record Type (MRT) Tables. The support for MRT Tables provides a way of dealing with data that does not fit into a single SQL schema. MRT Tables are for use with existing data that was not stored in relational tables at the onset. While it is doubtful that a developer would create a new application based on MRT Tables today, any developer dealing with the type of legacy applications described earlier will find them very valuable. Think of the batch programs mentioned earlier as a type of ETL (extract, transform, load); think of MRT Tables as a type of “virtual” ETL (extract, transform, and create a set of virtual tables that can be directly accessed via SQL in real-time). The magic of the FairCom technology is that the transformation is done on-the-fly without copying into external tables so that data is always up-to-date.

This article provides an example of how to use the MRT Tables to provide concurrent NoSQL and SQL access to non-relational c-treeACE data in real-time. It is intended for Application Developers and Application Administrators that are looking for an easy mechanism for gaining SQL access to c-treeACE data, or applications that might be easily migrated to c-treeACE.

c-treeACE

Before diving into the detail, let’s start with a brief introduction to FairCom’s c-treeACE database, which is a high-performance NoSQL + SQL database engine, offering many traditional database concepts at the NoSQL API level, such as full ACID properties through transaction processing, roll forward, roll back, record and key-level locking, deadlock detection, hot backups, and automatic disaster recovery, to name a few. Because of the way these database fundamentals are implemented at a NoSQL level, c-treeACE makes it possible to concurrently access the same data files from a variety of NoSQL record-oriented APIs and relational APIs.

The following diagram shows the APIs and frameworks c-treeACE supports:

Figure 2 - The NoSQL APIs include the ISAM API, a native Java implementation on top of ISAM called JTDB, a .NET implementation on top of ISAM called c-treeDB for .NET, and several others. The SQL APIs include SQL, JDBC, ODBC, PHP, ADO.NET, etc.

As we’ll explore below, this ability to concurrently access data from both the NoSQL and SQL worlds provides benefits for accessing data that is not in a relational format. In this article we will use standard SQL commands to access the data as a means of demonstrating SQL access to the non-relational data. In the real world, any application that uses industry-standard SQL can be used to access the data. It also provides opportunities to fine tune the data access for performance gains over a SQL-only database engine, but that is a topic for another article.

The example code provided in this article uses the NoSQL c-treeDB C API. c-treeDB provides a higher level API on top of the ISAM API, which saves the programmer time and effort by providing routines for buffer management, establishing sessions, and combining application data and index files into a database. For over 30 years, FairCom has provided this type of underlying database technology for vertical market applications, corporate-wide applications, and embedded processes used world-wide. c-treeACE is an engineering-level solution for high-speed data indexing and extraction. This technology is in use in everything from the demanding work of processing credit-card transactions by Visa to use by NASA in outer space.

Multi-Record Type Support

Within c-treeACE, an MRT Table can be used when the record structure varies from record to record depending on some criteria. In the example we will look at, the data simulates the type of legacy ERP application described earlier in which four types of unrelated NoSQL records were combined into a single table (called tutorial_host):

  • Customers
  • Orders
  • Inventory items
  • Items ordered

The actual table containing the data records (tutorial_host in this example) is called the “host.” A single host table can have multiple MRT Tables, one MRT Table for each type of record. All the records in a single MRT Table will be represented with the same record schema.

We will define rules to determine which host records belong to each MRT Table. In this example, the host table contains a field called rectype that indicates which type of data is contained in each record, so we will base our rule on that field. The diagram below shows how we will use the rectype field to break our four record layouts into their own virtual MRT Tables:

  • CustMast (rectype=C) – Records that belong to a master list of customers
  • CustOrder (rectype=O) – Records that list customer orders
  • ItemMast (rectype=M) – Records that belong to a master list of items
  • OrderItem (rectype=I) – Records that list of items sold to individual customers

Remember, this mapping is done by c-tree in real-time as the records are accessed. There is no need to copy the data into separate SQL tables. The MRT Tables appear to be real SQL tables to the applications accessing them, but they are actually “virtual” tables generated on-demand.

The host table requires a common schema to be defined for all records, although this schema may describe only the first part of the record: the part that all records have in common. In this example, the rectype field is our common field. A filter identifies records belonging to a particular MRT Table based on this field. The c-treeDB API provides the means to define the rule for filtering, creating the MRT Tables, and defining the various schemas.

An MRT Table is operated upon as an ordinary table using the c-treeDB API. The c-treeDB ctdbCreateMRTTTable() function behaves much like the ordinary ctdbCreateTable() function; however, it creates an MRT Table. Once defined, other c-treeDB commands (ctdbWriteRecord(), ctdbFirstRecord(), ctdbGetFieldAsString(), etc.) can be used on the MRT Table as if it were a standard table. When listing tables in a c-treeDB dictionary, MRT Tables are listed together with regular tables as the example screen shots in the next section will show.

Sample Record Layouts

Let’s consider our example where we have four unique record layouts all stored in a single physical non-relational file called tutorial_host.dat. When looking at the following screen shots, you’ll see what appear to be five tables. The only physical table is tutorial_host. The other four entries listed under the Tables heading (custmast, custordr, itemmast, and ordritem) are all Virtual Table representations of four unique record layouts stored within tutorial_host. These Virtual Tables were created, on-the-fly, by the c-treeACE SQL Engine using the MRT support. Before we get into the detail though, let’s further explore our example program.

First is the Customer Master record layout (labeled “custmast” in the screen shot), which contains records identified with a rectype field value of “C”:

Next we have record type Customer Order (“custordr”), which contains records with a rectype of “O”:

Then we have Item Master (“itemmast”) with a rectype value of “M”:

Last we have record type Order Item (“ordritem”) with a “rectype” value of “I”:

To illustrate our point, all four unique records types, comprising 14 total records, are stored in a single physical file called tutorial_host.dat, as the following screen shot of our local disk drive shows:

NoSQL APIs, such as the c-treeACE ISAM API and the c-treeDB C API used in this article are able to deal with this type of coding practice with ease, and avoid risky changes to existing application code. Because this code has been working for years or even decades without change, it is best to avoid making invasive code changes when possible. As mentioned earlier, the alternative is often to export the data to external SQL tables, which has its own problems.

To avoid these problems, c-treeACE users who have this type of multiple-record layout in a single file can use MRT Table support to provide SQL API access for these types of files.

Example

Now let’s look at an example of MRT Tables in action. The attached example code follows the standard format of all c-treeACE tutorials:

  1. Initialize -- Perform the minimum requirement of logging onto the c-treeACE SQL Server.
  2. Define -- Create and open the c-treeACE data file called “tutorial_host.dat” which has the four MRT Tables (e.g., unique record types).
  3. Manage -- Populate the tables and perform a simple query. This is where we see the MRT Tables in action. We will access the data using record-oriented commands and then access the same data using SQL as though we had four different tables.
  4. Done -- Handle the housekeeping of closing tables and freeing associated memory.

Initialize()

We begin by performing the minimum requirement of logging onto the c-treeACE Server using the following c-treeACE CTDB functions:

  • ctdbAllocSession(CTSESSION_CTDB) -- Allocates memory and initializes a new session handle, setting the default attributes.
  • ctdbAllocDatabase(hSession) -- Allocates memory and initializes a new database handle.
  • ctdbLogon(hSession, "FAIRCOMS", "", "") -- Logs on to the c-treeACE database Server, where “hSession” is the session handle, “FAIRCOMS” is the c-treeACE Server name, and the third and fourth parameters are the user ID and user password respectively. The "" indicates use the default user ID of ADMIN with the default password of ADMIN.
  • ctdbConnect(hDatabase, "ctreeSQL") -- Connects to the database, where “hDatabase” is the database handle and “ctreeSQL” is the database name.

Define()

Next we define the tables. Because this is a self-contained example, we will create tables and populate them. In the case of a legacy application, the data would already exist, so we would still need to define MRT Tables, but we would not have to populate them.

To do this, we attempt to open the tables to see if they already exist. If they do not exist, we create and open them. The example code provides some high-level functions to create the tables we will be using:

Create_HOST_Table();
Create_CustomerMaster_Table();
Create_CustomerOrders_Table();
Create_OrderItems_Table();
Create_ItemMaster_Table();

These functions use the standard c-treeDB function ctdbOpenTable() to try to open the tables. If the table does not exist, it creates it using ctdbCreateMRTTable(). This function is similar to the standard ctdbCreateTable() function, except it creates an MRT Table based on a host (parent) table. Once created, an MRT Table behaves as a regular c-treeDB table.

  • ctdbCreateMRTTable(CTHANDLE Handle, pTEXT VTableName, pTEXT ParentName, CTCREATE_MODE CreateMode, pTEXT filter) -- Creates a new MRT Table based on a host table with the following parameters.
    • Handle: MRTTable handle
    • VtableName: name of the MRTTable
    • ParentName: name of the host (or parent) table
    • CreateMode: Virtual Table create mode
    • filter: the filter to apply to the host table to identify the record belonging to this MRTTable

We then open the tables using:

  • ctdbOpenTable(CTHANDLE Handle, pTEXT TableName, CTOPEN_MODE OpenMode) -- Opens an existing table, given its name.
    • Handle: the table handle.
    • TableName: the table name.
    • OpenMode: the table open mode.

Manage()

Having created the tables, we populate them and perform a simple query. This section calls functions, provided in the example, to add data to the tables:

Add_CustomerMaster_Records();
Add_CustomerOrders_Records();
Add_OrderItems_Records();
Add_ItemMaster_Records();

Those functions use the standard ctdbWriteRecord() function to place the data in the tables.

  • ctdbWriteRecord() -- Adds a new record or update an existing record.
    • Handle: the record handle.
Record-Oriented Access to the Data

We have included an example of record-oriented access to the data to simulate the type of programming found in existing NoSQL-based applications. To demonstrate, we perform a query of that data using record-oriented functions:

  • ctdbFirstRecord() -- Retrieves the first record in a table. The ordering of the records is done through one of the indices that were defined during the table creation.
    • Handle: the record handle.
  • ctdbGetFieldAsString() -- Retrieves a field as a CTSTRING value.
    • Handle: the record handle.
    • FieldNbr: the field number to be retrieved. To retrieve the field number given the record handle, use ctdbGetFieldNumberByName().
    • pValue: the pointer to the string value.
    • Size: pValue size in bytes.
  • ctdbGetFieldAsSigned() -- Retrieves a field as a signed value.
    • Handle: the record handle.
    • FieldNbr: the field number to be retrieved. To retrieve the field number given the record handle, use ctdbGetFieldNumberByName().
    • pValue: the pointer to the signed value.

SQL Access to the Same Data

Because we have defined MRT Tables for our data, we can now access the same data using standard SQL queries.

  • To examine the table contents, use these SQL queries:
select * from custmast;
select * from custordr;
select * from itemmast;
select * from ordritem;

These four queries produced the record layout screen shots shown at the beginning of this article.

  • To list all orders by customer:
SELECT c.cm_custname, o.* FROM custmast c 
INNER JOIN custordr o ON c.cm_custnumb=o.co_custnumb;

  • To list all items ordered by customer:
SELECT c.cm_custname, o.co_ordrdate,oi.oi_quantity,i.im_itempric,i.im_itemdesc FROM custmast c 
INNER JOIN custordr o ON c.cm_custnumb=o.co_custnumb 
INNER JOIN ordritem oi ON oi.oi_ordrnumb = o.co_ordrnumb 
INNER JOIN itemmast i ON i.im_itemnumb=oi.oi_itemnumb;

  • To list total purchases by customer:
SELECT c.cm_custname AS name, 
SUM(oi.oi_quantity * i.im_itempric) AS total FROM custmast c 
INNER JOIN custordr o ON c.cm_custnumb=o.co_custnumb 
INNER JOIN ordritem oi ON oi.oi_ordrnumb = o.co_ordrnumb 
INNER JOIN itemmast i ON i.im_itemnumb=oi.oi_itemnumb 
GROUP BY cm_custname;

Done()

We end by handling the housekeeping of closing tables and freeing any associated memory. This process uses standard functions including ctdbFreeRecord(), ctdbFreeTable(), and ctdbFreeSession().

Conclusion

In this article, we have seen how to use c-treeACE® Multi-Record Type support to provide both NoSQL and SQL access to c-treeACE data that combines multiple schemas in a single table, thereby opening a new door for these systems to use today’s modern functionalities and devices, without making significant application changes. You can download a fully functional version of the c-treeACE Express database here and a working copy of this sample program is available by contacting FairCom Support at support@faircom.com.

About the Author

As FairCom’s vice president of Engineering, Randal Hoff helps set the technical direction for the c-treeACE product line, working with FairCom’s team of engineers to produce high performance, reliable and cost-effective cross-platform database technology that meets the stringent demands of database developers ranging from the enterprise, ISV and embedded device markets. His technical career, which spans more than 25 years, includes progressively responsible positions at FairCom and General Electric. Hoff holds Bachelor of Science degrees in Computer Engineering and Electrical Engineering from the University of Missouri, Columbia.

Rate this Article

Adoption
Style

BT