Friday, January 29, 2016

DB2 for z/OS: Thoughts on History and Archive Tables

I'll state up front that I'm using the terms "history table" and "archive" table not in a generic sense, but as they have technical meaning in a DB2 for z/OS context. A history table is paired with a "base" table that has been enabled for system-time temporal support (introduced with DB2 10 for z/OS), and an archive table goes with a base table that has been enabled for DB2-managed archiving (also known as "transparent archiving" -- a feature delivered with DB2 11). Last week, I posted to this blog an entry that explored the use of system-time temporal and DB2-managed archiving as alternative solutions for different data archiving scenarios. Today I want to draw attention to some considerations related to history and archive tables.

Actually, it's not so much history and archive tables themselves that are on my mind. Tables are an embodiment of logical database design. What I'm thinking about is physical database design. You see, history tables and archive tables have to be logically identical to their associated base tables -- same columns, in the same order and with the same data type. Must history and archive tables be identical to the associated base table in a physical sense? NO. That physical-difference flexibility is something that a DB2 DBA should definitely have in mind as he or she contemplates enabling system-time temporal support or DB2-managed archiving for a base table.

Here's an example of what I'm talking about: suppose that a table you want to enable for system-time temporal support or DB2-managed archiving resides in a traditional segmented table space. Does that mean that the associated history or archive table has to reside in such a table space? NO. It could be that a universal range-partitioned or partition-by-growth table space would be a better choice for the base table's history or archive table (that 'or' means that a single table cannot be enabled for both system-time temporal support and DB2-managed archiving -- it's one or the other, and my previous blog entry was intended to help you make that choice). Think about it. There's a good chance that a history or an archive table will end up being a lot larger than its associated base table, and partitioned table spaces can hold a very large amount of data (up to 128 TB).

Indexes are another aspect of physical database design that can be different, in a number of ways, for history or archive tables versus their associated base tables. Is there an index on a base table that would not be so useful (i.e., that would not have benefits in excess of its CPU and disk space and DBA-time costs) for an associated history or archive table? Fine -- don't define that index for the history or archive table. Similarly, would an index NOT defined on a base table be useful for an associated history or archive table? Fine -- define it for the history or archive table. Even for an index you want on both a base table and its associated history or archive table, should the index page size be different? Should index compression be used or not used? Should data rows in a history or archive table be clustered differently versus rows in the associated base table?

And you can go right on down the line with other aspects of physical database design. It might make sense to assign a base table's table space to a PGSTEAL(NONE) buffer pool, but would that be a good choice for the table space of an associated history or archive table? If LOCKSIZE(ROW) is needed for a base table's table space, would page-level locking be more appropriate for an associated history or archive table? If hash organization of data delivers performance benefits for a base table, would cluster-organized data work out better for the associated history or archive table?

You get the idea. What you want to do, in considering enablement of system-time temporal support or DB2-managed archiving for a base table, is keep in mind that LOGICAL equivalence of the base table and its history table, or the base table and its archive table, does not require PHYSICAL equivalence between these tables. Think about the physical database design characteristics that would be optimal for the base table, and determine whether those or DIFFERENT physical database design characteristics would be right for the associated history or archive table.

Thursday, January 21, 2016

DB2 for z/OS-Managed Archiving, or System-Time Temporal?

Recently I had the opportunity to talk to a group of application developers and DBAs about the temporal data support that was introduced with DB2 10 for z/OS. One of the meeting attendees asked an interesting question: for data-archive purposes, should one use system-time temporal functionality (by which DB2 retains "before" images of rows that are affected by update or delete operations), or the DB2-managed archiving capability delivered with DB2 11 for z/OS?

At first glance, that might look like a question with an obvious answer: because the question is about data archiving, the answer should be, "Use DB2-managed archiving," right? In actuality, one really could go either way, depending on an organization's requirements regarding the data in question. In certain circumstances, system-time temporal functionality can be an appropriate data archiving mechanism, despite the fact that the word "archiving" is not part of the formal name of that DB2 feature.

Before proceeding further, I'll provide a brief description of DB2's system-time temporal support (DB2-managed arching is described in the blog entry to which I provided a hyperlink, above). System-time refers to one of two "flavors" of DB2 temporal data functionality (the other flavor, known as business-time temporal, provides a means whereby future data changes -- such as a price adjustment for a product or service -- can be reflected in a DB2 table's data without affecting programs that, by default, access data that is in effect, from a business perspective, now). When a DB2 table is enabled for system-time temporal support (either when it is created, or, for an existing table, via ALTER operations), it has a couple of extra columns, both of the timestamp variety (there is actually a third added timestamp column, called a "transaction-start ID" column, that appears to be largely unused at present but may get a role to play in a future release of DB2 for z/OS). These two columns indicate the time at which a row became current (by being inserted into the table, or by way of an update operation targeting an existing row), and the time at which the row became non-current (by being deleted, or changed via update -- rows that are "currently current" have a high value of midnight, December 31, 9999 in the "time the row became non-current" column).

A table enabled for system-time temporal functionality, in addition to having "row became current" and "row became non-current" timestamp columns, also has an associated table called a history table. The history table holds the "before" images of rows that were either deleted from the "base" table or were changed in the base table via update; so, all the rows in the history table are non-current, and all the rows in the base table are "currently current." If a row for product XYZ is inserted into a system-time temporal-enabled PRODUCTS table at 9:27 AM on January 22, 2016, that timestamp (which will in fact have picosecond precision) will go into the row's "became current" column (which is actually called the "row begin" column), and the "became non-current" column (officially, the "row end" column) will contain the high value for a timestamp (as previously mentioned, that's midnight on December 31, 9999). If the price of product XYZ is changed from $10 to $15 via an UPDATE at 3:30 PM on March 10, 2016, the row in the base table will show the $15 price, a "row begin" time of 3:30 PM on March 10, 2016 (the time of the UPDATE), and a "row end" time of the previously noted high-timestamp value. DB2 will place in the base table's history table the "before" image of the row (before the update, that is), with a price of $10 (the pre-UPDATE price), a "row begin" time of 9:27 AM on January 22, 2016 (the time the row was first inserted in the base table with the $10 price), and a "row end" time of 3:30 PM on March 10, 2016 (the time of the update that changed the price for product XYZ to $15).

Here's what's really cool about this system-time temporal thing: through extensions to SQL that accompanied the introduction of temporal functionality, I can (or a program can) ask DB2 this question: "What was the price of product XYZ on February 17, 2016?" DB2 will retrieve information from the row that was current as of the date specified, and provide the answer: $10. Importantly, that temporal query does not have to reference the history table associated with the system-time temporal-enabled base table -- it just references the PRODUCTS table (the base table in this example), and DB2 takes care of looking for the row, if needs be, in the history table that has been paired with the PRODUCTS table. Besides allowing data retrieval on a prior POINT in time basis, DB2 system-time temporal support allows data retrieval on a prior RANGE of time basis -- a program or user can ask DB2, "Show me any and all rows for product XYZ that were current for at least some time between December 1, 2015 and April 30, 2016." DB2 will retrieve that row or rows (in our example, it would be two rows for product XYZ -- one from the base table and one from the history table), with again no need for the history table to be referenced in the query. By examining the values in the "row begin" and "row end" columns of the retrieved rows, one can see how data for (in this example) a product changed, and when those changes were effected.

What system-time temporal support and DB2-managed archiving have in common is the notion of a "single logical table" that in fact consists of two physical tables (a base and a history table, in the case of system-time temporal, and for DB2-managed archiving a base and an archive table). In both cases, application programs do not have to reference the "associate" table. Data access references are to the base table, and DB2 takes care of pulling in data from the "associate" table, as needed.

Back now to data archiving. Various DB2 for z/OS-using organizations have seen system-time temporal support as a data archiving solution, and they are not wrong in looking at it that way. The choice between system-time temporal and DB2-managed archiving (and an either-or choice it is, because these capabilities cannot both be used with a single base table) comes down to the nature of data in a table that an organization wishes to retain over some long period of time (and long-term retention of historical data is my simple conception of archiving). It may be that data in a table, once inserted, is never updated, and that rows deleted are either discarded completely (thrown in "the bit bucket," if you will) or moved to offline media. In that case, if interest is only in rows that are "currently current" (i.e., still valid and true, even if quite old), and if rows have to be kept for a long time, and there is a desire to physically separate older from newer rows to boost access performance for the newer rows (the assumption here being that newer rows are the ones most frequently accessed), DB2-managed archiving offers a nice solution that combines optimal performance for newer-row retrieval with large-capacity historical data retention, without complicating SQL coding for queries that might need to access older as well as newer rows (thanks to the fact that, as noted, programs need only reference the base table, even if the associated archive table needs to be searched for data). Even if rows in the table are updated after having been inserted, DB2-managed archiving can be a good historical data retention solution if there is interest only in currently-valid rows -- if there is no interest in what updated rows "used to look like" before they were updated.

What if, on the other hand, there is an interest in what rows looked like at some prior point in time, even if the rows look different now because of updates, or even if the rows were deleted from the base table? If the need to retain large amounts of historical data includes a requirement -- or at least a desire -- to maintain accessibility to non-current (i.e., not in effect now), "prior versions" of rows made non-current through DELETEs and/or UPDATEs, DB2's system-time temporal functionality (sometimes referred to as "row versioning") could be just the ticket.

System-time temporal data support and DB2-managed archiving can both be thought of as data archiving solutions, even though only the latter feature has "archiving" in its name. The decision to use one or the other will depend on an organization's requirements for data in a given table, and one DB2-using company might well make use of both solutions for different tables (keeping in mind that a single table cannot be both archive-enabled and system-time temporal-enabled). Remember that system-time temporal functionality is about holding onto "what was true" data associated with a table, even as "what is true" is changed by way update and/or delete operations. DB2-managed archiving is the right tool for the job when the only data to be retained is of the "is true now" variety, and when a large quantity of historical data and programs' propensity to access fairly "new" data make physical separation of older and newer rows -- without complicating data access logic -- advantageous from a performance perspective.

In about a week or so, I'll post a companion piece to this entry, with some matters for DB2 DBAs to consider as they ponder physical database design options for history tables (used with system-time temporal-enabled base tables) and archive tables (used with archive-enabled base tables).