Tuesday, January 31, 2012

Got LOBs? Get DB2 10 for z/OS (Part 1)

In my previous entry, I provided information pertaining to the amount of data that can be stored in a LOB column of a mainframe DB2 table (LOBs being DB2 data values that can exceed 32 KB in length). I made brief references in that entry to significant enhancements in LOB data management capabilities delivered via DB2 10 for z/OS. I'll describe those enhancements -- the most significant since LOB support was introduced with DB2 Version 6 -- in a multi-part entry, of which this is part 1. Herein I'll cover my favorite of the DB2 10 LOB features: LOB inlining. In part 2 (to be posted within the next week or so), I'll get to my second favorite item of DB2 10 LOB functionality, that being support for variable-blocked spanned records in the SYSREC data set used for LOAD and UNLOAD utility jobs (that may sound like a yawner, but it's a big deal -- trust me).

LOB inlining has a particular appeal for me because it provides in the DB2 for z/OS engine a capability that some colleagues of mine implemented in program code about eight years ago, when I was working in the IT department of a mainframe DB2-using company. Back then, we were planning for the deployment of a new application that would store e-mails and associated attachments (received from people who were consumers of our services) as LOBs in our DB2 database. [E-mails would be stored as CLOBs (character large objects), and attachments (which could be screen shots) as BLOBs (binary large objects).] We came up with a preliminary estimate of the disk space we'd need to hold the e-mail data, and it was rather alarming. The big storage requirement existed because of two factors. First, LOB table spaces can't be compressed as can base table spaces (prior to DB2 10, LOBs, while logically appearing to be stored alongside non-LOB data in a table, had to be physically stored in table spaces that were distinct from base table spaces). Second, while a LOB value can span multiple pages in a LOB table space, any one page in a LOB can hold only one LOB value (thus, if the last part of a LOB value occupies 1000 bytes of a 32 KB page in a LOB table space, the other 31,000 bytes of that page can't be used for another LOB value).

Fortunately, our DB2 DBA team came up with a clever idea that slashed our e-mail data storage space needs: a "switch" in the application code (akin to a switch on a train track) that sent e-mails with a length of 8 KB or less to a VARCHAR column in a table, and larger e-mails to a CLOB column in the same table (most e-mails we received did not have attachments, so we were fine with storing all attachments in a BLOB column). That 8 KB threshold was a "sweet spot" value for this particular application, in that the large majority of the e-mails we received back then could be stored in 8 KB or less. I'll return to this "sweet spot" notion later on.

Did the "switch on the track" solution increase application code complexity? Yes, but the disk space savings achieved justified the extra programming effort.

Along comes DB2 10 (running in new-function mode), and the coding of a switch on an application track is no longer needed for disk-efficient LOB storage. Why? Because DB2 takes care of that for you. How? By way of a feature called LOB inlining. This new capability allows you to specify, in a CREATE TABLE statement (or an ALTER TABLE), the portion of a LOB column value that is to be physically stored in the base table space along with the table's non-LOB data. Any part of a LOB value past the specified inline length is stored in a LOB table space.

Now, our coding of application-based LOB inlining functionality way back when was aimed at disk space savings, and DB2 10's inline LOB capability certainly gives you that (inlined LOB data in a base table space can be compressed, and a base table space page, unlike a LOB table space page, can hold more than one inlined LOB value), but storage efficiency isn't all you get: inlining LOBs can also boost -- sometimes dramatically -- the performance of application programs and utility jobs (e.g., LOAD) that process LOB data.

This potential for performance improvement doesn't come just from the fact that inlined LOB values can be retrieved from a base table (versus having to be accessed in a separate LOB table space). You could also get a nice program run-time reduction by inlining a portion of a LOB column and and building an index on an expression (a capability introduced with DB2 9 for z/OS) that would, for example, use the SUBSTR built-in scalar function to extract and index the characters in positions 10 through 20 at the beginning of each LOB value (that might be a contract number in a standard form). Think of the possibilities here.

To get the full benefit of LOB inlining, you need to make some smart decisions. Key among these is the decision on the amount of data in a LOB column that is to be stored in a table's base table space. You might think that this is just a matter of going with a 32 KB base table space page size and inlining the maximum amount of LOB data possible (that would be, for example, 31 KB of a row's LOB data if the row's non-LOB data took up 1000 bytes of space in a 32 KB base table space page). In fact, that would NOT be a good one-size-fits-all approach, because inlined LOB data will cause a base table space to get larger, and that will impact the read hit ratio in the buffer pool to which the base table space is assigned. Typically, the ideal, in terms of a LOB inline length specification, is what I mentioned before: a "sweet spot" threshold that's large enough to enable complete inlining of a majority of the values stored in a LOB column, and small enough to allow for the most effective use of the base table space's buffer pool resource. What would that value be for you? 7 KB? 12 KB? 22 KB? That depends on the characteristics of the LOB data with which you're dealing. If an inline length of 5K would result in the complete inlining of 55% of the values in a LOB column of a table, and a 7 KB inline length would get you to 80% with respect to complete inlining of LOB values, adding those extra 2000 bytes to the inline length specification for the table of interest would probably be a good move.

Speaking of LOB data characteristics, note that inlining is likely to have little, if any, beneficial effect on application and utility performance if most values in a LOB column can't be completely inlined (that is, if most values would be split across the base table space and the associated LOB table space). I will say, however, that in such a case it could be advantageous (performance-wise) to define an index on an expression built on the inlined portion of the values of a LOB column. Keep in mind, too, that even if a majority of the values that would go into a LOB column of a table could be completely inlined, programs might not run faster -- and, indeed, might even run a little slower -- if the LOB data is rarely retrieved (this because, as mentioned previously, inlined LOB data will make a base table space larger, perhaps negatively impacting the buffer pool hit ratio for programs accessing non-LOB data in the table).

Without a doubt, you should give serious thought to the use of DB2 10 inlining for your LOB data. If you're not yet storing LOB data in your DB2 for z/OS database, perhaps the inlining capability would make DB2 a good choice for LOB data storage in your enterprise. Remember, the key to optimal inlining is knowledge of your LOB data and of the applications (and utilities) that process that data.

Want more information about LOB inlining? Check out section 4.3, "Inline LOBs," in the IBM "red book" titled, "DB2 10 for z/OS Performance Topics" (downloadable from the Web -- see http://www.redbooks.ibm.com/abstracts/sg247942.html?Open).

Next up, within a week or so, part 2 of this multi-part entry, in which I'll focus on the support provided by DB2 10 for the variable-blocked spanned record format for the SYSREC data set of LOAD and UNLOAD utility jobs -- a great enhancement from a LOB data management perspective.

2 comments:

  1. A scream into the wilderness ... I have successfully used SPANNED to unload/reload BLOB data but cannot persuade any other method to work. Regardless of whether I'm unloading or loading and regardless of where I store the BLOB data (HFS, PDS, PDSE, etc.) I get the same old SQLCode -452 Reason 13. *** HELP! ***

    ReplyDelete
  2. I hope that you've taken this up with the IBM support center since posting your comment. I'm afraid that I don't have any suggestions here - dealing with reason code 13 for the -452 SQL error code generally requires that you find the associated "system" completion code, which you might get from the console.

    ReplyDelete