Friday, October 27, 2017

Db2 12 for z/OS: Enhanced Lock Avoidance in Data Sharing Environments

In the last entry posted to this blog, I described the mechanisms (including PUNC bits and commit log sequence numbers, aka CLSNs) used by Db2 for z/OS for lock avoidance purposes (lock avoidance here refers to a capability whereby Db2 can, in many cases, verify the committed state of a data value without having to get an S-lock - i.e., a read lock - on the page or row in which the value is located). As lock and unlock requests consume CPU cycles, lock avoidance is a CPU efficiency booster for Db2 application workloads.

Db2 data sharing (a technology through which multiple Db2 subsystems in a Parallel Sysplex mainframe cluster share concurrent read/write access to one database) has long delivered unmatched levels of system scalability and availability. An enhancement delivered with Db2 12 for z/OS can reduce the CPU cost of data sharing by increasing the degree to which locks can be avoided in a data sharing environment.

In the aforementioned blog entry on Db2 lock avoidance, I described how commit log sequence numbers (CLSNs), which are maintained for each and every Db2 page set (or partition, in the case of a partitioned object), are used in support of lock avoidance. When Db2 data sharing technology was introduced in the mid-1990s (with Db2 Version 4), its use had the effect of reducing lock avoidance because a single CLSN, called the global CLSN (or GCLSN), was used for ALL group buffer pool-dependent data sets (i.e., data sets that are the object of inter-Db2 read/write interest in a data sharing group - quite likely the majority of open data sets in the system). The GCLSN value is the log point at which the currently-still-outstanding-longest-running-data-changing unit of work  got started. The implication: if ONE data-changing unit if work is running for a long time, lock avoidance for ALL group-buffer-pool-dependent data sets is negatively impacted, because in that case the GCLSN value will go a long time without being positively incremented, and in THAT case more S-locks on pages and/or rows will have to be requested to verify the committed state of data values.

Along comes Db2 12 for z/OS, and with it an important change in how CLSNs affect lock avoidance in a data sharing environment. The change: instead of a single GCLSN value reflecting the start time of the longest running still-out-there data-changing unit of work in the whole data sharing system, Db2 12 caches, in memory and in the shared communications area (SCA) in a coupling facility LPAR, the 500 oldest CLSNs in the data sharing system. What does that mean with regard to lock avoidance? Well, let's say that you're an application program accessing data in table TABLE_A, which is in table space TSPACE_A, which is group buffer pool-dependent and has a CLSN that is one of the aforementioned 500 oldest in the system. For you, lock avoidance is just as it would be in a non-data sharing system, because Db2 is using your table space's CLSN value to determine when it can retrieve committed data values from TABLE_A without having to issue S-lock requests for pages (or rows, as the case may be) in the table.

Now, let's say you're an application program accessing data in TABLE_X, which is in table space TSPACE_X, which is group buffer pool-dependent and has a CLSN that is not one of the 500 oldest in the system. What's the deal in that case? Here's the deal: for you, Db2 will use the CLSN that is the newest of the 500 oldest that it's cached. Is that going to be good with regard to the level of lock avoidance you get? It should be really good. Think about it. In a Db2 11 (or earlier) system, Db2 would be using, for you, for lock avoidance purposes, the oldest CLSN in the system. What if that CLSN is associated with a unit of work that's been chugging along for an hour, changing data in TABLE_Y (in TSPACE_Y), without committing? That really old CLSN (the global CLSN for the system) could mean a lot of lock requests not avoided for you. In a Db2 12 environment, the CLSN associated with the unit of work that started changing data in TABLE_Y an hour ago (and hasn't committed since) affects lock avoidance only for TSPACE_Y, not for TSPACE_X, which you're accessing. Does the next-oldest CLSN in the system, which is (let's suppose) 45 minutes old, affect the level of lock avoidance you're getting? Nope. How about the 10th oldest CLSN? Nope. The 100th oldest? Nope. The 352nd oldest? Uh-uh. Your level of lock avoidance is impacted by the 500th oldest CLSN in the system. Maybe that one is associated with a unit of work that has gone only a few seconds (or less) without a commit. Think you'll get more lock avoidance with a few-seconds-old CLSN versus an hour-old CLSN? Odds of that are real good, I'd say.

And there you have it - just one of multiple ways in which Db2 12 delivers improved performance versus its predecessor. Over the next several months in this blog I'll get to some of the other performance-boosting features of Db2 12. Lots to like there, I can tell you now.