Thursday, March 31, 2016

Observations on DB2 for z/OS Address Space CPU Utilization

Have you ever looked at a DB2 Statistics Long report generated by your DB2 monitor? If so, you might have seen a section containing information about the CPU consumption of the various DB2 address spaces. It would look something like the example below, which reflects activity over a one-hour period of time (similar information might be available as well via an online display provided by your DB2 monitor). Note that I’ve abbreviated address space names and reduced the precision of CPU time figures (from microseconds to milliseconds) to enable the information to fit within the width of this blog space.

CPU TIMES  TCB TIME  PREEMPT SRB  NONPREEMPT SRB  PREEMPT IIP SRB
---------  --------  -----------  --------------  ---------------
SYS SVCS     23.203    13:18.791          10.464           55.127
DB SVCS      58.468     1:26.687          17.225        13:49.714
IRLM          0.006        0.000        6:00.679            0.000
DDF           5.741     9:36.358           2.596        12:23.363


An address space's total CPU time for the reporting interval is the sum of the numbers in the corresponding row of the tabular display; so, for example, the total CPU consumption for the DB2 database services address space, based on data shown above, is:

58.468 + 1:26.687 + 17.225 + 13:49.714 = 16:29.094 (16 minutes, 29.094 seconds)

The first column of numbers, labeled TCB TIME, shows the CPU consumption of address space processes represented in the system by TCBs, or task control blocks. TCB CPU time is always consumed on general-purpose processors (aka general-purpose "engines"). The second column, labeled PREEMPT SRB, shows the CPU time, for address space processes represented by preemptible SRBs (Service Request Blocks), that was consumed on general-purpose engines. Work done under preemptible SRBs is generally zIIP-eligible (i.e., eligible for execution by a zIIP engine - a specialty processor that provides relatively lower-cost MIPS), to varying degrees depending on the type of work being done. Work, done under preemptible SRBs, that is not executed by zIIP engines is executed by general-purpose engines. The third column, labeled NONPREEMPT SRB, shows CPU consumption of address space processes represented by non-preemptible SRBs. This work, like work done under TCBs, is always executed by general-purpose engines. The fourth column, labeled PREEMPT IIP SRB, shows the CPU time, associated with processes running under preemptible SRBs, that was consumed on zIIP engines.

In this blog entry, I want to point you towards some observations about DB2 address space CPU utilization figures that you might see for DB2 subsystems at your site.

IRLM - lean and mean

The IRLM address space - responsible for lock management in a DB2 for z/OS environment - typically consumes a very small amount of CPU resource, even when the volume of lock and unlock request activity is very high. For the system from which the CPU times shown above came, the six minutes of IRLM CPU time fueled execution of 128 million lock and unlock requests during the one-hour reporting interval. This great CPU efficiency is a key reason why you shouldn't hesitate to put IRLM where it belongs, priority-wise, in a z/OS LPAR's WLM policy: in the super-high-priority SYSSTC service class. IRLM doesn't use much CPU, but when it needs an engine it needs one RIGHT AWAY; otherwise, lock acquisition and release actions are delayed, and the whole DB2 workload slows down. [By the way, I would not recommend assigning DB2 address spaces other than IRLM to the SYSSTC service class. The other DB2 address spaces - MSTR, DBM1, DIST, and any stored procedure address spaces - should all be given the same priority, and that priority should be below SYSSTC and above address spaces, such as CICS regions, in which application programs run.]


System services (aka MSTR) - the thread factory

The DB2 systems services address space handles various functions. Of these, a major driver of CPU utilization is thread creation and termination. On the DB2 system for which address space CPU times are shown above, the dominant component of the workload during the reporting interval was CICS-DB2 (a little over 500 transactions per second, on average, during the one-hour time period). For this DB2 workload component, it so happens that there was very little in the way of CICS-DB2 thread reuse (the thread reuse rate was about 2%). That being the case, with the high transaction rate the MSTR address space was kept pretty busy creating and terminating hundreds of CICS-DB2 threads per second. If the CICS-DB2 thread reuse rate were to be made considerably higher through the use of a few protected entry threads for the most frequently executed transactions, it's likely that the CPU time for the DB2 system services address space, which was 14 minutes and 47.585 seconds for the system portrayed in the report snippet above, would have been a considerably smaller value.

Database services (aka DBM1) - readin' and writin'

Like the system services address space, the DB2 database services address space performs a variety of functions. Two functions that account for a lot of the address space's CPU consumption are prefetch reads and database writes. Prefetch read operations (referring to the combined total of sequential, list, and dynamic prefetch reads) often greatly outnumber database writes - sometimes by 10 to 1, sometimes by 20 to 1 - in a transactional application environment, as transactional work is often read-heavy (batch workloads are sometimes relatively more write-heavy), so your main leverage point in reducing DBM1 CPU consumption will typically be reducing the rate of prefetch reads in the system. That goal, in turn, is generally achieved via enlargement of buffer pools that see a lot of prefetch reads. Note that the bulk of DBM1's CPU consumption is associated with zIIP processors (in the report snippet above, that's the 13 minutes and 49.714 seconds seen for the database services address space in the column with the heading PREEMPT IIP SRB - about 84% of DBM1's total CPU time). This is so because, starting with DB2 10 for z/OS, prefetch read and database write operations became 100% zIIP-eligible. Because reducing prefetch reads will reduce zIIP engine utilization, does that mean it's not important? No, it doesn't mean that. Reducing zIIP engine utilization can be important, especially as a means of avoiding zIIP engine contention problems.

DDF - SQL-driven

The DDF address space (also known as DIST) is interesting in that its CPU consumption is largely driven by execution of SQL statements that get to DB2 by way of DDF. Referring to the report snippet above, the CPU times for DDF in the TCB TIME and NONPREEMPT SRB columns - about 8 seconds of the address space's total of a little over 22 minutes of CPU time - reflect activity performed by DDF "system" tasks. The rest of the DDF CPU time, consumed by processes represented by preemptible SRBs, is associated with execution of SQL statements issued by network-attached DB2-accessing applications (and that includes sending query result sets back to clients). The more SQL that gets to DB2 through DDF, the higher DDF's CPU consumption will be (just as the CPU time of a CICS region is affected by the cost of executing SQL statements that get to DB2 via that address space). Here's something else to note: the CPU time split between general-purpose engines and zIIP engines for DDF work done under preemptible SRBs. Using numbers from the report snippet above, you can see that this split is about 56% zIIP and 44% general-purpose-engine time (the figure for the zIIP offload percentage for the DDF address space is the time under the PREEMPT IIP SRB column for DDF divided by the sum of the times in the PREEMPT SRB and PREEMPT IIP SRB columns). Execution of SQL statements running under preemptible SRBs in the DDF address space is up to 60% zIIP-offload-able, and I'd say that a zIIP offload percentage in the 55-60% range is good. If you see a split such that the DDF CPU time associated with work done under preemptible SRBs is less than 55% zIIP time (i.e., if PREEMPT IIP SRB time for DDF divided by the sum of PREEMPT SRB and PREEMPT IIP SRB time is less than 55%), check to see if you have a zIIP engine contention issue (check the blog entry on zIIP engine conetntion pointed to by the hyperlink above).

So, looked lately at DB2 address space CPU times in your environment? Check 'em out, and see what conclusions you can draw. I hope that the information provided via this blog entry will be useful to you.