Monday, November 30, 2015

DB2 for z/OS: How I Use -DISPLAY GROUPBUFFERPOOL Output

Having recently posted a blog entry on my use of output from the DB2 for z/OS -DISPLAY BUFFERPOOL command, I figured this would be a good time to post a companion entry focused on the -DISPLAY GROUPBUFFERPOOL command. Obviously, this latter command is relevant to people who work with DB2 operating in data sharing mode on a Parallel Sysplex. If you are such a person, read on.

First, enter the command the right way

Actually, you'll want to enter the command in two forms. First, from any one member of the DB2 data sharing group of interest, issue the following:

-DISPLAY GROUPBUFFERPOOL(*) TYPE(GCONN) GDETAIL(INTERVAL)

More specifically, issue that command twice, one hour apart, and retain the output of the second issuance of the command. Why? Because that command's output will show activity, from a group perspective, for each group buffer pool for a one-hour period of time (i.e., for the period of time between the first and second issuances of the command). In the command output you'll see, for each group buffer pool (GBP), a DSNB782I message, the text of which will include the phrase "INCREMENTAL GROUP DETAIL STATISTICS SINCE timestamp-value." You can check that timestamp value to verify that the output of the second of the two hour-separated -DISPLAY GROUPBUFFERPOOL commands captures one hour of activity.

Consider issuing pairs of hour-separated -DISPLAY GROUPBUFFERPOOL commands, in the form shown above, at two different times of day -- maybe once during a busy online transaction time, and once during a period of heavy batch activity.

The other form of the command you'll want to use is as follows:

-DISPLAY GROUPBUFFERPOOL(*) TYPE(MCONN) MDETAIL(INTERVAL)

Since this command provides member-view information, you might want to issue it for more than one member of the data sharing group. As mentioned above for the GDETAIL form of the command, issue the MDETAIL form, on a given DB2 data sharing group member, twice, with an hour between each command issuance, so as to capture (in the output of the second issuance of the command) one hour of member-scope activity for each GBP (note that for the MDETAIL form of the command, the "STATISTICS SINCE timestamp-value" information is in a DSNB771I message, versus the DSNB782I message associated with the GDETAIL form of the command). Also as mentioned previously, consider issuing pairs of hour-separated commands at different times of the day (maybe heavy online and heavy batch).

If you or one of your colleagues has some REXX programming skills, you can do as some have done and have a program issue pairs of hour-separated -DISPLAY GROUPBUFFERPOOL commands, and then have a REXX routine parse the output of the second of those commands and insert useful information (as described below) into a DB2 table or maybe a file. If such a program and associated REXX routine are executed on a daily basis, you will have information that you can use to track, and adjust to, trends in GBP usage.
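
Just to illustrate the parse-and-record idea, here is a rough sketch in Python (the routines described above are REXX, but the logic is easier to show compactly this way). The message-text patterns below are placeholders keyed to the DSNB786I and DSNB788I counters discussed in the next section -- adjust them to match what your -DISPLAY GROUPBUFFERPOOL output actually shows, and write to a DB2 table instead of a CSV file if you prefer:

import csv
import re
from datetime import datetime

# Placeholder patterns -- the real message layouts in your command output may differ.
GBP_NAME = re.compile(r"DISPLAY FOR GROUP BUFFER POOL (\S+)")
WRITE_FAILED = re.compile(r"WRITE FAILED DUE TO LACK OF STORAGE\s*=\s*(\d+)")
XI_RECLAIMS = re.compile(r"CROSS INVALIDATIONS DUE TO DIRECTORY RECLAIMS\s*=\s*(\d+)")

def record_gbp_counters(output_text, csv_path="gbp_history.csv"):
    """Append one timestamped row per GBP (write failures, XIs due to reclaims) to a CSV."""
    counters = {}
    current_gbp = None
    for line in output_text.splitlines():
        m = GBP_NAME.search(line)
        if m:
            current_gbp = m.group(1)
            counters[current_gbp] = {"write_failed": None, "xi_reclaims": None}
            continue
        if current_gbp is None:
            continue
        m = WRITE_FAILED.search(line)
        if m:
            counters[current_gbp]["write_failed"] = int(m.group(1))
        m = XI_RECLAIMS.search(line)
        if m:
            counters[current_gbp]["xi_reclaims"] = int(m.group(1))
    stamp = datetime.now().isoformat(timespec="seconds")
    rows = [[stamp, gbp, c["write_failed"], c["xi_reclaims"]] for gbp, c in counters.items()]
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows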

How I use information in the output of -DISPLAY GROUPBUFFERPOOL(*) TYPE(GCONN) GDETAIL(INTERVAL)
  • Look for the "double zeros" -- referring here to zero write failures due to lack of storage (see the DSNB786I message text that's part of the command output), and zero cross-invalidations due to directory entry reclaims (part of the DSNB788I message text in the command output), for each GBP. If GBP write failures occur due to lack of storage, more than likely the GBP is too small (or the ratio of directory entries to data entries for the GBP is too large -- more on this below). You don't want GBP write failures, because they can land pages on the logical page list (LPL), and that can lead to failures for programs that subsequently attempt to access those pages prior to their being recovered from the LPL. [Note that a DB2 11 enhancement called group buffer pool write-around was introduced to help reduce the incidence of GBP write failures due to lack of storage. You can read about this enhancement in section 5.1 of the IBM "redbook" titled DB2 11 for z/OS Technical Overview (downloadable from http://www.redbooks.ibm.com/abstracts/sg248180.html?Open).] Cross-invalidations due to directory entry reclaims occur when a directory entry (used to track inter-DB2 interest in table space and index pages) has to be stolen (because all are in use and a new page has to be registered). Because the directory entry reclaim results in the GBP losing the "pointer" to a page, from a GBP-dependent data set, that is cached locally in a buffer pool of at least one group member, the page has to be preemptively invalidated wherever it is locally cached. You'd rather avoid that situation, because when a page invalidated for this reason is next accessed by a DB2 member subsystem, it will have to be read back into memory from disk (or from the GBP, but a read "hit" in this case tends to be quite unlikely). The page read in from disk will probably look just like the previously cached page that was marked invalid (because the invalidation was necessitated by a directory entry reclaim, not by a change of the page by a program on another group member); thus, a disk read that would be otherwise unnecessary is driven by a directory entry reclaim. The key to avoiding directory entry reclaims is to have a number of directory entries in a GBP that is at least as large as the number of data entries in the GBP plus the number of buffers in the corresponding local buffer pools in each member DB2 subsystem (as described in further detail in an entry I posted to this blog a couple of years ago); a small sizing sketch follows this list.
  • Check out the specifications of each GBP. In particular, look at the ratio of directory entries to data entries. The default for this ratio is 5:1. If you see a directory-to-data-entry ratio that is significantly higher than 5:1, check to see if ALLOWAUTOALT(YES) is specified for the GBP in the CFRM policy for the Sysplex. If the directory-to-data-entry ratio for a GBP has been automatically increased by the system in order to avoid directory entry reclaims, that's OK, but in my experience this kind of adjustment can be an indication that the GBP is smaller than it should be (I wrote about this in a blog entry I posted a few months ago). Note that if you decide to make a GBP larger, make sure that all structures defined for both coupling facilities used by a DB2 data sharing group can fit in one coupling facility LPAR (except for secondary group buffer pools, when GBPs are -- as they should be -- duplexed), in case the other CF LPAR is down for maintenance purposes or due to a failure (if you temporarily had all structures in one CF LPAR, you would not during that time be duplexing the GBPs).
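
To make that directory entry rule of thumb concrete, here is a minimal Python sketch (the figures in the example are made up):

def min_directory_entries(gbp_data_entries, local_pool_buffers_per_member):
    """Rule of thumb: directory entries >= GBP data entries plus the buffers in the corresponding local pool on every member."""
    return gbp_data_entries + sum(local_pool_buffers_per_member)

def implied_ratio(gbp_data_entries, local_pool_buffers_per_member):
    """Directory-to-data-entry ratio implied by the rule (compare with the 5:1 default)."""
    return min_directory_entries(gbp_data_entries, local_pool_buffers_per_member) / gbp_data_entries

# Example: a GBP with 40,000 data entries; two members, each with a 100,000-buffer local pool.
print(min_directory_entries(40_000, [100_000, 100_000]))        # 240000 directory entries
print(round(implied_ratio(40_000, [100_000, 100_000]), 1))      # 6.0 (i.e., a 6:1 ratio)
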
How I use information in the output of -DISPLAY GROUPBUFFERPOOL(*) TYPE(MCONN) MDETAIL(INTERVAL)
  • Basically, I use the output of this command to get one item of information that is not available in the output of the GDETAIL form of the command: the XI GBP hit ratio. That ratio (described in more detail in the blog entry referenced in the item above about directory-to-data-entry ratios) is so named because it has to do with synchronous GBP read requests that are driven by cross-invalidations (or XIs, in DB2 data sharing shorthand). Why does this ratio merit attention? Because if there are no cross-invalidations due to directory entry reclaims (see the first item above pertaining to the output of the GDETAIL form of the -DISPLAY GROUPBUFFERPOOL command) then what cross-invalidations did occur pretty much had to be caused by data-change activity (when a program connected to member DB2A changes a page cached locally in a buffer pool of member DB2B, the DB2B copy of the page has to be invalidated so that DB2B will know to retrieve the current version of the page). Data-change activity associated with GBP-dependent data sets results in changed pages being written to the appropriate GBP, and if pages were written to a GBP and the GBP has enough data entries to keep changed pages cached for a while, it is reasonable to expect that a good percentage of GBP reads driven by data-change-caused cross-invalidations will result in GBP read "hits." The XI read hit ratio is easily calculated using the SYNCHRONOUS READS DUE TO BUFFER INVALIDATION information in the DSNB773I message that is part of the -DISPLAY GROUPBUFFERPOOL MDETAIL output for each GBP: just divide the number in the DATA RETURNED field by the sum of the DATA RETURNED and DATA NOT RETURNED fields. I regularly see this ratio in excess of 80% for well-sized GBPs. If you see an XI read hit ratio that is substantially below 80% for one of your GBPs, consider making that GBP larger (if available CF LPAR memory permits).
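
Here is that calculation as a small Python sketch (the sample counts are made up):

def xi_read_hit_ratio(data_returned, data_not_returned):
    """DATA RETURNED / (DATA RETURNED + DATA NOT RETURNED), from the DSNB773I counters."""
    total = data_returned + data_not_returned
    return data_returned / total if total else None

ratio = xi_read_hit_ratio(data_returned=45_210, data_not_returned=6_830)
if ratio is not None:
    print(f"XI read hit ratio: {ratio:.1%}")          # 86.9% in this made-up example
    if ratio < 0.80:
        print("Below the 80% rule of thumb -- consider enlarging this GBP if CF LPAR memory permits")
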
There you have it. I hope that this information will be of use to you.

Sunday, November 29, 2015

DB2 for z/OS: How I Use -DISPLAY BUFFERPOOL Output

A couple of weeks ago, I was going over output of the DB2 for z/OS command -DISPLAY BUFFERPOOL with a group of people, and one of the meeting participants said words to this effect: "You've got some useful formulas and rules of thumb here. Could you write them down for us?" I did that, and it occurred to me that the information would be useful to others in the larger DB2 for z/OS community; so, I've packaged it in this blog entry.

First, enter the command the right way

Based on my experience, the most useful form of the -DISPLAY BUFFERPOOL command is as follows:

-DISPLAY BUFFERPOOL(ACTIVE) DETAIL

You actually want to enter the command twice, one hour apart. Why? Because the command output shows activity since the buffer pools were last allocated (which was probably when the target DB2 subsystem was last "bounced"), or since the command was last issued. If you issue the command once, then issue it again one hour later, the output of the second issuance of the command will show the preceding hour's worth of activity. That being the case, you can divide the activity counters by 3600 to get per-second figures, and those figures are very useful for performance monitoring and tuning work. [Note that the command output includes a DSNB409I message for each buffer pool, the text of which reads, "INCREMENTAL STATISTICS SINCE timestamp-value." Check that timestamp in the output of the second of the two issuances of the command, to verify that the command captured approximately one hour of activity.]
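
If you want to do that arithmetic programmatically, here is a rough Python sketch; the timestamp format is an assumption, so adjust the parsing to match what DSNB409I actually shows in your output:

from datetime import datetime

def interval_seconds(first_ts, second_ts, fmt="%Y-%m-%d %H:%M:%S"):
    """Seconds between the 'statistics since' timestamp and the time the second command was issued."""
    return (datetime.strptime(second_ts, fmt) - datetime.strptime(first_ts, fmt)).total_seconds()

def per_second(counter_value, seconds):
    """Convert an incremental activity counter to a per-second rate."""
    return counter_value / seconds

secs = interval_seconds("2015-11-30 09:00:05", "2015-11-30 10:00:05")   # sample timestamps
print(per_second(3_600_000, secs))   # 3,600,000 read I/Os over an hour -> 1000.0 per second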

It might be a good idea to use -DISPLAY BUFFERPOOL to capture activity for a busy "online" hour (often a mid-morning or mid-afternoon hour) and for an hour during which a batch workload is particularly heavy -- one period might see more of a load on your buffer pool configuration than the other.

Something else to consider: some organizations have written programs that issue a pair of -DISPLAY BUFFERPOOL commands (again, separated by one hour) one or more times per day, and have a REXX routine that parses the output of the second of the pair of commands, extracts useful values, and inserts them into a DB2 table or maybe a file. In that way, trends can be tracked and responded to with adjustments to the buffer pool configuration.

OK, with the output of the second issuance of a pair of one-hour-apart -DISPLAY BUFFERPOOL commands in hand, here's how you can put the information to good use:

  • Add up the size of all the buffer pools (in megabytes), to get the total size of the buffer pool configuration for the DB2 subsystem. That's easily done by multiplying the number of buffers allocated for a pool by the pool's page size; so, a pool with 20,000 buffers of 4KB apiece is sized at 80 MB. If you have a single production DB2 subsystem running in a z/OS LPAR, I would look to move towards a situation in which the total size of the subsystem's buffer pool configuration is 30-40% of the size of the z/OS LPAR's memory resource. For example, if a z/OS LPAR with 100 GB of central storage holds one production DB2 subsystem, I'd aim for a buffer pool configuration size (that is, the aggregate size of all buffer pools allocated for the subsystem) that is in the range of 30-40 GB (if a z/OS LPAR holds more than one production DB2 subsystem, I'd want the total size of all the subsystems' buffer pool configurations to be not much more than 50% of the LPAR's real storage resource). If you're not in that 30-40% of memory "zone" at present, there's no need to get there in one giant step from where you are. A series of steps (but pretty good-sized steps, not a lot of baby steps) would be fine. You should be most aggressive in growing the pools that have the highest total read I/O rates (see the next item, and the sketch following this list). By the way, the aim here is to have a buffer pool configuration that makes really good use of an LPAR's memory for enhanced performance, while avoiding a situation in which LPAR memory is under too much pressure -- if the LPAR's demand paging rate, which can be obtained from an RMF (or equivalent z/OS monitor) CPU activity report, is in the low single digits or less per second, the memory resource is not under too much pressure.
  • For each pool, calculate the total read I/O rate. That's the sum of five numbers (for each pool) from the command output -- random synchronous reads, sequential synchronous reads, sequential prefetch reads, list prefetch reads, and dynamic prefetch reads -- divided by the number of seconds in the interval between the first and second issuance of the -DISPLAY BUFFERPOOL command (if that interval was, as suggested, an hour, the number of seconds would be 3600). Focus your attention first on the pools (if any) with read I/O rates in excess of 1000 per second, and see if you can get the I/O rate for those pools below 1000 per second by making the buffer pools larger. You can then turn your attention to pools with a read I/O rate in the hundreds per second, to see if the read I/O rate for those pools can be brought below 100 per second. If the read I/O rate for each of your buffer pools is below 100 per second, you are making exceedingly good use of LPAR memory.
  • The -DISPLAY BUFFERPOOL output shows the specifications for each buffer pool, and you should examine those. All high-I/O pools (total read I/O rate > 100 per second) should be defined with PGFIX(YES), and those page-fixed pools should be backed by 1 MB page frames (check to see that the value of the LFAREA parameter in the IEASYSxx member of the system's PARMLIB data set is large enough for this purpose). If your demand paging rate is very low (as noted previously, that would be low single digits per second or less), consider going with PGFIX(YES) even for low-I/O pools that have a lot of GETPAGE activity (more than 1000 GETPAGEs per second). Page-fixing these pools will deliver a CPU benefit if the pools are backed by 1 MB page frames.
  • Are any of the DB2 subsystem's pools used to "pin" objects in memory (i.e., to cache table spaces and/or indexes in memory in their entirety)? If yes, such pools should be defined with PGSTEAL(NONE), so that DB2 will know that they are intended to be "pinning" pools (the PGSTEAL(NONE) specification was introduced with DB2 10 for z/OS). Additionally, the target read I/O rate for a "pinning" pool is zero. If a "pinning" pool has a read I/O rate that is greater than zero, consider whether the pool needs to be enlarged so that all pages of all objects assigned to the pool can be kept in memory.
  • The counter for the number of times that the data manager threshold (DMTH) was reached should be zero for all pools. If that threshold is hit, either the buffer pool is way too small or the deferred write thresholds are way too high. [For a pool dedicated to 4KB-page or 32KB-page work file table spaces, the deferred write thresholds -- DWQT and VDWQT -- can be set to values higher than the respective defaults of 30 and 5, and doing this can result in some CPU savings, but don't take that idea too far. DWQT/VDWQT settings of 70/40 or even 80/50 should be OK for a work file-dedicated pool, but some sites have gone to 90 for DWQT for such pools, and have ended up hitting the data manager threshold as a result. Hitting DMTH causes the CPU cost of accessing data in a pool to jump, so make sure that you're staying under that threshold.]
  • The "prefetch disabled" counters -- no buffer, and no read engine -- should ideally be zero for all pools. If either is non-zero for a pool, either the pool is too small, or the VPSEQT threshold (default value is 80) is too low (leaving too few buffers available to hold pages read via prefetch), or the deferred write queue threshold (DWQT) is too high (this would generally apply to a work file-dedicated pool, as mentioned in the preceding item).
  • For each pool, the number of asynchronous write operations should be significantly larger than the number of synchronous write operations. If this is not the case, the vertical deferred write queue threshold (VDWQT), or the horizontal deferred write queue threshold (DWQT), or both, should be lowered.
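
To pull the arithmetic from this list together, here is a rough Python sketch (the counter values would come from the command output; the parameter names and sample numbers are mine, purely illustrative):

def pool_size_mb(buffers, page_size_kb):
    """Pool size in MB: number of buffers times page size in KB, divided by 1024."""
    return buffers * page_size_kb / 1024

def read_io_rate(sync_random, sync_seq, seq_prefetch, list_prefetch, dyn_prefetch, interval_seconds=3600):
    """Total read I/O rate: the five read counters summed, divided by the interval in seconds."""
    return (sync_random + sync_seq + seq_prefetch + list_prefetch + dyn_prefetch) / interval_seconds

def pool_checks(read_rate, dmth_hits, sync_writes, async_writes):
    """Flags based on the rules of thumb above."""
    flags = []
    if read_rate > 1000:
        flags.append("read I/O rate over 1000/second -- grow this pool first")
    elif read_rate > 100:
        flags.append("read I/O rate in the hundreds/second -- a candidate for enlargement")
    if dmth_hits > 0:
        flags.append("DMTH was hit -- pool too small or deferred write thresholds too high")
    if async_writes <= sync_writes:
        flags.append("async writes not well above sync writes -- consider lowering VDWQT/DWQT")
    return flags

# Example: a pool with 250,000 4 KB buffers and an hour's worth of read activity.
rate = read_io_rate(1_800_000, 400_000, 900_000, 150_000, 650_000)      # about 1083 per second
print(pool_size_mb(250_000, 4))                                         # about 977 MB
print(pool_checks(rate, dmth_hits=0, sync_writes=5_000, async_writes=400_000))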

I hope that you find this information to be useful. Stay tuned for a companion entry in which I'll document the ways in which I use -DISPLAY GROUPBUFFERPOOL information (relevant to DB2 data sharing systems).