Wednesday, October 24, 2012

DB2 for z/OS: Have You Checked Your Log RBA Value Lately?

I'm in Las Vegas this week for the IBM Information on Demand 2012 conference. At this year's IOD event I've particularly enjoyed learning more about DB2 11 for z/OS, the beta program for which is expected to begin in March of 2013. DB2 11 will deliver a lot in the way of new features and capabilities. One of the big news items here is the extended RBA that will be available in DB2 11 systems running in new-function mode (the RBA, or relative byte address, is an identifier for records written by a DB2 subsystem to its transaction log). Whereas the current RBA is a six-byte hexadecimal value, providing 2^48 bytes (256 TB) of log capacity, the DB2 11 extended RBA is a ten-byte value, allowing for the logging of a yottabyte of data-change records. Not familiar with the term, "yottabyte?" Neither was I until a couple of days ago. It's 1000 zetabytes, or a trillion terabytes. That's a lot of logging capacity.

A larger maximum RBA value is important because a few DB2 for z/OS sites have bumped up against the 2^48-byte limit of the current RBA. Actually, what these folks bump up against is a practical RBA value limit that's a little less than 2^48. See, the RBA value would be thoretically maxed out at x'FFFFFFFFFFFF' (in hexadecimal notation); however, actually reaching that value would prevent a DB2 subsystem from doing much of anything, and that would not be good for data integrity protection. So, what happens is that DB2 will automatically shut down if the log RBA value reaches x'FFFF00000000'. This gives the DB2 subsystem a little room at the end of the log that it will use, when restarted, in resolving any incomplete units of recovery and otherwise ensuring that data is left in a consistent state (an attempt to restart the subsystem in "regular" mode after the automatic shutdown will result in DB2 going through "restart light" processing and shutting down again, after which it could be restarted again with ACCESS(MAINT) for any further clean-up tasks).

That sort of automatic shutdown is of course something you'd want to avoid, and fortunately you can. First of all, DB2 will issue message DSNJ032I, with an "alert level" of WARNING, when the log RBA value reaches x'F00000000000'. This would give you some time to plan for dealing with the approaching RBA limit (DSNJ032I is also issued, with an alert level of CRITICAL, when the RBA value reaches x'FFFF00000000' -- the value that triggers automatic shutdown). Here is a piece of advice for you: do NOT wait until your DB2 subsystem issues DSNJ032I with the WARNING alert level. Instead, check the RBA value for each of your DB2 for z/OS subsystems now. This can be quickly and easily done by issuing the command -DISPLAY LOG on each subsystem. In the command output you'll see a field, H/W RBA, that shows the highest RBA value written by the DB2 subsystem to its transaction log. Chances are, the value that you see in this field will be way lower than x'F00000000000', the WARNING threshold. A colleague of mine who has a lot of knowledge in this area, Paul Bartak, has said that a high-RBA value of x'C00000000000' or greater should be a trigger that gets you into RBA limit planning mode. If what you see for H/W RBA in -DISPLAY LOG output for a subsystem is less than x'C00000000000', just keep an eye on the situation through periodic issuance of -DISPLAY LOG.

If the high RBA value for a DB2 subsystem in your environment is north of x'C00000000000' then as my co-worker Paul suggested you should start developing a plan of action. What you'll end up doing will depend on whether you run DB2 for z/OS in standalone mode or in a data sharing configuration on a Parallel Sysplex. In the standalone case, you will follow a procedure that will result in the DB2 subsystem's RBA value being reset to zero. This procedure is documented in the DB2 for z/OS Administration Guide, under the heading "Resetting the log RBA value in a non-data sharing environment" (you can access the DB2 for z/OS product documentation, including the Administration Guide, in either PDF or HTML form, at http://www-01.ibm.com/support/docview.wss?rs=64&uid=swg27011656). A number of mainframe DB2 sites have successfully gone through this documented RBA-reset process -- it works.

If you have a DB2 subsystem that is approaching the log RBA limit, and that subsystem is a member of a data sharing group, there are a couple of options available to you. One alternative is to reset the member subsystem's RBA -- a less involved process in a data sharing versus a non-data sharing environment. The other choice involves starting up a new member of the data sharing group and permanently quiescing the member with the elevated RBA value (with DB2 10 in new-function mode, one can delete a member from the data sharing group -- until then, the subsystem ID of the permanently quiesced member will continue to show up in the text of some DB2 messages, but that's generally not a big deal). The DB2 for z/OS Data Sharing Planning and Administration Guide, under the heading, "Removing members from the data sharing group," provides information on permanently quiescing a member subsystem. Note that if the new member -- the one that will replace the subsystem that is to be permanently quiesced -- will be the eighth to join the data sharing group, entries in the lock table (part of the lock structure in a coupling facility) will have to be 4 bytes apiece, versus 2 bytes each. To maintain the same number of lock table entries in that case (good for holding the line on false global lock contention), you'd want to increase the size of the lock structure -- which should already be at a power of 2, such as 64 MB -- to the next higher power of 2.

Whether your DB2 subsystem is operating in standalone or data sharing mode, there is information in the DB2 for z/OS Administration Guide, under the heading "Resetting the log RBA," on using output from the print log map utility (aka DSNJU004) to estimate the amount of time you have before a subsystem's RBA value reaches the critical threshold.

Something else for those of you running DB2 in a data sharing group: do NOT think that RBA values no longer matter, just because LRSN values (short for log record sequence number) are used for data recovery operations. RBA values continue to go up for data sharing members as they write records to their log data sets, and the automatic shutdown occurs for a data sharing member just as it does for a non-data sharing subsystem when the RBA value reaches x'FFFF00000000'. I don't want you to be caught napping in this regard.

Speaking of LRSN values, those are also extended beginning with DB2 11 in new-function mode, to the tune of giving you something like another 30,000 years of value growth (LRSN values are derived from the store clock timestamp, and the current LRSN field can take you to the year 2042). Given the 30 years of value-increase that can be accommodated by the current-format LRSN, will any organizations be in a hurry to implement the DB2 11 extended LRSN (done, as for the extended RBA, through a conversion of the bootstrap data set and online REORGs of database objects)? Some will be. Why? Because on occasion a company had to add a delta to the LRSN of the originating member of a DB2 data sharing group in order to have a starting LRSN value that was higher than the already-elevated RBA value of that originating DB2 member. You can look at print log map utility output to see of you have such an LRSN delta in your environment. If you don't -- and that's probably the case -- then the LRSN value limit is still a long ways off for you. If there is a delta, you'll be more interested in getting to the DB2 11 extended LRSN sooner rather than later.

So, to recap: you should periodically check the RBA values of your DB2 subsystems, to ensure that they are well below the critical threshold. If the RBA value for a subsystem is seen to be high (with a 'C' in the high-order position), DON'T PANIC. Prepare to head off the RBA limit situation before it occurs, and get ready to put all this in your rear-view mirror once you get to DB2 11 new-function mode.  

Sunday, October 14, 2012

So, Where do You Specify the IP Address of YOUR DB2 for z/OS Subsystem?

If your answer to the question posed in the title of this blog entry is, "In a PORT reservation statement in our z/OS LPAR's TCP/IP profile data set," that's NOT what I want to hear. It's true that for a long time this was the standard way to assign an IP address to a DB2 for z/OS subsystem, and it would look something like this:

PORT 446 TCP DBP1DIST BIND 111.222.333.444  ; DRDA SQL PORT FOR DBP1

This was the way to go because DB2 didn't give you much of a choice in this regard. You can also set things up so that a DB2 subsystem's IP address is specified in a domain name server, but that's not the answer I'm looking for, either. No, if I ask you where the IP address of your DB2 subsystem is specified, what I want to hear is this: "In the bootstrap data set, Mr. Catterall." Of course I'm kidding about the "Mr. Catterall" bit, but the BSDS is indeed where DB2's IP address should be. I'll tell you why momentarily.

Specifying the IP address of a DB2 subsystem in the associated BSDS is something that only recently (relatively speaking) became possible -- the capability was delivered with DB2 9 for z/OS. In a DB2 9 environment, an IP address can be added to a subsystem's communications record in the BSDS by way of the change log inventory utility, aka DSNJU003. The input statement for an execution of DSNJU003 that would place an IP address in the BSDS would be of this form:

DDF IPV4=111.222.333.444

Note that the address of the subsystem could also be specified in the BSDS in IPv6 form (colon hexadecimal). Note, too, that in a data sharing system you can place in the BSDS the IP address of the data sharing group as well as the address of the individual member subsystem. Also in a data data sharing environment it is highly recommended that the address of a DB2 subsystem be a dynamic virtual IP address, or DVIPA (that way, if a failed DB2 member is restarted on a different z/OS LPAR in the Parallel Sysplex, clients utilizing DRDA two-phase commit protocol will be able to re-connect to the member to resolve in-doubt threads).

If you put a DB2 subsystem's IP address in the associated BSDS, a PORT reservation statement for the subsystem in the LPAR's TCP/IP data set would look like this:

PORT 446 TCP DBP1DIST ; DRDA SQL PORT FOR DBP1

That's right: no IP address for the DB2 subsystem in the PORT statement. When a DB2 subsystem's IP address is recorded in that subsystem's BSDS, placing the address as well in a PORT statement for the DB2 subsystem is not only unnecessary, it's ill-advised. Why? Because if the IP address is recorded in both places (the BSDS and the PORT statement), the PORT statement wins. Why is that important? Because if z/OS Communications Server gets a DB2 subsystem's IP address from a PORT reservation statement, you miss out on the benefits of INADDR_ANY functionality. INADDR_ANY is a TCP/IP capability whereby, in a DB2 for z/OS context, a DB2 subsystem can accept requests to its port on any IP address. What that gives you is configuration and operational flexibility. This is particularly advantageous in a DB2 data sharing environment, and when DB2 Connect is running under Linux on System z and utilizing HiperSockets for access to DB2 for z/OS.

There are other goodies associated with recording a DB2 subsystem's IP address in the BSDS instead of in a PORT reservation statement: you don't have to define a domain name for the DB2 subsystem to TCP/IP, and you can use SSL (Secure Socket Layer) for DRDA communications (not possible with BIND specific TCP/IP statements).

You've probably done a lot over the years to modernize your DB2 for z/OS environment. Getting a DB2 subsystem's IP address into the BSDS and out of the PORT reservation statement is another such move. Take this action and position your DB2 subsystem for enhanced client-server application availability, agility, and functionality.