Business Transaction Monitoring (BTM) and the Mainframe

Today’s large data centers are hybrids – they manage both distributed systems (e.g. Linux) and mainframe systems. Many applications span both, perhaps starting as a transaction from a web customer and then fetching the relevant customer information from the mainframe. From the customer’s perspective it is a single seamless system; the problem is that technically (and organizationally) that just isn’t so. There is a clear divide between the distributed side and the mainframe side of a data center – different tools, different personnel, and different cultures. Bridging that gap is one of the key problems facing end-to-end performance management today (and probably for the near future).

One new technology on the distributed computing performance management scene that is trying to (at least partially) bridge that gap is Business Transaction Monitoring (BTM). Some analysts classify it as an Application Performance Management (APM) technology; others view it as a standalone segment. The idea behind the technology is to trace, in real time, every single transaction from the user request, through the various application components and tiers, until a response is returned. By tracing every step of a transaction through all the components and tiers, BTM provides a detailed view of transaction performance and how it relates to the components used by the transaction. So if a certain set of transactions is not performing as expected, you can see all of the performance data for the components used by those transactions (e.g. how long a transaction spent in a certain component) from beginning to end, and understand which tier and which components caused the problem. The main players in the market are niche players (e.g. dynaTrace – just acquired by Compuware, OpTier, Correlsense) – though of the larger vendors at least IBM claims to have the capability in their Tivoli ITCAM product suite.
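To make the idea concrete, here is a minimal sketch of what a BTM tool does with the data it collects: per-component timings are tagged with a transaction id, then rolled up to show where a slow transaction spent its time. The `Span` record, the `slowest_component` function, and the tier and component names are all hypothetical, made up for illustration; no vendor's actual data model is implied.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """One timed step of a transaction (hypothetical record layout)."""
    txn_id: str
    tier: str
    component: str
    elapsed_ms: float


def slowest_component(spans, txn_id):
    """Return (tier, component, total_ms) where the transaction spent the most time."""
    totals = defaultdict(float)
    for span in spans:
        if span.txn_id == txn_id:
            # A component may be visited more than once, so accumulate.
            totals[(span.tier, span.component)] += span.elapsed_ms
    if not totals:
        raise KeyError(txn_id)
    (tier, component), total_ms = max(totals.items(), key=lambda kv: kv[1])
    return tier, component, total_ms


# Illustrative data only: one transaction crossing three tiers.
SPANS = [
    Span("T1", "web", "checkout-servlet", 12.0),
    Span("T1", "app", "order-service", 35.0),
    Span("T1", "db", "orders-query", 240.0),
    Span("T1", "app", "order-service", 8.0),
    Span("T2", "web", "checkout-servlet", 10.0),
]

if __name__ == "__main__":
    print(slowest_component(SPANS, "T1"))
```

The hard part in practice is not this roll-up but getting every component to emit a record carrying the same transaction id, which is exactly where the mainframe becomes a problem.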

Many of the BTM vendors claim that in today’s distributed environment (especially a virtualized environment) transactions are the only constant: the thread that links the performance details of all the components used by a transaction, enabling end-to-end application performance management. That certainly is true of distributed environments – but the argument breaks down under closer scrutiny on the mainframe.

The problem is that once a request is handed off from the distributed systems to the mainframe (e.g. through a transaction gateway) – there really is no efficient, effective way to track that specific transaction through the various mainframe components used to process the request. BTM treats the mainframe as a “black box” single component, and can only tell whether or not the mainframe is the culprit for performance issues as they relate to a transaction. That of course doesn’t help much with understanding or diagnosing the problem on the mainframe side. It certainly is of no use in telling you “where to look” on the mainframe.

The reason that BTM has such a hard time with mainframes is that the mainframe OS and transaction monitor (usually CICS) are architected for the highly virtualized mainframe environment, and decisions about how to process a transaction are under system, not application, control. The transaction monitor and OS make a lot of resource-driven decisions on their own about how to process a transaction. It is not that the information for tracking a transaction end-to-end doesn’t exist (e.g. the unit-of-work ID can be used to track the transaction throughout the mainframe) – it is that there is no realistic way to get that information for every single mainframe transaction in real time (or near real time) without unacceptable processing overhead (i.e. slowing the mainframe to a crawl). It can be done offline (using SMF records), and I even know of some customers that do that – but not in real time. The level of virtualization and complexity of a mainframe means that real-time BTM for mainframes just isn’t a viable solution.
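The offline approach can be sketched as a simple batch correlation: take records that have already been extracted from SMF, group them by unit-of-work ID, and order each group by start time to rebuild the transaction's path. The dictionary layout below (`uow_id`, `subsystem`, `start`, `elapsed_ms`) is an assumption made for illustration and is not the real SMF record format; the subsystem names are placeholders as well.

```python
from collections import defaultdict


def correlate_by_uow(records):
    """Group already-parsed records by unit-of-work ID and summarize each group.

    Each record is a dict with hypothetical keys:
    uow_id, subsystem, start (any sortable timestamp), elapsed_ms.
    """
    by_uow = defaultdict(list)
    for rec in records:
        by_uow[rec["uow_id"]].append(rec)

    summary = {}
    for uow_id, recs in by_uow.items():
        # Order the steps by start time to recover the processing path.
        recs.sort(key=lambda r: r["start"])
        summary[uow_id] = {
            "path": [r["subsystem"] for r in recs],
            "total_ms": sum(r["elapsed_ms"] for r in recs),
        }
    return summary


# Illustrative data only, deliberately out of order as a batch extract might be.
RECORDS = [
    {"uow_id": "U2", "subsystem": "CICS", "start": 5, "elapsed_ms": 4.0},
    {"uow_id": "U1", "subsystem": "DB2", "start": 3, "elapsed_ms": 20.0},
    {"uow_id": "U1", "subsystem": "CICS", "start": 1, "elapsed_ms": 6.0},
]

if __name__ == "__main__":
    for uow_id, info in sorted(correlate_by_uow(RECORDS).items()):
        print(uow_id, info)
```

Nothing here is expensive as a batch job after the fact; the overhead problem described above comes from collecting and correlating this data for every transaction while the mainframe is processing it.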

So, in summary, BTM is a great idea for distributed systems but doesn’t help much in a true end-to-end scenario, since it isn’t viable on the mainframe. However, linking BTM for distributed systems with behavioral analytics on the mainframe is something that could have a profound impact on real-world end-to-end performance management.
