Archive for the ‘Uncategorized’ Category

Why Cloud Providers need Application Behavior Analysis

April 17, 2012

I have been reading a really interesting cloud blog by Huan Liu where he uses various techniques to measure different aspects of public clouds (especially Amazon).

In his posting “Host server CPU utilization in Amazon EC2 cloud” he found that Amazon utilizes only a small percentage of the CPU on their servers (his findings point at a 7.3% CPU utilization rate). As he points out, this is a lot lower than what most data centers achieve. The reason is that in order to try and solve the “noisy neighbors” problem they don’t over-commit CPU or memory, which means that there is a tendency to reserve CPU for the worst-case scenario for each instance hosted on the server.
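To make the mechanism concrete, here is a small sketch (the instance figures below are invented for illustration, not taken from his measurements): when a host’s CPU is fully reserved for each instance’s worst case, average utilization collapses to the ratio of mean use to reserved capacity.

```python
# Sketch of why no-overcommit hosting yields low utilization.
# The reserved/used figures below are hypothetical examples.

instances = [
    {"reserved_cores": 2, "avg_cores_used": 0.2},  # bursty web app
    {"reserved_cores": 4, "avg_cores_used": 0.3},  # nightly batch job
    {"reserved_cores": 2, "avg_cores_used": 0.1},  # mostly idle service
]

total_reserved = sum(i["reserved_cores"] for i in instances)  # 8 cores held
total_used = sum(i["avg_cores_used"] for i in instances)      # 0.6 cores busy

utilization = total_used / total_reserved
print(f"host utilization: {utilization:.1%}")  # prints "host utilization: 7.5%"
```

Every customer gets their full worst-case reservation, but the host sits mostly idle; over-committing would raise the number at the risk of noisy neighbors.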

On the other hand, many production applications have a general behavioral profile like the one he shows:

and it is clear that they need peak CPU for only a limited period every day (and usage has a pattern).

So the dilemma is – over-commit resources and possibly hurt your customers, or under-commit resources and make less profit. I believe that one answer to that dilemma is application behavior analysis.

An application behavior profile would benefit cloud providers in two ways. The first is that the algorithm that assigns virtual machines to physical machines could use a behavior profile to try and allocate anticorrelated applications to the same physical machine. The second is to use an application’s behavioral profile to enable it to “return” CPU when not needed, and use the behavioral profile to “lock in” CPU when needed.
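The first idea can be sketched in a few lines (the application names and 24-hour CPU profiles below are hypothetical, invented for illustration): compute the correlation between candidate applications’ profiles and co-locate the pair whose peaks overlap least.

```python
# A minimal sketch of anti-correlated placement: pair the applications
# whose hourly CPU profiles have the lowest correlation, so their peaks
# don't land on the same physical machine at the same time.

from itertools import combinations

def correlation(xs, ys):
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical 24-hour CPU profiles (fraction of a core per hour).
profiles = {
    "daytime-web":   [0.1] * 8 + [0.9] * 8 + [0.1] * 8,  # peaks mid-day
    "nightly-batch": [0.9] * 8 + [0.1] * 8 + [0.9] * 8,  # peaks overnight
    "evening-video": [0.1] * 16 + [0.9] * 8,             # peaks in the evening
}

best_pair = min(combinations(profiles, 2),
                key=lambda p: correlation(profiles[p[0]], profiles[p[1]]))
print("co-locate:", best_pair)  # ('daytime-web', 'nightly-batch')
```

The web app and the batch job are perfectly anti-correlated here, so one host can serve both peaks without ever needing the sum of their worst cases.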


Value, Ease-of-Use, Joy-of-Use

April 7, 2012

I work a lot with emerging enterprise software companies.  I have come to believe that every emerging enterprise software company needs to have a “free” version of their product (which is anathema to most enterprise software companies – no matter what their size). My reasoning is that a free version of a product makes sure that the company knows what it takes to make a product that they can sell:

Value: Value doesn’t just mean that you have a product that solves a problem; it means that you have a product that solves somebody’s problem. In other words, there is a specific person who will benefit from using your software, and it provides them with enough value that they are willing to do what it takes to obtain and use your product. A free product makes sure you really understand who benefits from your product – because if people won’t use it for free, do you really believe that they will pay for it? A free version enables you to validate whether:

  1. You provide enough value relative to the effort involved in getting the product to work (see point 2).
  2. You understand who really benefits from the product, and you are trying to convince the right people to use it.

It used to be that a proof-of-concept was enough to demonstrate value, but the consumer internet has changed people’s expectations.

Ease-of-Use: It may be that your product does provide real value to somebody, but the effort required to achieve that value is just too great. If they need a services engagement to install and configure the product before they can derive any real benefit in their job – you are in trouble. It is OK to rely on services for a complete enterprise-wide rollout, but it isn’t OK that no one benefits before that. A free product ensures that you really know that someone is benefiting enough from your product (not to mention the invaluable, direct product feedback).

Joy-of-Use: This is the nirvana of software. I don’t think that in an enterprise setting you can achieve Apple’s level of joy-of-use, i.e. where people play with their iPhone just because it is fun. For enterprise software I see this as an appropriate combination of 1 and 2, where a product provides enough direct value to someone’s work that they will spend the effort needed to obtain and use it. That is a good enough level of Joy-of-Use for “enterprise work”.

Value, Ease-of-Use, Joy-of-Use – it isn’t easy (and I have probably heard almost every reason in the book about why it can’t\shouldn’t be done), but if you can’t figure out a free version of your product that delivers all three, you should be worried about whether the paid version of your product can actually make it.

Cloud Operations 2 – The Parietal Lobe

March 24, 2012

My previous post was about the frontal lobe of cloud operations – the monitor that notifies you when an application is not behaving correctly or as expected (i.e. an anomaly is detected). This post is about what needs to be done when an acute anomaly happens (usually meaning that either users or key resources will be affected by the problem) – and some real-time action needs to be taken to fix the problem.

In the fast-paced world of cloud operations you essentially have one of two high-level decisions to make – incrementally deploy more infrastructure to solve the problem, or roll back to a previous version of the application. Either decision has an impact on the business – rolling back means that your customers will lose functionality or features, and deploying more infrastructure means extra costs for the business.

There needs to be an additional “brain” that can synthesize information from different systems to make (and act on) the decision about rollback vs. additional resources. This is part of the “SLA cost awareness” that I mentioned in my previous post – it needs to weigh the cost of rollback vs. extra infrastructure, and also make some decisions about the efficacy of either course of action – whether to initiate a “flight” or “fight” response.

Once the response is decided, there needs to be a mechanism for implementing the decision. If the decision is “flight” (aka rollback) – there needs to be a well-defined process that enables rollback in a timely, non-disruptive fashion. If the decision is “fight” (aka deploy additional resources) – there needs to be a way to define exactly what resources need to be applied, where they should be applied and how to apply them.
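The weighing step can be sketched as a simple cost comparison (all of the costs, durations and the function name below are hypothetical assumptions, invented for illustration – a real system would estimate these from the APM and billing data):

```python
# A hedged sketch of the "flight vs. fight" decision: compare the
# business cost of rolling back a feature against the cost of riding
# out the problem on extra infrastructure.

def choose_response(rollback_cost_per_hour, extra_servers_needed,
                    server_cost_per_hour, expected_duration_hours):
    """Return 'flight' (roll back) or 'fight' (deploy more resources)."""
    fight_cost = extra_servers_needed * server_cost_per_hour * expected_duration_hours
    flight_cost = rollback_cost_per_hour * expected_duration_hours
    return "fight" if fight_cost <= flight_cost else "flight"

# Example: losing a new feature costs ~$500/hour in business value, while
# riding out a 4-hour spike on 10 extra servers at $2/hour costs $80.
print(choose_response(rollback_cost_per_hour=500, extra_servers_needed=10,
                      server_cost_per_hour=2, expected_duration_hours=4))  # fight
```

The point isn’t the arithmetic – it’s that both sides of the comparison must be visible to one “brain” before either response mechanism is triggered.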

Actually, this additional brain isn’t only for emergency situations. It needs to provide the same type of capability in any stressful situation – whether caused by an application anomaly found by your APM, or by the release of a new feature. New feature releases and upgrades are the mundane, but more frequent, cause of stress in the world of cloud applications, and handling them well is a key to success in cloud applications. More on that in my next post.

Cloud Operations – 1. The Frontal Lobe

March 17, 2012

In my previous post on cloud operations, the image has a sense\respond loop between the APM (Application Performance Monitor) and the rest of the system. This is the frontal lobe of cloud operations – its job is to analyze the information coming in from the APM and translate it into an appropriate action. This is one key area that still needs a lot of work (and invention) – but it will be a key differentiator between cloud-based applications and traditional applications.

The reason is the inherent elasticity of the cloud. You can always get more – more capacity, more storage – but it will cost you. If you have ever been part of an IT performance war-room then you know that capacity is the magic elixir that fixes everything. The cloud makes that elixir so simple to obtain, it can get transformed into a panacea. Sure you can go and allocate another dozen web\app servers if the current systems aren’t keeping up with demand, but once you do that you’ll need to pay for the extra capacity. It becomes an immediate additional expense, so just because you can do it doesn’t mean you should. These cost-aware decisions will be a new role for operations, and will require a new type of SLA management – “cost-aware SLA management”. Currently most SLAs focus on downtime (e.g. 99.9% uptime), and some focus on performance (x-second response time) – but they ignore the costs associated with maintaining the SLA. Once costs become more immediate and visible, someone is going to have to manage them, and operations will be tagged with the job.

The problem is that APMs provide just too much information for humans to manage. There will need to be some sort of intelligent analysis that synthesizes the information coming from the various APM systems and distills the raw data into actionable information. I believe the only way to achieve that is through behavioral analysis of applications and predictive analytics (I have been writing about this here). That is the only way to obtain the benefits that the cloud can provide – through intelligent systems that can make some decisions on their own (e.g. increase the number of servers to meet demand, within a predefined policy), and provide distilled, actionable information for operations when they can’t.
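One way to picture that split between “decide within policy” and “escalate to a human” is a sketch like the following (the thresholds, policy fields and function name are all assumptions invented for illustration, not a real APM interface):

```python
# A minimal sketch of an intelligent layer that distills raw APM data
# into an action: scale within a predefined policy, or hand the
# cost-aware decision to operations when the policy doesn't cover it.

def scale_decision(current_servers, avg_response_ms, policy):
    """Return an (action, server_count) pair for one monitoring interval."""
    if avg_response_ms <= policy["target_response_ms"]:
        return ("hold", current_servers)          # SLA met, no extra spend
    if current_servers < policy["max_servers"]:
        proposed = min(current_servers + policy["step"], policy["max_servers"])
        return ("scale_up", proposed)             # act autonomously, within policy
    # Policy ceiling reached: spending more is now a business decision,
    # so distill the situation and escalate to a human.
    return ("escalate", current_servers)

policy = {"target_response_ms": 800, "max_servers": 20, "step": 2}

print(scale_decision(10, 500, policy))   # ('hold', 10)
print(scale_decision(10, 1200, policy))  # ('scale_up', 12)
print(scale_decision(20, 1200, policy))  # ('escalate', 20)
```

The interesting engineering is in the middle branch growing smarter (prediction, behavioral profiles) so that the last branch fires less often.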

Cloud Operations – Preface

March 10, 2012

I have been spending a lot of time lately looking at the cloud from an operational perspective w.r.t. applications – I guess it would fall under the banner that some analysts would call DevOps and others would call AppOps. I see the difference between the two as a matter of perspective: one looks at everything that needs to be done before you can deploy an app, while the other looks at everything that needs to be done to deploy an app and monitor it afterwards. The line between them is really blurry – and as the cloud becomes more production oriented it will become even blurrier.

What I found is that actual enterprise production applications (not SaaS applications) are few and far between – so not a lot of attention has been paid to the lifecycle issues of managing enterprise production applications in the cloud. Dev and QA are the kings of cloud usage in the enterprise at the moment. I also found that as opposed to the NIST definitions of cloud computing – IaaS (Infrastructure as a Service), PaaS (Platform as a Service) and SaaS (Software as a Service), which seem to describe a nice progression of functionality for the cloud – the real world is much messier. SaaS came first, and most SaaS providers didn’t build their applications on IaaS or PaaS; they built their own homebrew “Private PaaS” tailored to their specific application using a mixture of bespoke and off-the-shelf tooling. I think that enterprise production applications will look very similar – just that they will use off-the-shelf IaaS for infrastructure provisioning, and off-the-shelf PaaS for specific components in their application stack.

As I was learning all this, I think I finally understood why the cloud matters – way beyond its value as a cheaper delivery model, or a way to save on infrastructure costs. The cloud will enable IT to work like an agile production line from dev to delivery. I use the term production line, but the cloud actually holds the promise of providing much more than a physical production line – product lifecycles of days or hours, not months or years. I think as this picture becomes whole, it will drastically change the way we think about applications.

In my depiction below, I clumped together Infrastructure and Platforms, not because they aren’t important but because I wanted to focus on what most people are ignoring at the moment – what happens after the app is assumed to be ready for deployment. Using “classic” application delivery metaphors, that means understanding what happens after dev has finished and the app has moved into the realm of operations.

In my next few blogs I am going to spend more time describing this picture.

Will Virtualization Kill the Cloud?

December 5, 2011

I know there is a lot of contention over what exactly cloud computing means. Some people use the metaphor of “compute as a utility” (like electricity) – which seems to be more of a long-term “grand challenge” to me. I found a short description that I believe is right on the mark by James Urquhart – “Cloud computing is an application-centric operations model.” So for IT, applications become king – and the whole of IT will be focused around serving applications and application users. Even though this sounds almost intuitive to most business folks (I mean, what else is IT except a way to get applications to users?) it isn’t how most IT departments operate. In most IT departments, infrastructure is king, not applications. You want to deploy a new application, or you need more resources for an existing application – well then wait a few months for the infrastructure folks to requisition, provision, integrate and provide you those resources. The cloud will play havoc with that model. There is a reason for this mismatch in paradigms: for the infrastructure folks, cloud or not, compute and storage resources are not flexible or infinite – even though they may appear that way to the application folks. Infrastructure is a physical resource – and therefore not unlimited.

It is clear that virtualization is the mechanism that most cloud providers will use to try to create the illusion of “unlimited available resources” at the application layer. Virtualization isn’t new to the data center (mainframes have been doing it forever, and VMware has been around for quite a few years now). What is new in the public cloud is that the organization using it is completely blind to the physical infrastructure and its topology – all you get to see is your VMs and the stack above those VMs.

That blindness means that you don’t know if your apps are running on a machine with 10 other apps, or if your VM has just been migrated to another physical server. Or maybe your cloud provider has skimped a bit on infrastructure, and now physical machines need to host a few more virtual machines to meet the needs of some peak period. In a perfect world that wouldn’t matter – but the world isn’t perfect. Your apps will be affected by their physical neighborhood – for example, take the “noisy neighbors” problem that I mentioned in “Noisy Neighbors, Amazon Cloud and the Mainframe”. All of a sudden, through no fault of your own, your production applications may start acting erratically. So now you’ll need to understand that your performance and SLA problems may be caused by things that you can’t see, and can’t access – and will never be able to access.

Going back to the “compute as a utility” metaphor – virtualization makes the cloud very different from, say, an electric utility. You wouldn’t be very happy if your refrigerator was running 5 degrees warmer because the guy down the block turned on his air-conditioner – but that is what can happen when virtualization is used in the cloud. Just like with the refrigerator, you won’t know about the problem until it is too late – the food spoils, or irate users are on the phone.

So as the cloud matures – the issues brought about by mass virtualization will need to be addressed – or we may find that virtualization killed the cloud.

Not Unstructured, Not Unpredictable, Not Ad-hoc Processes – Simply Knowledge Processes

November 12, 2011

There is an interesting conversation going on in the Adaptive Case Management group on LinkedIn. It made me notice that people continually struggle with how to describe the kinds of processes covered by adaptive case management – people use the terms unstructured, unpredictable and ad-hoc interchangeably (I too am guilty of that). The problem is that all those terms imply that something is missing or wrong with the process, and insinuate that if only we would work harder or smarter we could change any process into a more structured, predictable, well-defined, recurring process – which just isn’t true for knowledge processes. Do we as a community really believe that knowledge work will be mostly defined by predefined structured processes? I don’t think so. I think we should call these types of processes “knowledge processes” (which puts them in a positive light) rather than unstructured (or any of the other related terms) processes.

The key issue in morphing technology support from structured processes to knowledge processes (interesting how the words structured vs. knowledge sound when said aloud – try it) is changing the mindset from one of control to one of visibility, guidance and tracking. Knowledge processes demand inversion of control – it isn’t the process (and its model) that controls the participants, but rather the participants who control the process and its flow. Optimizing a knowledge process isn’t about counting steps – it is about optimizing outcomes and appropriately leveraging skills. Managing a knowledge process isn’t about control; it is about providing guidance about possible next steps, ensuring appropriate levels of visibility into process execution, enabling collaboration between process participants, and tracking process execution through its steps.

Both the vendor and analyst ACM communities are mistakenly worried about exactly what technical features need to be included or excluded in an ACM tool. No matter what we all think about our own approaches – there is no single right answer. I think that is the main problem of ACM – both the vendor and analyst communities immediately drill down to the technical features of the tooling, losing the bigger business picture.

Preventing Failure vs Fixing Failure

October 29, 2011

There has been a lot of discussion on the value of failure in the ACM\BPM community in the last few weeks (Failure is Essential to Knowledge Work, The Value of Failure, Preventable Failure, Unavoidable Failure, Intelligent Failure). Of course failure is part of any process (i.e. for some reason the process didn’t achieve a desired result), though sometimes we use the word exception in the context of a process.

One of the key reasons companies deploy a BPM suite is to prevent failure. This is a major selling point for many BPM solutions. A key goal of a BPM suite is to enable the deployment of process-driven solutions that prevent a deployed process from failing. As everyone knows, preventing failure is a lot cheaper than fixing it, so any technology that can help prevent failure is valuable. But is that really true in every context?

When deploying a BPM solution the question you need to ask is whether your processes are well defined enough and correct enough that you can really focus on preventing failure (and don’t forget that in most cases preventing failure is equivalent to limiting options). If that isn’t true (which is the case more often than people think) – then you should focus on the ability to fix failure fast, and correctly. Or said another way – the focus needs to be on “first fault failure resolution” capability rather than failure prevention capability.

I found this document from Netflix which I think is very interesting given the public failures they have gone through lately.  It has a lot of insight from a rapidly growing company about how they think about process (starting on slide 44), though the derogatory term bureaucracy is used. Here are some interesting process related quotes from the presentation:

  • Process brings seductively strong near-term outcome
  • “Good” process helps talented people get more done
  • “Bad” process tries to prevent recoverable mistakes
  • Embrace Context – Not Control
  • In a creative-inventive market, not a safety-critical market like medicine or nuclear power. You may have heard preventing error is cheaper than fixing it — Yes, in manufacturing or medicine…but not so in creative environments.

So in summary – where you want to play it safe, deploy a process solution focused on managing structured processes; if you need agility (and are willing to accept its associated risk), then you should focus on “first fault problem resolution” for your unstructured processes – rather than trying to structure them to prevent failure.

On the CIO – CEO Gap

October 25, 2011

I read an interesting article on the gap between CIO and CEO technology priorities. Looking at the Gartner 2012 technology priorities (and using them as a proxy for what CIOs think is important), the CIO’s list revolves around Tablets, Mobile, Social, BI and Cloud. On the other hand, the CEO’s list is: ERP, CRM, Specific business-line applications, E-commerce expansion, General IT modernization, IT infrastructure improvements, Business mobility as it relates to major platforms, Business intelligence, Supply chain management and Security – interesting that process doesn’t show up on either list, but I’ll leave that for later.

What can I tell you?  CEOs are right – they are essentially saying that a lot of previous technology initiatives aren’t finished yet, and they still need to maximize their value to the business. CEOs are also saying that technology isn’t interesting unless it serves a business purpose. The CEO’s list focuses on technology initiatives that actually affect the business or, as I like to think about it, can either optimize and streamline current business delivery (increase the bottom line) or generate new business (increase the top line). Most of the CEO’s focus is on IT’s ability to increase the bottom line, not the top, which is the gist of how CEOs view CIOs – as the owner and manager of the infrastructure used to make the business machinery run – not as a partner on the business side, and certainly not as a seat of business innovation. Many CIOs balk at being pigeonholed in this way; they would like to consider themselves full-fledged partners on the business side, and maybe even candidates for CEO.

I think that the technology lists really highlight the difference between a CEO and a CIO. The CEO focuses on things like ERP, which is “technology in the large” – a number of technologies applied as a large business-transforming initiative. The CIO list is “technology in the small” – specific technologies that have an unclear impact on the business as a whole.

CIOs need to think of technology in the large – and they actually have the technical ability to translate it into the different small technologies needed to make the initiative work. CIOs need to view technology as a tool for business, not an end in itself. They shouldn’t focus on “cloud computing” but rather on the business benefit of “agility” – and then break that down into what is really needed to enable agility (and maybe the cloud fits).

Here is my list of possible CIO technology initiatives, each has a set of related technologies – but I am sure there are many others I have left out (and hopefully others will point them out to me):

1. Process management – since IT touches almost every business process, the CIO is really the only person who could understand how the business actually runs.  By taking a process-oriented view of how IT supports the business, a CIO could create real value for the company.  There are quite a few process management related technologies that could be relevant, and process management should certainly be on the short list of initiatives for any CIO that wants to be CEO. I am surprised it didn’t show up on the CIO’s list at all – though it did on the CEO’s list through ERP, CRM, applications and supply chain management.

2. Running IT as a Business – I am always surprised about how IT is actually run – IT could be a showcase of how information technology can be used to manage business, but it usually is not (and quite often the opposite). Eat your own dogfood – need I say more?

3. Leveraging Data for Business Value – the good news is this is actually on both lists. The bad news is that IT itself doesn’t do such a good job of leveraging its own data for its own needs. Again – shouldn’t IT be a showcase of how data can help make a business run better? Shouldn’t they be doing it themselves for their own business?

4. Application agility – applications are what the business cares about – not how applications are delivered. The good news is that some of the technologies on the CIO’s list could be used to target application agility – cloud, app orientation, mobile – but if they are viewed as separate, distinct activities they probably won’t end up having a profound impact on the business. The point here is that the technology doesn’t matter from a business perspective – but applications do. How many innovative applications that help the business has IT come up with lately?

Are BPM suites another 4GL (fourth-generation programming language)?

October 22, 2011

Lately I have been spending more time with various IT issues (especially around application performance management) than with business process management (BPM) or adaptive case management (ACM). What surprised me most is how little BPM is used by most development and IT shops for their own use. Even when a truly structured IT process is being implemented (like deploying an application into production) – the tools are never based on BPM suites. The only time I see BPM suites being used is when there is an IT management decision to take a more process-oriented approach to certain business applications. In that case a BPM team is created and they implement certain processes – it never seems to leak much into the broader IT domain. It also seems to be used mostly in the context of business applications built by enterprise IT departments.

So where does that leave BPM suites? They aren’t used by most developers, and aren’t used by business people. So in many ways they resemble a 4GL (a fourth-generation language, for anyone old enough to remember) for implementing structured business processes that don’t have a packaged application available. Just like 4GLs in their day, they provide value for a nice-sized niche – but still a niche.

What does that mean for the future of BPM suites?

BPM suites could try to branch out and encompass unstructured processes – but that would mean essentially building a different tool, as Max Pucher continually points out in his blog and Neil Ward-Dutton shows in his presentations on ACM vs BPM.  This can work in a suite context (as Keith Swenson points out in his blog) and I believe that this is where many BPM suites are headed – but I doubt they’ll make it. It is a lot more complex than just adding some “social features” to a BPMS. Most vendors don’t understand that, and neither do many analysts – just take a look at how much focus Forrester puts on the structured part of case management (and how little on the unstructured part) in their dynamic case management wave. As I have said before, we’ll know BPM vendors have nailed it when we see knowledge workers using their BPM applications on a daily basis instead of email.

A second direction would be to provide more value to participants in the niche – or expand to the folks who can benefit from data derived from the niche (like BI capabilities). I think all BPM suites will expand in this direction – but I question whether that will be enough to keep them from going down the same path as 4GLs – essentially a niche business that never breaks out into the mainstream, either from an IT or a business direction.

A third direction would be for BPM suites to embrace their 4GL-ness and focus on providing more value to developers, so that BPM suites could be used by a broader set of developers while evolving into general-purpose business application development suites – but I don’t see any BPM vendors going in that direction.