Cloud Operations 2 – The Parietal Lobe

My previous post was about the frontal lobe of cloud operations – the monitor that notifies you when an application is not behaving correctly or as expected (i.e. an anomaly is detected). This post is about what needs to be done when an acute anomaly happens (usually meaning that either users or key resources will be affected by the problem) and some real-time action needs to be taken to fix it.

In the fast-paced world of cloud operations you essentially have one of two high-level decisions to make: incrementally deploy more infrastructure to solve the problem, or roll back to a previous version of the application. Either decision has an impact on the business: rolling back means that your customers will lose functionality or features, and deploying more infrastructure means extra costs for the business.

There needs to be an additional “brain” that can both synthesize information from different systems and make (and act on) the decision about rollback vs. additional resources. This is part of the “SLA cost awareness” that I mentioned in my previous post – it needs to weigh the cost of rollback vs. extra infrastructure and also judge the efficacy of either course of action, that is, whether to initiate a “flight” or “fight” response.
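To make the weighing concrete, here is a minimal sketch of how such a brain might compare the two responses. Everything in it is an assumption for illustration: the `Option` type, the cost figures, the `fix_probability` estimates and the simple expected-cost formula are hypothetical, not something taken from a real product or from my previous post.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Option:
    name: str               # "flight" (rollback) or "fight" (add resources)
    cost_per_hour: float    # hypothetical business cost of taking this action
    fix_probability: float  # hypothetical estimate that the action resolves the anomaly


def expected_cost(option: Option, sla_penalty_per_hour: float) -> float:
    """Cost of the action plus the SLA penalty, weighted by the chance the action fails."""
    return option.cost_per_hour + (1.0 - option.fix_probability) * sla_penalty_per_hour


def choose_response(options: Iterable[Option], sla_penalty_per_hour: float) -> Option:
    """Pick the option with the lowest expected hourly cost."""
    return min(options, key=lambda o: expected_cost(o, sla_penalty_per_hour))


# Illustrative numbers only: rollback costs more (lost features) but is more
# likely to work; scaling out is cheaper but less certain to fix the problem.
rollback = Option("flight", cost_per_hour=500.0, fix_probability=0.95)
scale_out = Option("fight", cost_per_hour=120.0, fix_probability=0.60)

decision = choose_response([rollback, scale_out], sla_penalty_per_hour=2000.0)
```

With these made-up numbers a high SLA penalty favors the more reliable rollback, while a low penalty favors the cheaper option of adding resources. A real system would need far better estimates of both cost and efficacy than two constants.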

Once the response is decided, there needs to be a mechanism for implementing the decision. If the decision is “flight” (aka rollback) – there needs to be a well-defined process that enables rollback in a timely, non-disruptive fashion. If the decision is “fight” (aka deploy additional resources) – there needs to be a way to define exactly what resources need to be applied, where they should be applied and how to apply them.
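One way to picture that mechanism is a plan object that captures the "what, where and how", plus a dispatcher that turns it into ordered steps. The sketch below is a dry run under stated assumptions: `ResponsePlan`, its fields, and the step wording are all hypothetical, and no real cloud provider API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ResponsePlan:
    kind: str                                  # "flight" or "fight"
    target_version: str = ""                   # flight: version to roll back to
    resources: Dict[str, int] = field(default_factory=dict)  # fight: what, and how many
    region: str = ""                           # fight: where to apply the resources


def rollback_steps(plan: ResponsePlan) -> List[str]:
    # Hypothetical non-disruptive rollback: shift traffic gradually, then retire.
    return [
        f"deploy version {plan.target_version} alongside the current version",
        f"shift traffic gradually to version {plan.target_version}",
        "retire the faulty version once traffic has drained",
    ]


def scale_out_steps(plan: ResponsePlan) -> List[str]:
    # One step per resource type, in a stable (sorted) order.
    return [
        f"add {count} x {name} in {plan.region}"
        for name, count in sorted(plan.resources.items())
    ]


HANDLERS: Dict[str, Callable[[ResponsePlan], List[str]]] = {
    "flight": rollback_steps,
    "fight": scale_out_steps,
}


def build_steps(plan: ResponsePlan) -> List[str]:
    """Return the ordered steps for a plan without executing anything."""
    if plan.kind not in HANDLERS:
        raise ValueError(f"unknown response kind: {plan.kind!r}")
    return HANDLERS[plan.kind](plan)


fight_plan = ResponsePlan("fight", resources={"web-server": 2, "cache-node": 1}, region="us-east")
fight_steps = build_steps(fight_plan)
flight_steps = build_steps(ResponsePlan("flight", target_version="1.4.2"))
```

Producing the steps separately from executing them means the plan can be reviewed or logged before anything changes in production.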

Actually, this additional brain isn’t only for emergency situations. It needs to provide the same type of capability in any stressful situation, whether the stress is caused by an application anomaly found by your APM or by the release of a new feature. New feature releases and upgrades are the more mundane, but more frequent, cause of stress in the world of cloud applications, and handling them well is the key to success in cloud applications. More on that in my next post.

