Fighting Murphy with Deep Automation™

Introduction

As I recall it, it was somewhere around November of 1994 when I began my journey in the SAP Basis arena, sitting in front of a newly deployed R/3 system running version 2.1E. I was there because of my experience supporting the Enterprise Finance and Treasury department within the organization where I worked, and also because I “knew Oracle”. After a couple of weeks of fruitless searching for database diagrams and the SQL> prompt, I buckled down, started learning, adapted, and evolved.

As it turned out, my timing was good. Opportunities to learn and grow were plentiful, I met many astoundingly gifted people, and bore witness to the evolution of what I continue to believe is an extraordinary triumph of precision engineering: SAP Enterprise Resource Planning Software.

Sitting here in 2021 as Chief Delivery Officer at apiphani, I see the wheels of change are turning once again. SAP clients worldwide, along with the whole of the SAP partner ecosystem, are considering what it means to RISE with SAP.

Billed as “an offering that brings together everything you need to transform your business in the way that works best for you” (SAP SE, 2021), it encompasses business transformation, delivered on the cloud in a SaaS model. This move on SAP’s part “to take direct responsibility for customer success and offer a streamlined contracting process” (Greenbaum, 2021) is laudable in many ways.

There has been commentary in the press that this offering sounds a death knell for traditional managed Basis services. No one who knows me will be surprised to learn that I disagree. We founded apiphani in 2018 to take a new approach to mission-critical systems, and this approach is uniquely positioned to add value across the range of SAP offerings, from traditional legacy to hybrid cloud to public cloud to RISE with SAP. Once again, it’s time to adapt and evolve.

Focus on outcomes

Human beings naturally respond to complexity by breaking problems down into manageable parts, à la Henry Ford and the assembly line, and there is nothing wrong with that. However, you can miss the forest for the trees if you get lost parceling out tasks at the level of the lowest common denominator. By focusing on the business outcome desired by the client, you are more inclined to consider interrelationships and teamwork. In a technically complex environment, the actions of one party ripple outward and affect the actions of another, and these currents shape the outcome.

Own the Stack

For this reason, it’s necessary to own the stack. Does your client need to run a resource-intensive forecasting operation multiple times during a closing cycle? First of all, don’t make the argument that they really don’t need to do that! Secondly, consider each of the factors at the infrastructure, operating system, database, and Basis levels that contribute to the ability of your client to achieve that outcome, because you are ultimately responsible for engineering all of these factors into a successful solution. And if problems do arise, at apiphani we stay engaged with our clients until the issue is resolved, as a matter of course. We don’t argue about functional vs. technical, or network vs. server, because we own the stack.

Don’t Silo by Technology

You will never own the stack if you silo by technology. Bouncing tickets back and forth between, for example, platform teams and system administration teams is the surest way to a Severity 1 outage. Every exchange is another 10 minutes closer to a file system filling up and hanging the application. These are unforced errors that can and must be avoided, which is why at apiphani we silo by account, not by technology. This means team members need a certain degree of fluency across domains, but the results are well worth it, and if you have the right technology fronting that team, the effort is greatly reduced. It also means each engineer is familiar with, and responsible to, the client.

Cultivate and Expand Domain Expertise

Invest in training and certification, at both the organizational and individual levels. There is no substitute for domain expertise, nor is there a substitute for good judgement in high-risk situations. Human error accounts for at least 70% of unplanned outages, according to the Uptime Institute (Heslin, 2019). Moving applications to the cloud, or even adopting an automation strategy, will not alleviate the impact of service disruptions caused by configuration errors or insufficient process discipline. Apiphani has invested in the development and maintenance of ITIL v4 processes and controls along with its Deep Automation and incident avoidance technologies precisely so that we can accomplish more with fewer, highly experienced engineers. These professionals are then free to engage our clients in higher-value work, leaving the low-complexity, high-risk tasks to algorithms and automation.

Focus on Consistency

As the SAP partner ecosystem evolved through the early 2000s, many larger providers began to adopt a labor arbitrage model to keep pricing competitive.  Large teams with a high percentage of level 1 technicians became the norm.  Repetitive tasks were “shifted left” to these less experienced and less involved resources.  In this environment, churn and inexperience fought against consistency.  Inconsistency caused unforced errors, and inexperience greatly increased mean time to resolution (MTTR) across the industry.  In the past several years, we’ve seen automation and predictive technologies really come into their own, giving providers a way to better manage outcomes in a consistent way. Unfortunately, we have seen all too often that these promising technologies are poorly implemented, or simply placed on top of dated processes that were never meant to support these innovative solutions. The outcome of this approach is as predictable as it is disappointing.

Automate Everything

When we started apiphani, there were several challenges we were looking to address. One of these was an overrepresentation of human error in the management of the technology stack. Another was high employee churn. As it turns out, at least in part, these are related problems. High-quality application engineers feel a great deal of ownership and take the health of their clients’ environments very seriously. When those individuals are not enabled with the right processes and tools, they become firefighters instead of engineers. They spend a large amount of their time chasing errors and trying to make sense of alert storms emanating from a poorly configured monitoring product, all in an attempt to keep ahead of the next outage. The stress can be overwhelming and, eventually, they simply have no choice but to head for the exit.

At apiphani, we use our proprietary Deep Automation™ technology not only to greatly reduce human error, but also to free our engineers to focus on more meaningful work. No longer firefighters, they are able to leverage their extensive knowledge and grow professionally in a supportive environment. This in turn provides additional value to our clients by increasing the consultative hours spent developing solutions for their business, solutions that would normally be overlooked in a typical managed services engagement.

Nothing is sacrificed by adopting this approach. Our automations interact with our ITSM systems to provide all of the governance and auditability of a more traditional model while handling such tasks as full-stack patching, storage management, failed backup reprocessing, system restarts, autoscaling, process management, and many other client-specific self-healing functions.
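To make the idea of self-healing with an audit trail a little more concrete, here is a minimal Python sketch. It is an illustration only, not apiphani's implementation: the `Ticket` class, the `self_heal` function, and the retry limit are all hypothetical names and choices, and the in-memory ticket stands in for whatever ITSM system is actually in use.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Ticket:
    """Hypothetical stand-in for an ITSM record; not a real ITSM API."""
    summary: str
    status: str = "open"
    notes: List[str] = field(default_factory=list)


def self_heal(summary: str, action: Callable[[], bool], max_attempts: int = 3) -> Ticket:
    """Run a remediation action, recording every attempt on a ticket.

    The ticket is closed on the first success. If every attempt fails,
    the ticket is escalated so that a human engineer takes over.
    """
    ticket = Ticket(summary)
    for attempt in range(1, max_attempts + 1):
        try:
            ok = action()
            outcome = "succeeded" if ok else "failed"
        except Exception as exc:  # a crashing action still gets audited
            ok = False
            outcome = f"raised {exc!r}"
        ticket.notes.append(f"attempt {attempt}: {outcome}")
        if ok:
            ticket.status = "closed"
            return ticket
    ticket.status = "escalated"
    ticket.notes.append("automation exhausted; handed to on-call engineer")
    return ticket
```

A failed backup that succeeds on the second retry would leave a closed ticket with two notes, while an action that never succeeds ends up escalated with the full attempt history attached. Either way, the record exists whether or not a person was involved.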

Correlate and Predict

Monitoring and alerting used to mean installing agents on servers, and then setting dozens to hundreds of threshold-based alerts per server that would alarm if they were breached. These alarms were sent to a Network Operations Center (NOC) for triage, and from there to an on-call group of engineers for action.  System “noise” often covered up real and/or developing issues in these cases.

In our experience, making the investment in a product that first establishes a baseline for what is normal, and then correlates and informs an engineering team of departures from baseline, is game changing in terms of consistency.  These products are integrated with our self-healing automations at apiphani. Our workflows take it from there, making sure the right individuals are notified, tickets are opened, updated, and closed as needed, and that the proper change control process is followed.
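As a rough illustration of the difference between a fixed threshold and a baseline, the Python sketch below flags samples that depart from what has been normal for a given metric. It is a deliberately simple stand-in (a z-score against historical mean and standard deviation), not a description of any particular monitoring product. The function name `departures_from_baseline` and the default threshold of 3.0 are assumptions made for this example.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def departures_from_baseline(
    history: Sequence[float],
    recent: Sequence[float],
    z_threshold: float = 3.0,
) -> List[Tuple[int, float, float]]:
    """Return (index, value, z_score) for recent samples far from the baseline.

    The baseline is the mean and sample standard deviation of `history`.
    A sample is flagged when it lies at least `z_threshold` standard
    deviations away from the historical mean, in either direction.
    """
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    mu = mean(history)
    sigma = stdev(history)
    flagged = []
    for i, value in enumerate(recent):
        if sigma == 0:
            # Perfectly flat history: any change at all is a departure.
            z = 0.0 if value == mu else float("inf")
        else:
            z = (value - mu) / sigma
        if abs(z) >= z_threshold:
            flagged.append((i, value, z))
    return flagged
```

With a CPU history hovering around 50%, a reading of 70% is flagged even though it would never trip a static "alert above 90%" rule, while ordinary readings of 49% or 51% generate no noise at all.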

There is simply no such thing as a good incident.  Avoidance is key. This is why we have developed incident avoidance models that ingest large amounts of data from our monitoring tools, configuration management database, and system/application logs, in order to predict problems and prevent incidents before they happen.
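One of the simplest forms of incident avoidance is trend extrapolation: rather than waiting for a file system to cross a threshold, estimate when it will fill and act well before then. The Python sketch below fits a least-squares line to recent usage samples. It is a toy model under stated assumptions (linear growth, a hypothetical `hours_until_full` helper), and real incident avoidance models draw on far more data than this.

```python
from typing import Optional, Sequence, Tuple


def hours_until_full(
    samples: Sequence[Tuple[float, float]],
    capacity_pct: float = 100.0,
) -> Optional[float]:
    """Estimate hours from the last sample until usage reaches capacity.

    `samples` is a sequence of (hour, used_percent) pairs. Returns None
    when usage is flat or shrinking, since no fill time can be projected.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = [t for t, _ in samples]
    ys = [u for _, u in samples]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one point in time")
    # Ordinary least-squares slope and intercept.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None
    intercept = y_bar - slope * x_bar
    t_full = (capacity_pct - intercept) / slope
    return max(0.0, t_full - xs[-1])
```

For a file system growing two percentage points an hour from 70%, the estimate after four hourly samples is twelve hours of headroom, which is ample time for an automation to extend the volume or clean up logs under normal change control.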

Deal with Murphy

Murphy’s law tells us that “Anything that can go wrong will go wrong”.

During a recent maintenance cycle for a client, an operating system support pack application failed, and the error message indicated corruption that would require rolling back the patch and restoring the operating system. Separately, a HANA database redirected restore failed, and the error message indicated corruption in the system database, which would have required a full restore and recovery. Some maintenance windows just go that way. In both cases, however, the error was remediated without data loss and without blowing the approved window.

In other circumstances and with other tools, either of these serious errors would have initiated a long and drawn-out incident management process, with escalations to management at both the client and the provider. The remediation effort would easily have required a team of four to six engineers. Using apiphani’s model, each issue was resolved by a single technician within the prescribed window. The technicians had good data to rely on, and good judgement born of experience. After a short interaction with the vendor, they were able to diagnose the root cause of the error and remediate it without restoring an OS or recovering a database. The symptoms, causes, and solutions for each case were tracked within the ITSM system, and the solutions were added to the knowledgebase for future reference.

Move Forward at Scale

In a recent article on RISE with SAP, one of the partners was quoted as follows:

 “A discrete role has been carved out for partners to help enable the solution configuration and implementation, while at the same time SAP is taking full control of the system maintenance — traditionally the infrastructure and Basis support — that some partners have made a living on,” he said. “The partners who’ve built up a business driven primarily on managed Basis support will have some attrition from this, though many of them are no doubt adaptable” (O’Donnell, 2021).

I believe this statement is both accurate and incomplete. Certainly, at apiphani we feel ourselves to be adaptable. Managed Basis support is evolving into something more encompassing that climbs further up the stack than has previously been the case. However, I think it’s misguided to dismiss “infrastructure and Basis support” as a commodity that no longer requires attention. It’s not merely patching and informing clients of issues that occur. You can’t “throw things over the wall” and expect a good result. Despite its impressive evolution, SAP is not a serverless application. Traditional Basis elements continue to impact the experience of the business users of the system and must be managed in this context. The good news here is that with the right support model, enabled by the right technology and the appropriate level of expertise, it is possible to achieve a finely tuned and exceptionally resilient environment at scale and at a competitive price point.

Conclusion

At the end of the day, business applications with any amount of custom configuration or third-party product integration will require human oversight to ensure that they run optimally, at least for the foreseeable future. Where these workloads are located makes little difference. What has always mattered is how the application is managed. With the right combination of talent and technology, the sky’s the limit, and we hope that this will be a hallmark feature of RISE with SAP. Take this critical aspect of application support for granted, however, and it may well turn out to be an Icarian journey.

References

Greenbaum, J. (2021, February 22). SAP RISE: The Good, the Missing, and the GSIs. Retrieved from eaconsult.com: https://www.eaconsult.com/2021/02/22/sap-rise-the-good-the-missing-and-the-gsis/

Heslin, K. (2019, September 23). How to avoid outages: Try harder! Retrieved from uptimeinstitute.com: https://journal.uptimeinstitute.com/how-to-avoid-outages-try-harder/

O’Donnell, J. (2021, March 8). SAP partners sound off on Rise with SAP program. Retrieved from techtarget.com: https://searchsap.techtarget.com/feature/SAP-partners-sound-off-on-Rise-with-SAP-program/

SAP SE. (2021). RISE with SAP. Retrieved from sap.com: https://www.sap.com/products/rise.html/

 

Cynthia Borgman