OPS, AI DID IT AGAIN
AI renders IT operations fluid, proactive, and resilient, improving efficiency and reliability while it learns – on its way to full, hands-free autonomy
So many platforms, applications, services, and edge devices to securely take care of. And all of that in an increasingly hybrid, multi-cloud context. Enough to lose your senses. It’s the perfect playground for AI to take charge of the complexity. AI recognizes patterns, generates insight, and detects disturbances in real time. Then it looks through even the most opaque of systems, predicting what will happen to allow for timely measures, and suggesting what should be done. And all the while it learns, becoming more and more autonomous in running IT operations. Oops, is that infrastructure taking care of itself?
Indu Malhotra, Expert in Residence
WHAT
- AI for IT Operations (“AIOps”) collects and analyzes data from sources such as system log files, incident tickets, network traffic, and sensor data – all in real time – to continuously improve observability, security, performance, and resilience.
- AIOps can replace traditional monitoring tools, providing cohesive, cross-domain observability across complex landscapes with microservices, applications, containers, servers, and multiple platform services hosted in hybrid, multi-cloud environments.
- Integrating AIOps with DevOps, quality assurance, and Site Reliability Engineering (SRE) not only reduces complexity, but also drives high-frequency, high-quality, and cost-effective platform delivery across applications and infrastructure.
- Previously focused on scripting and automating IT-driven business processes, Robotic Process Automation (RPA) now also enables more effective IT operations, increasing speed, agility, and cost effectiveness. The next step: (semi-)autonomous operations.
- Similarly, automated incident intervention and sentiment analysis – using analytics and Natural Language Processing (NLP) – expand from customer scenarios to frictionless IT service desk engagements with (remotely working) business users.
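To make "detecting disturbances in real time" a little more concrete, here is a minimal sketch of one common idea: flag a metric value when it strays far from its recent history. It is an illustration under stated assumptions, not how any particular AIOps product works. The function name `detect_anomalies`, the window size, the threshold, and the sample latencies are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return the indices of values that deviate from the trailing window.

    A value is flagged when it lies more than `threshold` standard
    deviations away from the mean of the previous `window` values.
    """
    history = deque(maxlen=window)  # trailing window of recent values
    anomalies = []
    for index, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # A perfectly flat window (sigma == 0) is skipped in this sketch.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(index)
        history.append(value)
    return anomalies


# Hypothetical response times in milliseconds, with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 480, 101, 99]
spikes = detect_anomalies(latencies_ms)
print(spikes)
```

In this sample only the spike at index 6 is flagged. Real platforms typically combine many such signals across logs, tickets, and traffic, and learn their baselines rather than using a fixed threshold.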
USE
- A US media giant leverages AIOps, moving from “alert fatigue to actionable operational insights” across its interdependent media services, hosted on public and private cloud, bringing a 99% “alert” noise reduction to IT operations.
- A US state applies AIOps to provide its citizens with reliable access to the state’s unemployment insurance portal, bringing real-time visibility of the key service and resulting in a significant reduction of IT issues and performance degradation.
- A large US-based animation production firm optimized production for its creative teams during the pandemic, allowing digital production of petabytes through a digital data pipeline, using AIOps to predict and proactively address IT operational issues.
- A Nordics-based automotive manufacturer extended its AIOps capability with analytics to address the verification challenges of autonomous driving on highways and confined areas, such as mines and quarries, exposing bugs and edge cases.
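The move from "alert fatigue" to actionable insight usually starts with collapsing repeated alerts into a handful of incidents. The sketch below shows one simple way to do that, by grouping alerts that share a service and a check within a time window. It is an assumption-laden illustration, not a description of what the organizations above actually deployed; the field names (`ts`, `service`, `check`), the function `group_alerts`, and the sample data are hypothetical.

```python
def group_alerts(alerts, window_seconds=300):
    """Collapse alerts with the same (service, check) into incidents.

    An alert joins the open incident for its key if it arrives within
    `window_seconds` of that incident's first alert; otherwise it
    opens a new incident.
    """
    incidents = []
    open_by_key = {}  # most recent incident per (service, check)
    for alert in sorted(alerts, key=lambda a: a["ts"]):
        key = (alert["service"], alert["check"])
        incident = open_by_key.get(key)
        if incident is None or alert["ts"] - incident["first_ts"] > window_seconds:
            incident = {
                "service": alert["service"],
                "check": alert["check"],
                "first_ts": alert["ts"],
                "count": 0,
            }
            incidents.append(incident)
            open_by_key[key] = incident
        incident["count"] += 1
    return incidents


# Hypothetical alert stream: timestamps are seconds since the first alert.
alerts = [
    {"ts": 0, "service": "portal", "check": "latency"},
    {"ts": 30, "service": "portal", "check": "latency"},
    {"ts": 60, "service": "portal", "check": "latency"},
    {"ts": 90, "service": "db", "check": "disk"},
    {"ts": 400, "service": "portal", "check": "latency"},
]
incidents = group_alerts(alerts)
print(len(alerts), "alerts ->", len(incidents), "incidents")
```

Here five raw alerts become three incidents. Commercial tools go further, correlating across services and topology, but the principle of grouping before notifying is the same.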
IMPACT
- Routine, repeatable IT operational tasks can be automated to provide a frictionless service while reducing costs and freeing up focus for more strategic, value-adding activities.
- Real-time handling of events in a converged IT operations and cyber-threat prevention framework ensures business resilience, continuity, and stability.
- Rapid diagnosis and resolution of IT operations issues ultimately lead to higher customer and employee satisfaction and retention.
- In dealing with the scarcity of skilled SRE and DevOps resources, AIOps can reduce the number of experts required to run critical services.
- Adoption of AIOps drives IT operations from predictive to prescriptive and even autonomous ways of working, where systems can not only self-analyze, but also self-heal.
- Extending beyond the realms of IT, AIOps can predict customer behavior, proactively and seamlessly fixing cyber threats, contributing to business resilience and growth.
TECH
- Observability: AppDynamics, Splunk Enterprise, Datadog APM, Sumo Logic, Dynatrace, TrueSight Operations Management, New Relic One, BigPanda, Helix Platform, DX Operational Intelligence, StackState
- AIOps: Moogsoft, Splunk Cloud, Aisera, ScienceLogic, IBM Cloud Pak for Watson AIOps, BigPanda, Sumo Logic, Helix Platform
- Chaos: ChaosIQ.io, Steadybit, VMware Mangle
- SRE & Application Operations: PagerDuty, ServiceNow, FireHydrant, Honeycomb.io, Splunk On-Call, Buoyant.io