Lessons Learned: CrowdStrike Incident

The CrowdStrike incident offers lessons to all businesses, emphasizing the need for robust processes to maintain digital resilience and cybersecurity. CrowdStrike Holdings, Inc. is an American cybersecurity technology company based in Austin, Texas. The CrowdStrike Falcon platform software update event underscores the importance of rigorous software testing, robust change management, and effective ITSM practices.

By adopting and maturing modern digital business processes such as automated testing, predictive intelligence, and strong communication protocols, organizations can better anticipate and manage disruptions. The incident also highlights the need for comprehensive security operations and proactive incident response to protect against exploitation by bad actors. Learn how these strategies can support business continuity and safeguard critical systems.

Overview: Lessons Learned from the CrowdStrike Incident

A software defect in CrowdStrike’s Falcon Sensor triggered a significant global IT outage, impacting multiple sectors. This incident underscores the importance of rigorous software testing, robust disaster recovery plans, and effective communication strategies.

Lessons Learned: CrowdStrike Incident

One thing we must learn from this is that it is not just a “CrowdStrike” outage. The largest IT outage in history exposes the critical imperative for the IT industry as a whole to fix vulnerabilities and apply the lessons learned to prevent future incidents.

Industry | Estimated Impact | Primary Cause or Vulnerability | Digital Business Capability to Enhance
Stock Drop | 12% decline in CrowdStrike’s stock | Defective software update | Enhanced Software Testing and Predictive Intelligence
Airlines | $4.35 billion | 3,000 flight cancellations, 11,000 flight delays, and compensations | Automated Incident Response and Disaster Recovery Plans
Banking | Over $5 billion | Transaction disruptions, customer service overload, regulatory fines | Comprehensive ITSM Practices and Security Operations
Government | Over $500 million | Disrupted emergency services, increased recovery efforts | Robust Change Management and Communication Protocols
Healthcare | Over $500 million | Delayed medical procedures, potential legal liabilities | Business Continuity and Critical Situation Communication Skills

Continuously Improving Consumer Experience Capabilities

Enhanced Software Testing

Rigorous software testing helps detect and correct defects early, before they can cause large-scale disruptions. Automated testing provides broad coverage and speeds up defect detection. Multi-layered testing, including stress tests, QA, UAT, and sprint readiness checks, further improves software reliability.

Enhanced Software Testing Statistics & Strategies:

  • According to Capers Jones, 85% of software defects are found during unit testing.
  • Implement automated testing tools like AutomatePro Autotest to streamline testing processes and enhance defect management.
  • Use continuous integration systems like Jenkins to ensure code changes are tested promptly.
  • Leverage AutomatePro AutoDocument for efficient knowledge article management and test documentation, reducing manual effort and increasing accuracy.
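
The idea of multi-layered testing can be illustrated with a small release gate that a continuous integration job might run before an update ships. This is only a minimal sketch: the layer names, the `REQUIRED_LAYERS` tuple, and the `release_blockers` function are hypothetical and do not come from any specific tool named above.

```python
from __future__ import annotations

# Hypothetical test layers, loosely named after those mentioned in the text.
REQUIRED_LAYERS = ("unit", "qa", "stress", "uat")


def release_blockers(results: dict[str, bool]) -> list[str]:
    """Return the required test layers that are missing or failed.

    An empty list means every layer passed and the release may proceed.
    """
    return [layer for layer in REQUIRED_LAYERS if not results.get(layer, False)]


if __name__ == "__main__":
    # Example: the stress test failed and UAT was never run.
    results = {"unit": True, "qa": True, "stress": False}
    blockers = release_blockers(results)
    if blockers:
        print("Release blocked by:", ", ".join(blockers))
    else:
        print("All layers passed; release may proceed.")
```

A layer that was never run is treated the same as a failed one, so an update cannot ship simply because a test stage was skipped.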

Predictive Intelligence and Generative AI: Enabling Capabilities

Statistics and Strategies for Predictive Intelligence and Generative AI in Incident Management

Generative AI models can enhance incident detection accuracy by 25%, ensuring timely and effective responses. Predictive analytics can forecast up to 90% of IT incidents before they occur (McKinsey).

Barrista works with ServiceNow and uses generative AI to detect incidents early and accurately. This proactive identification of potential issues helps prevent incidents and maintain system stability.

  • Improved Incident Detection: AI can reduce the time to identify security incidents by up to 12 minutes, a 60% improvement compared to traditional methods (IBM). Faster detection means quicker responses, reducing potential damage.
  • Enhanced Response Accuracy: Organizations using AI for incident response report a 50% reduction in incident impact (Capgemini). AI provides precise action plans, increasing the effectiveness of responses.
  • Efficiency Gains: AI-driven automation can handle 30% of incident management tasks, freeing up human agents (Gartner). Automating routine tasks allows human agents to focus on complex incidents, enhancing overall efficiency.
  • Predictive Insights: AI in incident management can reduce operational costs by 15-30% (Forrester). Lowering costs while improving incident response capabilities benefits the bottom line.
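
As a rough illustration of proactive detection, the sketch below flags a metric reading that deviates sharply from its recent history. It is a simple statistical check (a z-score), not generative AI and not how any product named above works; the function name `is_anomalous` and the threshold of 3 standard deviations are assumptions chosen for the example.

```python
from statistics import mean, stdev


def is_anomalous(history: list, latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `threshold` standard deviations
    from the mean of `history` (a plain z-score check)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    sigma = stdev(history)
    if sigma == 0:
        # History is perfectly flat: any change at all is unusual.
        return latest != history[0]
    return abs(latest - mean(history)) / sigma > threshold


if __name__ == "__main__":
    # Hypothetical response times in milliseconds.
    recent = [100, 102, 98, 101, 99]
    print(is_anomalous(recent, 101))  # within normal variation
    print(is_anomalous(recent, 120))  # far outside recent behaviour
```

Production systems typically use richer models, but the principle is the same: compare new signals to an expected baseline and raise an alert before users report a failure.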

Security Operations

Strengthening security operations is essential as bad actors exploit known software errors. Enhancing monitoring, incident response plans, and employee training helps detect and mitigate phishing and hacking attempts promptly. Educating consumers on recognizing phishing attempts and securing their accounts with strong passwords and multi-factor authentication is vital.

Change Management Control

Effective change management controls reduce the risk of disruptions during software updates. A detailed public change communication plan outlines planned upgrades, changes, and expected outages, keeping stakeholders informed. Thorough validation after implementation confirms success or identifies rollback triggers early. Tracking both the incidents induced by changes and those resolved by changes fosters continuous improvement. ServiceDesk integration allows teams to report early incidents promptly.

Change Management Control Statistics & Strategies:

  • Organizations with strong change management are six times (6x) more likely to achieve project objectives (Prosci).
  • Maintain a change calendar accessible to all stakeholders.
  • Use ITSM tools like ServiceNow or FreshService to track and manage changes.
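
One way to make a rollback trigger concrete is to compare service health before and after a change. The following is a minimal sketch under stated assumptions: the `HealthSnapshot` type, the `should_roll_back` function, and the two-percentage-point threshold are hypothetical and would need tuning for a real environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthSnapshot:
    """Request and error counts observed over a fixed window."""
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        # Avoid division by zero when no traffic was observed.
        return self.errors / self.requests if self.requests else 0.0


def should_roll_back(before: HealthSnapshot, after: HealthSnapshot,
                     max_increase: float = 0.02) -> bool:
    """Return True when the error rate rose by more than `max_increase`
    (an absolute difference, e.g. 0.02 = two percentage points)."""
    return after.error_rate - before.error_rate > max_increase


if __name__ == "__main__":
    baseline = HealthSnapshot(requests=1000, errors=10)    # 1% errors
    post_change = HealthSnapshot(requests=1000, errors=50) # 5% errors
    if should_roll_back(baseline, post_change):
        print("Rollback trigger met: error rate rose beyond the allowed margin.")
```

Agreeing on the threshold before the change is deployed keeps the rollback decision objective when time pressure is highest.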

ITSM Improvements

Improving IT service management supports efficient incident response and resolution. Establish clear criteria for incident escalation during major incidents to enable effective communication and damage control. Enhanced protocols should clearly communicate actions, estimated recovery times, and available workarounds. Outage management should also summarize and coordinate business impact communications and technical restoration efforts, and maintain detailed outage records. Conducting post-incident reviews of major incidents helps analyze response timelines and capture lessons learned.

ITSM Statistics & Strategies:

  • 70% of high-performing IT organizations use ITIL-based processes (HDI).
  • Implement and continue to improve ITIL best practice processes to maintain incident and problem management capabilities.
  • Use ITSM software like ServiceNow, FreshService, or BMC Remedy for tracking and managing IT services.
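
Clear escalation criteria are easier to apply consistently when they are written down as explicit rules. The sketch below shows one possible encoding; the `Priority` levels, the `classify` function, and the threshold of 100 affected users are illustrative assumptions, not ITIL definitions or settings from any ITSM product.

```python
from enum import Enum


class Priority(Enum):
    P1 = "major incident"
    P2 = "high"
    P3 = "normal"


def classify(users_affected: int, critical_service_down: bool,
             workaround_available: bool) -> Priority:
    """Map simple impact facts to an escalation level (hypothetical rules)."""
    if critical_service_down and not workaround_available:
        return Priority.P1  # escalate to major incident management
    if critical_service_down or users_affected >= 100:
        return Priority.P2
    return Priority.P3


if __name__ == "__main__":
    level = classify(users_affected=5, critical_service_down=True,
                     workaround_available=False)
    print(f"Escalation level: {level.name} ({level.value})")
```

Writing the rules this way also gives post-incident reviews a fixed reference point: reviewers can check whether an incident was escalated as the criteria required.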

Critical Situation Communication Skills

Effective communication during critical situations builds trust and ensures stakeholders are informed. Developing and delivering clear messaging is essential. Addressing stakeholder concerns with empathy and reassurance about the resolution steps fosters trust. Maintaining transparency about the situation, progress, and expected timelines for resolution builds confidence. Encouraging two-way communication ensures stakeholder concerns are addressed effectively.

Third-Party Risk Management (TPRM)

As this outage underscored, risk management for strategic third-party vendors is increasingly important. Regularly assessing vendors and their disaster recovery capabilities enhances readiness. Improving security and vulnerability response is crucial to preventing exploitation by hackers. Developing and testing manual operation procedures for the loss of critical systems supports operational continuity.

Difference made by conducting regular vendor assessments:

Regular vendor assessments and improved security responses help prevent exploitation by hackers. Manual operation procedures for system loss help maintain continuity.

Digital Center of Excellence: Business Process, Digital Transformation and AI.