The global cyber crash on Friday, July 19, has been declared the most significant IT outage in history, a claim supported by industry experts. This unprecedented disruption was likely the result of a perfect storm: an urgent update that bypassed extensive testing, or an issue undetectable within the testing environment, leading to a problem that standard mitigations couldn't resolve. Whereas some faulty updates can be rectified with a subsequent update or a rollback, this one escalated rapidly.
An event like CrowdStrike's Black Friday was probably not on the minds of the authors of the recent IOSCO report (FR04/2024) on market outages. Drawing on a survey of IOSCO members, the report identifies 42 market outages between 2018 and 2022, caused primarily by software and hardware issues. It emphasises the necessity for clear communication, robust outage plans, and effective business continuity measures to minimise the impact on market participants and the financial system.
Sinara, a London-based software house building and supporting systems for financial markets participants, fielded calls from concerned clients wanting to know if they were impacted by the July 19 event. “We were fortunate to be unaffected and in a position to assist our customers if they needed support,” says Steve Dobb, technical director at Sinara. “Though it demonstrated the growing risks and the need to avoid similar disasters. We haven’t seen anything at this scale hit the financial markets before, but it doesn’t mean they are invulnerable.”
IOSCO advised stock exchanges and trading platforms to develop and publicise plans for handling outages to offer greater predictability for customers. Although trading systems of exchanges remained unaffected on Friday, the broader financial markets ecosystem, including banks and tech groups, experienced significant disruptions, with some traders unable to access their systems to process trades.
The IOSCO report highlights that exchanges worldwide have suffered glitches and outages for various reasons, including software changes. However, none have matched the scale of the CrowdStrike event, with prior incidents being localised and lasting no more than a few hours.
An update to CrowdStrike's widely used "Falcon Sensor" software caused Microsoft Windows to crash, displaying the infamous "Blue Screen of Death." While software updates vary in urgency and all undergo some testing, they can still lead to unexpected issues. CrowdStrike swiftly identified the root cause and deployed a fix. Nevertheless, while a simple reboot restored functionality for some systems, others took longer to recover fully.
“Sometimes unforeseen issues can happen,” continues Dobb, “and it’s crucial that you have a process in place, and the right skills, to make sure you’re able to manage them, work with your technology partners, and communicate what’s going on with your customers.”
A recent New York Stock Exchange glitch led to significant volatility in Berkshire Hathaway and Barrick Gold shares, and trading halts in dozens of other companies before the issue was resolved.
The IOSCO report underscores the necessity for trading venues to establish and publicise outage plans, execute effective communication strategies, and assess the implications for closing auctions and prices. It also emphasises the importance of post-outage analysis to improve future responses.
IOSCO outlines five best practices for trading venues to enhance market resilience: developing outage plans, enforcing communication strategies, reopening trading efficiently, managing closing auctions, and conducting post-outage reviews. These practices are designed to be flexible and adaptable across various market structures and regulatory environments.
Although these best practices are primarily designed for equity listing trading venues, IOSCO acknowledges their relevance to other trading venues, including non-listing and derivatives markets. IOSCO encourages adopting these practices while recognising that the final decision lies with individual venues and is subject to domestic legal and regulatory requirements.
The report also examines the legislative and regulatory frameworks within IOSCO jurisdictions, noting that most trading venues are required to maintain business continuity plans and to notify regulators of outages. It highlights industry protocols, such as those from the Federation of European Securities Exchanges (FESE), that aim to standardise communication and outage procedures.
“These are all good practices,” Dobb confirms, “though it’s vital that they don’t just exist on paper but are really put into practice, with regular test runs and training refreshers for staff. It’s important people know what to do when problems happen.”
Overall, the IOSCO report provides a comprehensive analysis of market outages and offers practical recommendations to strengthen the resilience of financial markets. It advocates for a balanced approach that respects the unique circumstances of each market while promoting best practices for managing and recovering from trading disruptions.
Established, legacy players reliant on complex, ageing infrastructure dominate the global exchange landscape. Unsurprisingly, they struggle to keep pace with the increasingly demanding trading environment, whether due to higher volumes or evolving requirements. The ongoing cost pressures and scale of necessary technological overhauls make it challenging for exchanges to implement upgrades; shutting down an exchange for weeks to facilitate a complete transfer is not feasible.
“There are many ways that exchanges can introduce new and better technology, and it doesn’t need to be all at once,” says Dobb. “The same goes for exchange members, who also need to make sure they have multiple ways of accessing their exchanges, in case one method fails.”
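Dobb's point about exchange members keeping more than one access route can be illustrated with a short sketch. This is a minimal example under stated assumptions, not a description of any Sinara or exchange system: the route names and the `primary_gateway` and `backup_gateway` functions are hypothetical, and a production system would also have to track order state and guard against duplicate submission, which this sketch ignores.

```python
from typing import Callable, Sequence


class AllRoutesFailed(Exception):
    """Raised when every configured access route has failed."""


def send_with_failover(
    order: dict,
    routes: Sequence[tuple[str, Callable[[dict], str]]],
) -> tuple[str, str]:
    """Try each route in priority order; return (route name, acknowledgement)."""
    errors = []
    for name, send in routes:
        try:
            return name, send(order)
        except ConnectionError as exc:
            # Record the failure and fall through to the next route.
            errors.append(f"{name}: {exc}")
    raise AllRoutesFailed("; ".join(errors))


# Hypothetical routes: a primary gateway that is down and a working backup.
def primary_gateway(order: dict) -> str:
    raise ConnectionError("primary gateway unreachable")


def backup_gateway(order: dict) -> str:
    return f"ACK {order['id']}"


used, ack = send_with_failover(
    {"id": "ORD-1", "side": "buy", "qty": 100},
    [("primary", primary_gateway), ("backup", backup_gateway)],
)
print(used, ack)
```

The routes are tried strictly in the order given, so the list itself expresses the member's preference between access methods.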
Another significant barrier to change is the broader ecosystem in which trading systems operate, consisting of various organisations dependent on infrastructure compatible only with legacy technology. Consequently, counterparties may be unwilling or unable to accommodate a switch or significant upgrade by the exchange.
One likely outcome of CrowdStrike's Black Friday is that internal IT teams will work in closer partnership with their software suppliers, whether for security or other operations, to better manage the software update process.
“It’s quite right that our customers want to make sure they don’t end up in the situation where an update that is supposed to make things better actually creates a problem,” concludes Dobb. “Certainly we at Sinara understand that even a small release needs to be treated with due care—otherwise we can all see the consequences.”
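One common way to treat a release with that kind of care is a staged rollout, in which an update reaches a small group of machines first and proceeds only if they stay healthy. The sketch below illustrates the idea under assumptions of our own; it does not describe how CrowdStrike, Sinara, or any other supplier deploys software. The host names are invented, and `install`, `is_healthy`, and `rollback` are hypothetical callables supplied by the caller.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[Sequence[str]],
    install: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> bool:
    """Install stage by stage; roll back everything if any host is unhealthy."""
    updated: list[str] = []
    for stage in stages:
        for host in stage:
            install(host)
            updated.append(host)
        if not all(is_healthy(host) for host in stage):
            # Stop before later stages and undo the hosts already updated.
            for host in reversed(updated):
                rollback(host)
            return False
    return True


# Hypothetical example: the update breaks one host in the second stage.
installed: set[str] = set()
broken_by_update = {"desk-02"}

ok = staged_rollout(
    stages=[["canary-01"], ["desk-01", "desk-02"], ["desk-03", "desk-04"]],
    install=installed.add,
    is_healthy=lambda host: host not in broken_by_update,
    rollback=installed.discard,
)
print(ok, sorted(installed))
```

In this example the third stage is never touched, which is the point of staging: a fault is contained to the hosts updated so far. A rollback of this kind only helps if the affected machines can still be reached.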