In June 2023, the US Department of State discovered Chinese cyber espionage activity relying on a fundamental vulnerability in Microsoft’s cloud technology that enabled hackers to forge identity authentication tokens. The vulnerability enabled the compromise of sensitive email (and other service) accounts, including that of Secretary of Commerce Gina Raimondo.
This incident illustrates some of the risks that accompany cloud computing’s many benefits. While much of the discussion around cloud computing centers on these benefits, the risks to this infrastructure bear consideration as well. Just as with other critical infrastructure sectors (such as energy, water, financial services, and the defense industrial base), disruptions to major cloud services could have material effects on economic and national security. The cloud’s centrality to critical infrastructure is the basis of the Atlantic Council’s recent report, “Critical Infrastructure and the Cloud: Policy for Emerging Risk,” which seeks to raise awareness of the seriousness of potential cloud disruptions and increase efforts toward bolstering cloud security and resilience across critical infrastructure.
To examine these risks, we brought together a group to share their perspectives on the challenges facing cloud infrastructure and how policy can encourage better security and risk governance across this critical sector.
#1 Are the challenges facing cloud infrastructure security well-defined and understood by providers? What’s the biggest question you see as unresolved in cloud security?
Maia Hamin, associate director, Cyber Statecraft Initiative, Digital Forensic Research Lab (DFRLab), Atlantic Council:
“The hyperscale Infrastructure-as-a-Service providers—AWS, Microsoft Azure, Google Cloud—understand many questions about the security of the cloud; they have enough reason to. Then again, many hard problems remain hard—the recent Microsoft compromise is a reminder that identity and access management is crucial to the whole premise of cloud security, and something that even well-resourced providers get wrong. The biggest unresolved questions that I see are those of interdependence and systemic risk. Where are there particular widely used technologies inside of a single provider—like identity and access management—where a software vulnerability or error could lead to compromise or outages across users (and availability zones cannot save you)? Where are there widely used technologies across providers—widely deployed superscalar processors like those from Intel, for example—that might be found vulnerable en masse and create impacts across cloud providers? Big cloud service providers are not necessarily well set up to solve some of these risks since they bridge across companies and there are a lot of incentives toward business secrecy.”
Jim Higgins, chief information security officer, Snap Inc:
“I think the challenges are well known to the cloud service providers themselves, but not to the public. We could use a lot more transparency as to what the cloud service providers feel are the large security issues and see if they are aligned with the expertise of their own customers.”
Chris Hughes, chief information security officer and co-founder, Aquia:
“On one hand, I am inclined to say yes, because the three largest Infrastructure-as-a-Service providers by market share—AWS, Microsoft Azure, and Google Cloud—are the only cloud service providers that are operating at such scale and scope. That said, they face specific challenges as providers upon which the entire modern Internet and digital infrastructure has become dependent, ushering in unseen levels of systemic risk across the ecosystem. The biggest question I see as unresolved in cloud security is how cloud service providers and regulatory bodies should work together to address that systemic risk and ensure that critical dependencies do not have devastating downstream impacts on thousands of companies and millions of individuals in nearly every industry vertical, including critical infrastructure and economic and national security. How do we fix transparency gaps that impede our ability to fully understand and address these systemic risks, while not stifling innovative cloud services in the marketplace?”
Rich Mogull, analyst and chief executive officer, Securosis:
“First, we need to accept that there are material differences between cloud providers. At one end are the hyperscale providers—AWS, Microsoft Azure, and Google Cloud. Of those, I think the companies understand the security concerns but do not necessarily prioritize them to the same degree. The recent Microsoft issue is one example. Other providers are not even playing the same game—especially Software-as-a-Service providers. It is the Wild West, and only some providers understand the security challenges and take them seriously. There really are not unresolved questions, but providers must do the work and stay on top of things. Right now, my biggest area of concern is Microsoft’s Entra ID (formerly Azure Active Directory).”
Marc Rogers, chief technology officer, nbhd.ai:
“While I believe the Infrastructure-as-a-Service providers have a better handle on their challenges than their customers do, the gaps are large and lead to incidents that blindside defenders. The risk that concerns me most is visibility and transparency, especially for the consumers of Infrastructure-as-a-Service. Attackers are already several steps ahead on understanding chains of trust, cross-system exposure, and building blocks like open-source software.”
#2 If cloud service providers are struggling to engineer critical services to the level of reliability that current threats demand—as demonstrated in the latest Microsoft cloud compromise—what role could policy play to help address this gap?
Hamin: “Understanding what went wrong would be a good start. There are several big, open questions about how a failure like this could be allowed to happen, and few satisfactory answers. A better understanding of real-world cloud compromises would help us understand why these failures occur and help drive solutions for problems ranging from underinvestment to unsafe designs. Cloud service providers should have more of an obligation to work with the government in the wake of a major incident, and government should have more tools (and drive) to translate those insights into public accountings and policy prescriptions.”
Higgins: “At this point, I feel that it is time to bring in a cloud-focused version of FedRAMP to help move the cloud service providers into a stricter reporting framework.”
Hughes: “While major cloud service providers may be struggling to engineer critical cloud-native services to the level of reliability that the current threats demand, there is not a viable alternative aside from returning to on-premises legacy infrastructure, which is not an option in the era of digital modernization. Policies and regulations can play a role in governing the cloud as they do for other critical infrastructure sectors on which society relies. As is evident in a recent Atlantic Council report, cloud computing is now pervasive in nearly every aspect of society that touches software. Policy can also help, as discussed in the National Cybersecurity Strategy, by bringing some rationalization to bespoke, disparate, and duplicative frameworks and bolstering those that help properly manage risk in the era of cloud computing. Policies should require hyperscale cloud service providers to provide sufficient transparency for security incidents and disruptions to regulators, federal entities, and customers. Transparency breeds trust, but right now we exist in an opaque ecosystem of limited insight from cloud service providers.”
Mogull: “This was a Microsoft issue, and I do not think the other hyperscale providers face the same struggle. That said, I see buying power as more capable of moving the needle than policy could be. The Trustworthy Computing initiative came about because the Defense Department told Microsoft that it would not purchase Microsoft products without massive security improvements. Right now, neither government agencies nor large companies are prioritizing security in their buying decisions, which means that there is not enough pressure on cloud service providers to improve security. Policy absolutely has a place, but I think cloud security could be improved more quickly and effectively if the government prioritized security in provider selection.”
Rogers: “I see several opportunities for policy to support security without being overly burdensome. The Software Bill of Materials is already in flight and offers a way to shine a light on the ingredients of complex stacks. Clearing up the balance of liability is a motivator that would keep companies including Infrastructure-as-a-Service providers honest. A minimum set of tools, resources, and processes would lead to standardization and availability during critical moments. Minimum security features like logging should always be a default, not a profit center.”
#3 What is the difference between a software flaw and an architectural flaw in the cloud? How does policy address one vs. the other?
Hamin: “Software bugs are errors in written code that enable exploitation, such as unsanitized inputs used unsafely, unsafe use of memory, or incorrect permissions-checks. Architectural flaws are deeper flaws emerging from the design of complex software systems, such as inappropriate connections between services that should not be talking to each other, or concentrated reliance on a few brittle dependencies. Policy can mandate procedures (though often incomplete!) for how organizations should write code and train developers to avoid common vulnerable software patterns. But I think policy is just starting to think about architectural risk in software systems and does not have an evolved toolkit for addressing it yet.”
Higgins: “The question is too general. Both can lead to equally widespread, negative impacts. Most architectures are software these days anyway.”
Hughes: “Many may argue that these are one and the same, or increasingly similar, in an era in which we have software-defined perimeters, architectures, and computational resources provisioned declaratively through Infrastructure-as-Code languages. That debate aside, a software flaw would typically relate to written software in various programming languages and could escalate into a vulnerability, often tracked in a vulnerability database with a corresponding identifier. An architectural flaw, on the other hand, is not a vulnerability in software but in how a system is configured. We have seen these run rampant in the cloud with customer misconfigurations that lead to incidents, but also in fundamental ways with how the cloud is architected that lead to scenarios such as system outages or even exploitation by malicious actors.”
Mogull: “Software flaws are basically coding errors and vulnerabilities. Architectural flaws are more design decisions. For example, look at how AWS handles regions (highly segregated) compared to the competition. Policy cannot help here. Policy should demand a secure outcome and not define either software or architectural decisions. If lawmakers focus on the highly variable technical and architectural options that will change from year to year and day to day, they will never be able to keep up. Penalties for preventable security failures will force the right architectural decisions. We know what needs to be done to improve security, but prioritizing those actions is the issue.”
Rogers: “Software flaws are easier to manage and support through policy by ensuring good practice such as the use of memory-safe languages or the implementation of widely understood Secure Development Guidelines in a well-developed software development life cycle. Architectural flaws are much more complicated. The low-hanging fruit can be addressed in a similar way to software with a mature software development life cycle, good testing practices, and industry guidance, such as the deprecation of known vulnerable configurations or methods. However, the more complex end gets much harder. Issues like logic flaws, interconnection with legacy infrastructure, and unintended contextual risks can be hard to eliminate completely and hard to draft policy for without chilling innovation or even making migrations impossible. My suggestion is to focus on the baseline software development life cycle and require high standards in testing and transparency.”
#4 Why does transparency in cloud services and infrastructure matter for cloud users? What are some examples of what meaningful transparency looks like?
Hamin: “The shared responsibility model for cloud services has a lot of advantages, including outsourcing complexity to large Infrastructure-as-a-Service providers and letting small organizations take advantage of the benefits afforded by the cloud. But this model also breaks some important elements of how we think about risk management, especially with respect to data. Cloud customers are most often the ones with specific legal and contractual obligations to protect their data or to ensure operational continuity. However, customers often do not have visibility into what is behind the veil that separates their responsibility from that of their cloud service provider to understand the other half of that equation. Policy needs to adapt to make sure there are real mechanisms to propagate requirements for data protection, transparency, and trust for cloud service providers that provide computing infrastructure for healthcare or banking institutions, for example.”
Higgins: “Transparency builds accountability and trust; it is that simple. Meaningful cloud service provider transparency may include: 1) Software Bills of Materials or some kind of accountability to show what software contains; 2) root access numbers to show how many employees have access to data under normal circumstances; and 3) logs of security incidents involving the cloud over a period of time, indicating response capabilities and whether incidents repeatedly share the same root causes.”
Hughes: “Transparency in cloud services and infrastructure is paramount for cloud users, especially on the cybersecurity front. Cloud computing is fundamentally built on a shared responsibility model, which has implied assumptions of various responsibilities across the cloud provider and consumer. Without transparency, the assurance around those responsibilities being fulfilled by the provider is inherently called into question by the consumer, which threatens the entire cloud paradigm. Even an implied lack of transparency can rattle trust. Meaningful transparency would entail cloud service providers being forthcoming with details of incidents, how they were identified and confirmed, their potential ramifications, and meaningful actions consumers can take to mitigate risk. Being opaque with incident details or providing them slower than other security researchers and vendors, for example, bolsters neither the cloud service provider’s reputation nor the community’s trust.”
Mogull: “Transparency allows customers to make both informed buying decisions and technical decisions. Meaningful transparency is seen in the AWS incident reports that the company releases after major public outages or issues. Lack of transparency is exemplified by how the company does not always disclose the scope of security incidents.”
Rogers: “One of the greatest risks around cloud services is the fact that they include a tradeoff. You trade a significant amount of visibility and operational control for ease of implementation, access to mature technology, and reduced cost. That lack of visibility can strip away the ability of anyone but the provider to manage risks, understand the blast radius of an incident, or even know when an incident has occurred. A sensible amount of transparency—Software Bills of Materials, useful logs, transparent joint architecture reviews, and so on—can help mitigate the lack of visibility.”
#5 How can policymakers encourage cloud adoption in a way that supports security and does not create new sources of risk?
Hamin: “There is a reason we have critical infrastructure sectors over which the government performs more oversight than it does for other sectors. These are places where the risk of getting it wrong is too high to tolerate, and the major cloud service providers should be considered in that category already. Cloud service providers should be coming to the table and working with policymakers on risk and threat models, architecture reviews, and the like. That said, there is a lot of more mundane, we-know-it-already stuff that we need to get right for secure cloud adoption too. Organizations still fail to configure and use the cloud securely, and credential theft and phishing are still huge threats. These might be cases where government can lead the way in pushing known best practices and ensuring that sector-specific security regulations are up to date with the evolving needs of cloud-based systems.”
Higgins: “No clue. We play whack-a-mole in the security world, meaning that when we fix one area of security, it causes another vulnerability to arise. Policy should drive awareness to risk rather than trying to reduce actual risk.”
Hughes: “Policymakers can encourage secure cloud adoption by harmonizing and bolstering applicable frameworks, as well as providing more robust oversight and governance of these cloud service providers that are now dubbed ‘too big to fail’ and ‘critical infrastructure’ by some industry leaders. Encouragement around adoption should also involve educating consumers, whose role is important, as demonstrated in countless cloud security incidents. Policymakers should avoid hyperbole and spreading fear, uncertainty, and doubt related to the cloud, while instead raising valid concerns grounded in data. On-premises infrastructure is not infallible and has been breached or impacted by security incidents historically as well. That said, such breaches of on-premises infrastructure generally impacted a single organization or small group of customers, as opposed to the society-wide impact that cloud risks can bring.”
Mogull: “Policymakers can encourage secure cloud adoption through transparency requirements on security incidents, mandated vulnerability disclosures, mandated customer notifications for security incidents, and buying pressure to steer agencies towards platforms that demonstrate a stronger security posture. If security issues continue, some providers may need to be classified as systemically vital as we do with systemically important financial institutions. That would put cloud service providers under a microscope. I would prefer we not get to that point, but some providers seem to be doing their best to drive that outcome.”
Rogers: “First and foremost, while the cloud is a fantastic tool, it is not a panacea. Policymakers should use the levers of government to level the playing field, and use purchasing levers to ensure government business goes toward providers that help keep this playing field level, help ensure risk is controlled and, most importantly, empower their customers to handle a wide range of risk scenarios as standard practice.”
Simon Handler is a fellow at the Atlantic Council’s Cyber Statecraft Initiative within the Digital Forensic Research Lab (DFRLab). He is also the editor-in-chief of The 5×5, a series on trends and themes in cyber policy. Follow him on Twitter @SimonPHandler.
The Atlantic Council’s Cyber Statecraft Initiative, under the Digital Forensic Research Lab (DFRLab), works at the nexus of geopolitics and cybersecurity to craft strategies to help shape the conduct of statecraft and to better inform and secure users of technology.