Date: 2020-12-14

Defense organizations, by nature, confront unanticipated and highly impactful disruptions, but must continue to operate using complex mission systems. They must adapt these systems to withstand surprise and accomplish defined objectives despite disruption and the behavior of adversaries. It is crucial to understand a system as more than hardware or software: it is a combination of people, organizational processes, and technologies. Mission resilience is the ability of a mission system to prevent, respond to, and/or adapt to both anticipated and unanticipated disruptions, optimizing efficacy and long-term value. This means overcoming sophisticated cyberattacks and managing the risk of systemic software vulnerabilities, but it also encompasses changing operating environments, adversary innovation, and unexpected failures. Resilient mission systems should have the capacity to continue mission essential operations while contested, gracefully degrading through disruption rather than collapsing all at once.

Resilience is a key challenge for combat mission systems in the defense community as a result of accumulating technical debt, outdated procurement frameworks, and a recurring failure to prioritize learning over compliance. The result is brittle technology systems and organizations strained to the point of compromising basic mission functions in the face of changing technology and evolving threats.

Resilience is not a novel concept, but it tends to be presented as a technology issue. While technologies provide the most intuitive and concise examples for understanding resilience, people are responsible for selecting a system’s purpose and mission, designing a system’s technologies, and enforcing organizational processes within a system. This report offers actionable strategies and practices for the program owners of combat mission systems who manage complex, software-intensive systems, enabling them to reshape their organizations to perform in a state beyond normal operational boundaries, a capacity otherwise known as graceful extensibility.

This report translates concepts of mission resilience into practice for defense organizations. Drawing from academia, industry, and government, the authors distill four principles and specific activities as a framework for long-term change that defense organizations should adopt in pursuit of graceful extensibility: embrace failure, improve your speed, always be learning, and manage trade-offs and complexity. These principles build on previous work and combine discussion of procurement with operations. They lean on concepts and phrases that different communities use in slightly different ways; the command and control (C2) community, for example, might think of managing trade-offs at speed as an issue of agility and biomimetics. Within each of these four principles are tangible practices that defense organizations can adopt to be more resilient:

Embrace Failure: Everyone and everything fails eventually—software developers are no different—so defense organizations must develop a healthy relationship with failure in order to succeed. Unwillingness to take risks creates a fear of failure and a resulting brittle culture, the consequences of which outweigh the failure itself. Practices that defense organizations can adopt to embrace failure include chaos engineering and planning for loss.

Improve Your Speed: The Department of Defense (DoD) must make improving speed of adaptation and development a focus in its transformation toward more resilient mission systems. Antiquated acquisition policies, misapplied bureaucratic oversight, and siloed knowledge make it more difficult for DoD programs to deliver capabilities than should or could be the case. This principle emphasizes speed and tight feedback loops, informed by agile methodologies of continuous integration and delivery.

Always Be Learning: Defense organizations operate in a highly contested cyber environment. As the DoD grows more complex, it becomes increasingly important how the organization learns and adapts to rapidly evolving threats. This process of continual learning embraces experimentation and measurement at all levels of systems as a tool to define and drive improvement.

Manage Trade-Offs and Complexity: Project management is a balancing act among cost, time, scope, and quality for defense organizations. The DoD should work to improve mission system programs’ understanding of the trade-offs between near-term functionality and long-run complexity as well as their impact on a system’s resilience.

Mission resilience must be a priority area of work for the defense community. Resilience offers a critical pathway to sustain the long-term utility of software-intensive mission systems, while avoiding organizational brittleness in technology use and resulting national security risks. The United States and its allies face an unprecedented defense landscape in the 2020s and beyond. The capabilities of both long-identified and novel adversaries continue to evolve, and bureaucratic conflict waged today will shape outcomes on battlefields in the years to come. For the first time in more than four decades, the prospect of significant great power conflict cannot be ruled out and neither the United States nor its allies can afford to acquire, maintain, and deploy mission systems with a mindset shaped in those decades past.

Introduction

The United States’ most expensive weapons system, the Lockheed Martin F-35 Lightning II, was designed as a fifth-generation joint strike fighter for service in decades to come. A major selling point to differentiate the F-35 from other aircraft was the Autonomic Logistics Information System (ALIS), the IT backbone of the system intended to govern F-35 operations, including (but not limited to) flight scheduling, maintenance and part tracking, combat mission planning, and threat analysis.

However, ALIS has been plagued by flaws and vulnerabilities, including several identified in early testing that still remain unfixed. Where security audits and testing have occurred, they’ve taken place in isolated laboratories incapable of simulating the full breadth of the aircraft’s digital attack surface. Officials, fearing failure, worried that real-world full-system tests would interrupt operations and disrupt development of the ALIS software. Software vulnerabilities and programmatic issues are hampering the servicemembers whom ALIS was intended to support: “one Air Force unit estimated that it spent the equivalent of more than 45,000 hours per year performing additional tasks and manual workarounds” due to the system’s malfunctions. ALIS’ inefficiencies have become so acute and costly that the Department of Defense (DoD) opted to overhaul it with the cloud-based Operational Data Integrated Network (ODIN), built by the same vendor.

The F-35 is a combat aircraft—and a software-intensive one at that. ALIS and similar backbone IT systems promise great value, but have barely gotten off the ground. The DoD has demonstrated an inability to manage complexity and develop robust and reliable mission systems even in a relatively benign environment. A conflict or more contested environment would only exacerbate these issues. The F-35 is not alone in a generation of combat systems so dependent on IT and software that failures in code are as critical as a malfunctioning munition or faulty engine—other examples include Navy ships and military satellites. Indeed, encapsulating the centrality of the aircraft’s complex IT backbone, now retired Air Force Chief of Staff Gen. David L. Goldfein once posited, “when I see the F-35, I don’t see a fighter. I see a computer that happens to fly.” Software-intensive mission systems of this and future eras will form the backbone of US and allied military capabilities. These capabilities will continue to be asked to adapt to new roles and do more with less, as budgets are rightsized and adversaries evolve. But existing acquisition, development, and deployment methodologies continue to fail these systems, failing to keep pace with the demands of users in the field and struggling to manage the complexity of ever larger and more integrated software and hardware projects.

Lockheed Martin’s test pilot checks an F-35 simulator before Israel’s Defence Minister Moshe Yaalon’s visit to the Israeli Air Force house in Herzliya, Israel. Source: Reuters/Baz Ratner

Ensuring that mission systems like the F-35 remain available, capable, and lethal in conflicts to come demands that the United States and its allies prioritize the resilience of these systems. Not merely security against compromise, mission resilience is the ability of a mission system to prevent, respond to, and adapt to both anticipated and unanticipated disruptions, to optimize efficacy under uncertainty, and to maximize value over the long term. Adaptability is measured by the capacity to change: not only to modify lines of software code, but to overturn and replace the entire organization and the processes by which it performs the mission, if necessary. Any aspect that an organization cannot or will not change may turn out to be the weakest link, or at least a highly reliable target for an adversary. Moving beyond the issues that plague programs like the F-35’s ALIS, a complex and evolving system in an ever-changing operational environment, will only be possible by coming to terms with past problems. But, by doubling down on similarly designed systems such as ODIN, defense organizations are bound to repeat the same expensive mistakes.

Efforts to invest in new software acquisition, and to reform policy impacting mission systems, are regularly proposed and attempted but continually fall short. At the same time, adversary capabilities, including kinetic platforms and cybered effects, evolve more rapidly than those of blue forces, and recurring, systemic difficulties in embracing commercial off-the-shelf (COTS) technology continue. The DoD’s uneven move to adopt cloud computing, slow by comparison to Fortune-500-scale organizations, exemplifies this problem.

For decades, studies have recognized the vital importance of software as an integrator of defense mission systems, and they have put forth strong recommendations on how to improve it. For just as long, however, frustrations have mounted over lack of implementation and continued stagnation in the defense enterprise. As pointed out in the Defense Innovation Board’s congressionally mandated 2019 Software Acquisition and Practices Study, “the problem is not that we do not know what to do, but that we are simply not doing it.” The study highlights two people problems (middle management and congressional mismatch) as reasons for lack of progress.

In addition to these organizational and oversight factors, the DoD is making changes to the way it acquires software. These changes, however, still need to address software embedded in physical and safety-critical systems, as well as software in settings where the tolerance for failure and experimentation is lower and the resulting program models are more risk-averse. Kessel Run is a useful model for bringing continuous integration/continuous deployment into responsively developed software. The scale of these projects is small enough to avoid significant systems or project management overhead; the security requirements of these projects invite relatively straightforward classification and minimal compartmentalization; and the development time and life cycle length of these projects complement the software factory approach.

But resilience requires more than new technology incubators—it necessitates taking development out of a silo and knitting it together with users, as well as security organizations like the 16th Air Force and the 10th Fleet. For more complex projects, those with more dependencies on legacy systems, and those which are embedded in or significantly impact safety-critical and physical systems, the once-off hybrid model may be insufficient.

This report addresses the significant disconnect between contemporary understandings of resilience in defense organizations and the importance of software-intensive mission systems. By focusing the conversation on adaptation, this joint effort between MIT Lincoln Laboratory and the Atlantic Council’s Cyber Statecraft Initiative, under the Scowcroft Center for Strategy and Security, develops a working-level concept of mission resilience and uses this concept, along with specific practices from government, academia, and industry, to guide mission resilience in defense organizations.

Fundamentally, mission resilience is built on three pillars: robustness, the ability of a system to resist or negate the impact of disruption; responsiveness, the ability of a system to provide feedback on and incorporate changes in response to the impact of disruption; and adaptability, the ability of a system to change itself to continue operating amid disruption over its full life cycle, even when those changes dictate an adjustment of the system’s objectives. This definition encompasses the ability to encourage and enable systemic adaptation and expands beyond resistance to disruption (e.g., defects, faults, attacks, and even intentional change). These pillars function in symbiosis and when exercised in concert with one another create mission resilience, an attribute that is greater than the sum of its parts.

Sustained progress and continual change are critical to the resilience of defense organizations; in this, Richard Cook’s discussion of the human skeleton is an apt metaphor. Despite their static appearance, human bones are continuously remodeled and replaced roughly every ten years, a process spurred by mechanical strain that enables the destruction of old bone and creation of new bone. This “dynamic balance” requires incessant inputs and energy in order to maintain bone density and prevent skeletal weakening that leaves bone prone to disease and breakage. In the event of a break, it is critical that the bone be placed under the conditions in which its natural resilience can do its best work.

Some organizations in the private sector have set an example in harnessing this natural resilience through high-tempo, continuous change. Unfortunately, inadequate strain lines have hampered defense organizations’ pursuit of resilience and led to deformity. The next four sections offer four principles for defense organizations’ pursuit of mission resilience: 1) embrace failure, 2) improve your speed, 3) always be learning, and 4) manage trade-offs and complexity. A conclusion follows. Each section explains concepts of mission resilience as distilled to that principle, as well as previous relevant research and discussion from government, academia, and industry. Each section concludes with actionable practices and specific recommendations for reforming acquisitions policy, the operation and management of mission systems and their program offices, and their integration into combat units.