Data rules for machine learning: How Europe can unlock the potential while mitigating the risks

Cybersecurity Digital Policy European Union Technology & Innovation

Report November 15, 2021

Data rules for machine learning: How Europe can unlock the potential while mitigating the risks

By Blanka Soulava, Hamish Cameron and Victoria Ying

Executive summary

Introduction

Part one: Leverage the scale of Europe

Part two: Balance economic autonomy and openness

Part three: Protect fundamental rights

Conclusion

Acknowledgements and author bios

download pdf

download summary

Executive summary

Artificial intelligence (AI) will increasingly shape societies and the global economy. Machine learning—which is responsible for the vast majority of AI advancements—is enhancing the way businesses and governments make decisions, develop products, and deliver services.¹ For machine learning algorithms to learn and make increasingly accurate predictions, they need large quantities of high-quality data that is relevant, representative, free of errors, and complete.² As such, data policies have a significant bearing on an economy’s capacity to take advantage of machine learning.

Supremacy in AI technologies has become a key aspect of strategic competition between China and the United States. President Xi Jinping has made achieving global leadership in AI by 2030 central to building China into a “modern socialist power.”³ China’s rapid advances in machine learning have alarmed America’s private sector and national security establishment.⁴ The Biden administration is looking to shore-up America’s national footing to compete with China in an AI arms race.⁵ This competition also extends to setting the global rules and norms that govern AI systems, given their influence on how economies and societies function.⁶

The European Union (EU) already lags behind China and the United States in key AI indicators such as private investment,⁷ patent filings,⁸ and data market growth.⁹ As China and the United States move to position their laws, bureaucratic structures, and public resources to accelerate the development and deployment of AI technologies,¹⁰ the EU faces the added challenge of coordination among its twenty-seven member states to do the same.

Aware of this challenge, the European Commission has made “a Europe fit for the digital age” a priority to achieve by 2024.¹¹ It is pursuing a “third way” between China and the United States in shaping its digital future, particularly when it comes to the use of data and AI.¹² This has included a raft of legislative proposals, with more in the pipeline, informed by the commission’s European Strategy for Data and its White Paper on Artificial Intelligence, both released in 2020.¹³ Those proposals grapple with striking the right balance between reaping the economic benefits of those technologies while minimizing security threats and protecting the public from harm, including preserving individual rights such as privacy and nondiscrimination.¹⁴

With the digital economy predicted to be valued at $11 trillion by 2025 on one hand,¹⁵ and the risk machine learning can pose to fundamental rights and European autonomy on the other, the cost of getting that balance wrong is high. The European Commission will need to walk a fine line that separates the legitimate interests of businesses, citizens, and national governments to capture the economic benefits while limiting the negative social impacts of machine learning. Doing so will place the EU in a strong position to shape global rules at the intersection of technology and society to reflect European systems and values more closely.

Given the crucial role data plays in developing innovative AI systems, this report focuses on data governance in the context of machine learning. It provides European decision makers with a set of actions to unlock the potential of AI, while mitigating the risks. Developed at the request of the Atlantic Council, a nonpartisan organization based in Washington, DC, it examines the geopolitical considerations of the EU in the context of competition with China and the United States, the domestic considerations of EU member states including societal interests and the protection of individual rights, and the commercial considerations of EU-based businesses.

The report builds on the existing regulatory initiatives of the European Commission and attempts to lay out key considerations of the EU, the member states, and businesses. In doing so, we believe it captures the dominant features of the current policy debate in Europe to help navigate the complex equities at stake in developing data rules. That said, framing the issue of data governance around international political and economic competition can result in lines of reasoning that pay less consideration to important societal interests and can trivialize complex and multidimensional issues, such as social equity and individual rights. Therefore, we devoted the last section of this report to the protection of fundamental rights, with two chapters focused on these issues. We also have attempted to flag these cases in other parts of the report when they arise in our analysis.

Summary of recommendations

Leverage the scale of Europe

Addressing fragmented open data rules and procedures across Europe and harmonizing technical data standards would increase the quantity of data available for innovation and reduce the complexity and cost of preparing data for machine learning.

Therefore, the European Commission should:

1.1A Establish an EU data service under legislation creating common data spaces to facilitate access to public-sector data in consultation with national data protection
authorities by:

i. Serving as a single window to grant access to data sets spanning multiple jurisdictions under a single or compatible format and license.
ii. Making recommendations to European public bodies on conditional access to sensitive data in consultation with national data authorities.

1.2A Establish an EU technical data standards registry to increase transparency and facilitate sharing of various standards used by EU-based organizations and help identify opportunities to increase interoperability.

1.2B Establish sector-specific expert committees to recommend preferred technical data standards, including:
i. Standard data models that enable the translation of differently structured data across systems and organizations.
ii. Standard application programming interfaces (APIs) that enable apps and databases to seamlessly exchange data.
iii. Free open source data management-software that embed sector-specific standards.

Balance economic autonomy and openness

By promoting diversification in cloud data storage and processing services, the EU can mitigate vulnerabilities to disruptions and coercion. However, it also needs to maintain competitive cloud offerings for European businesses and avoid succumbing to calls for protectionist measures. Simplifying rules and minimizing legal ambiguity around data transfers can also lower the risk to European businesses of being excluded from global value chains, as can removing anticompetitive barriers to accessing data while maintaining incentives for businesses to invest in high-quality data sets.

Therefore, the European Commission should:

2.1A Enhance the implementation of the European Alliance for Industrial Data, Edge and Cloud by seeking additional investments from participating cloud computing providers and industrial cloud users.

2.1B Ensure coordination between the European Alliance for Industrial Data, Edge and Cloud and parallel initiatives and ensure that the emerging legislative proposals do not impose significant regulatory burdens on cloud service providers that could impact the competitiveness of their offerings.

2.2A Clarify legal ambiguity on liability and ownership when transferring data, including in the General Product Safety Directive, Product Liability Directive, and in planned legislation on sector-specific common data spaces.

2.2B Prioritize data protection adequacy negotiations with global innovation hubs that share European values on the protection of personal data to enable the transfer of data outside the EU, including exploring solutions with the United States following the Schrems II decision.

2.3A In planned sector-specific “common data spaces” legislation, adopt sector-specific data portability rights for customer data (personal and nonpersonal) that consider the distinct data access problems at an industry level.

2.3B In planned sector-specific common data spaces legislation, adopt sector-specific data access obligations for platforms and dominant companies to share data with other businesses that consider the business models, public interest, and economic sensitivities of each sector.

2.3C Establish a consultative mechanism involving industry, national standards organizations, the proposed European Data Innovation Board, and competition authorities to periodically review the suitability of sectoral data-access rules as technologies and market structures change.

Protect fundamental rights

The EU can ensure the protection of fundamental rights while enabling data use for innovation and protecting privacy by decreasing compliance costs and legal uncertainty of data protection while investing in research on machine learning techniques that reduce the need for large pools of personal data. The EU can also mitigate data-driven discrimination through machine learning by addressing gaps in data protection and antidiscrimination laws and strengthening enforcement capabilities.

Therefore, the European Commission should:

3.1A Coordinate with the European Data Protection Board and national data-protection authorities to provide more detailed guidance and assistance on
applying the General Data Protection Regulation (GDPR) to machine learning applications to enhance compliance and mitigate business costs.

3.1B Enhance investment in research in privacy-preserving techniques that can adequately protect personal data while enabling machine learning development, including in partnership with like-minded countries such as the United States.

3.2A Work with the European Parliament and member states on providing more specific requirements in the new EU AI legislation for data-management standards to mitigate risks of discrimination. These should focus on greater transparency in access to AI systems documentation and ensuring that data requirements also address cases where the data sets could embed and amplify discrimination.

3.2B Provide sufficient financial and human resources to national data-protection authorities and other relevant bodies that investigate antidiscrimination violations and carefully define and coordinate the establishment of new enforcement authorities with the existing ones.

3.2C Increase transatlantic cooperation to close the gap between US and EU approaches to mitigating risks of bias and discrimination enabled by data in machine learning through the EU-US Trade and Technology Council.

Machine learning is a powerful tool for analyzing very large data sets and spotting patterns in that data. It has two key ingredients: algorithms that learn and data to train machine learning systems.”

Introduction

Machine learning is changing the way public- and private-sector organizations operate, make decisions, develop products, and deliver services. It is already a big part of people’s daily lives, with a rising number of applications emerging across society from healthcare to transportation. As the technology develops, societies at the forefront of machine learning will become more innovative and their industries more competitive.¹⁶ Those that fall behind risk moving down the value chain and taking a smaller slice of the global economy. However, those that that do not manage data rules and machine learning applications with care risk undermining individual rights and exacerbating existing social problems.

As a major subset of artificial intelligence (AI), machine learning is a powerful tool for analyzing very large data sets and spotting patterns in that data.¹⁷ It has two key ingredients: algorithms that learn and data to train machine learning systems. Self-learning machines take each additional data point and adapt the new information to make more accurate predictions. To make very precise predictions, like diagnosing a disease from an eye scan, machine learning requires very large quantities of reliable data, in this case, accurately labeled eye scans.

Rules that govern data have a significant bearing on an economy’s capacity to take advantage of machine learning. Industries that can more easily exploit large quantities of high-quality data will be better able to take advantage of the technology. Autonomous vehicles are a good example of an emerging disruptive innovation that depends on gathering high-quality data to feed algorithms. When large amounts of data are available, data scientists can train autonomous vehicles to be less likely to have a collision. However, the quality of these data sets erodes when data is missing, incomplete, inaccurate, duplicated, or dated.¹⁸

Importantly, data rules also have a significant bearing on society and individual rights. Rules that govern who can access, collect, use, and store data—such as who you are, where you go, what you say or do and with whom, the products you buy or the maladies you suffer—dictates who gets to surveil your life and on what grounds.¹⁹ As such, data rules can have significant implications on rights widely viewed in Europe as fundamental to freedom and equality, such as the right to respect for private life, the protection of personal data, and the right to nondiscrimination.²⁰

Given the economic, social and ideological implications, data rules can exert geopolitical influence on how economies and societies function. Whose rules become more commonly adopted by other countries underpins whose economic, social, and ideological systems dominate global digital norms. For example, how open an economy is to international trade and foreign investment,²¹ how free society is from repression,²² and how protected individuals are from harm.²³

Leading the development of frontier technologies has become central to the intensifying rivalry between China and the United States, and AI is a key aspect of that competition.²⁴ Chinese leaders believe that being at the forefront of AI technology is critical for China’s economic and national security.²⁵ President Xi Jinping has outlined policies for China to be the world leading AI power by 2030.²⁶ This has included designating national champions that are each focused on different applications of machine learning, like facial recognition and language processing, as well as incentives for start-up investments and patents applications in AI technologies.²⁷ China now outpaces the United States in AI start-up investment, patents, and performance in international AI competitions.²⁸ In 2020, China’s share of global AI journal citations were higher than the United States for the first time.²⁹

In the United States, which has been leading the world in AI,³⁰ there is bipartisan support in Congress to approve the largest boost to R&D funding in a quarter century,³¹ with specific reference to maintaining leadership in AI technologies.³² The 2021 National Security Commission on Artificial Intelligence Final Report has also called for a new whole-of-government organizational structure to coordinate, incentivize and resource the accelerated wide-scale adoption of AI technologies in the United States—the goal is to position the nation to compete with China.³³ The European Union (EU) is falling behind China and the United States in machine learning.³⁴ One telling indicator is the level of private-sector investment in AI, with investment in China and the United States being much larger than the amount seen in the EU, as shown in figure 1.

The key challenge the EU faces includes the complexity of coordinating data rules and standards among twenty-seven member states and their fragmented implementation at the national and subnational level. Higher public expectations around privacy and data protection also means the EU must proceed with greater care in making data available for businesses and researchers, slowing the development of machine learning applications in the shorter term, though potentially creating a more sustainable approach that places European innovation on stronger footing in the longer term.³⁵

In comparison, China and the United States both have larger, more homogenous data ecosystems that can be more easily accessed by data scientists, better facilitating the training of machine learning algorithms. As a result, the US data market is three times the size,³⁶ compared to the EU’s, and is growing at twice the rate.³⁷ China’s data market is larger still and growing at three times the rate of the EU’s.³⁸ On its current trajectory, the EU will fall even further behind China and the United States, potentially leaving it dependent on foreign supplies of AI technologies trained on data outside Europe.

In an effort to compete globally, the EU is grappling with data policies that strike the right balance between innovation, security, and the protection of fundamental European rights. Privacy and security concerns in Europe have already undermined public trust in foreign governments and private companies. These concerns are amplified by Europe’s data infrastructure being largely controlled by only a few foreign suppliers, Washington’s ability to impose sanctions that undermine the policies of European countries,³⁹ and the increasing tendency of China to deploy economic coercion to assert its national interests.⁴⁰ Many Europeans are also made uneasy by the potential for machine learning to exacerbate discrimination in society.⁴¹ This is because data may reflect existing structural inequalities and the realities of an unjust society, which can become embedded in AI systems and exacerbate discriminatory outcomes.⁴² Another concern is that inadequate data protections deployed by businesses outside the EU have allowed countries like Russia to exploit them for political purposes to target disinformation campaigns and influence elections in Europe.⁴³ Reconciling these concerns with the benefits of data access and re-use for machine learning will be an ongoing challenge for the EU.⁴⁴

There is the risk that if the EU fails to resolve these tensions through data rules that foster innovation, businesses could face higher costs to deploy machine learning, weakening their ability to compete in a global market. The resulting lack of European innovation could reinforce the EU’s dependencies on foreign digital technologies, with local industries taking a smaller slice of the global value chain. However, there is also a risk that the EU implements short term, innovation-driven policies that could embed economic systems that fail to protect fundamental European rights, exacerbate social inequalities, and threaten democratic institutions and processes.⁴⁵ If the EU cannot get this balance right, its inability to present an effective regulatory model will undermine its ability to shape global rules at the intersection of technology and society.

Aware of these costs, the European Commission has made “a Europe fit for the digital age” a priority to achieve by 2024.⁴⁶ The EU has proposed a raft of legislative proposals, with more in the pipeline, informed by the commission’s European Strategy for Data and White Paper on Artificial Intelligence, both released in 2020, including the Data Governance Act proposal introduced in November 2020,⁴⁷ and the Artificial Intelligence Act proposal introduced in April 2021.⁴⁸ It has also allocated €143.4 billion ($166.25 billion) toward the EU “single market, innovation, and digital” between 2021 and 2027.⁴⁹ In pursuing this agenda, the commission needs to balance stimulating innovation with mitigating the risks machine learning can pose to fundamental rights, European autonomy, and society. This report proposes a path forward for the EU against the backdrop of rapid European rulemaking and intensifying competition between China and the United States in data and machine learning.

Our approach examines the EU’s geopolitical considerations, domestic considerations of EU member states including societal interests and the protection of individual rights, and the commercial considerations of EU-based businesses and makes recommendations designed to navigate the complex equities at stake. While our research takes a broad view, we sought specific examples, concerning autonomous vehicles and healthcare, to demonstrate the benefits and risks to society posed by the deployment of machine learning. We similarly sought examples from four countries—Finland, France, Germany, and Poland—to highlight the diversity of priorities, concerns, and approaches among EU member states.

The report proposes three broad objectives for the EU to pursue in developing data rules for machine learning:

leverage the scale of Europe
balance economic autonomy and openness
protect fundamental rights

Under these categories, we recommend fifteen specific actions for European Commission.

It is important to note that framing the issue of data governance around EU competition with the United States and China can lend itself to lines of argument that downplay the importance of protecting the rights of individuals to control their own data and the social implications of data-driven decision making based on machine learning systems. To make the analysis and recommendations more easily digestible, they are separated into seven chapters under three themes; however, all the sections are strongly interconnected and are not designed to be read in isolation from one another. We have attempted to flag in the analysis the cases in which lines of reasoning risk downplaying important social interests, which can be complex and multidimensional, and have linked them to other parts of the report that discuss those issues in more detail.

Part one: Leverage the scale of Europe

1.1 Increase access to public-sector data

Public-sector data has significant potential for commercial and research purposes including to develop new products and services with machine learning.⁵⁰ The scale of the EU single market offers opportunities to pool data resources, the reuse of which could contribute up to €194 billion to the EU economy by 2030.⁵¹ However, the EU needs to address the complexity of navigating fragmented open data rules and procedures across Europe and, with the appropriate privacy-preserving techniques and protections in place, enable limited access to sensitive data,⁵² if it is to innovate on par with China and the United States.

To achieve this, the European Commission should establish an EU data service that facilitates access to publicly held data, helping to reduce the administrative and legal complexity for businesses seeking to reuse nonsensitive public-sector data. The service could also work with national data-protection authorities and public bodies to make sensitive data more widely available for research and development purposes under adequate security and privacy-preserving protections that meet data-protection requirements, such as the GDPR.

This chapter should be read in conjunction with chapter 2.3 on data ecosystems, which looks at the implications of data sharing among businesses, as well as chapter 3.1 on privacy and trust, which has a greater focus on the risks to individuals, as both chapters are relevant to the issues discussed in this chapter.

EU policy context

There are fragmented rules and processes on accessing and using public-sector data in Europe. The EU’s 2019 Directive on Open Data and the Re-Use of Public Sector Information updated the minimum standard for public bodies to make nonsensitive data accessible to the public.⁵³

However, while the directive strengthens language against imposing conditions on the reuse of data unless justified on public-interest grounds, it does not require the use of standard licenses that make it easier for organizations to use and combine data sets.⁵⁴ The directive also implies that those seeking EU-wide data sets need to approach each public-sector body individually. Furthermore, it does not clarify conditions to access to sensitive data (see figure 2 for examples of sensitive data). The commission’s proposed Data Governance Act seeks to address this gap by clarifying obligations for the public sector to share sensitive data that falls outside of the scope of the Open Data Directive.⁵⁵ The commission has also foreshadowed sectoral legislation to create “common data spaces” to make public and private data more widely available.⁵⁶ It envisages these common data spaces will provide nondiscriminatory access to high-quality data between businesses and with government.⁵⁷

Geopolitical considerations

Barriers to accessing public-sector data put Europe at a disadvantage vis-à-vis China and the United States. Charles Michel, president of the European Council, has described the current unavailability and fragmentation of open data policies as the “weak point” of the European market.⁵⁸ Compared to the EU, Chinese organizations benefit from the ability to extract vast amounts of data from users owing to few and lax privacy laws.⁵⁹ The US private sector has amassed troves of data drawn from its dominant position in most sectors in the United States and globally.⁶⁰ For example, in a conversation with experts Marc Lange and Luc Nicolas from the European Health Telematics Association (EHTEL), they noted that US healthcare companies often had direct access to data given the heterogeneity and market orientation of the different health systems, giving US companies greater capacity to directly deploy machine learning in healthcare. They observed that health data in Europe was generally only open to each hospital’s research department, which, given their nonprofit legal status, sought mainly to publish scientific research rather than develop healthcare products and services.⁶¹ Given these systemic differences, to compete with China and the United States, the EU needs to better enable companies to train machine learning algorithms with data held by the public sector.

National considerations

EU member states have different levels of comfort with allowing access to sensitive data. Member states are concerned about ensuring and guaranteeing the trust and confidence of citizens and businesses (see chapter 3.1). For instance, countries where citizens have strong concerns about privacy, like Germany, have until recently been reluctant to enable public access to data in sensitive industries such as healthcare.⁶²

On the other hand, countries like Finland and Estonia lead the way in digital health, and health providers already exchange some patient data.⁶³ In 2019, the Finnish Act on the Secondary Use of Health and Welfare Data entered into force to facilitate access to social and health data, the first such legislation by an EU member state.⁶⁴ The act establishes a central data-permit authority for the secondary use of health and welfare data, unifying access procedures and decision making.⁶⁵ Given these different levels of comfort, the European Commission will need to balance increased data availability while ensuring data-protection measures satisfy the higher expectations of some member states.

Business considerations

Low-cost access to multiple public-sector databases can enable businesses to develop new products and services. Reaching the scale and scope needed to train an algorithm may require pooling data from a variety of open data sources. However, EU member states each approach the openness of public data differently,⁶⁶ meaning businesses need to treat data under different conditions, increasing legal complexity and operational costs.⁶⁷ One example is the range of different open data licenses used, which is estimated to reach ninety different licenses used across various national, regional, and municipal governments in the EU.⁶⁸ If businesses can more simply access and use open data across the EU, it would make it easier for businesses to combine data sets to train new products and services based on machine learning.

Businesses would benefit from simpler procedures to access sensitive data under conditions that maintain data protection. Implementing simple procedures for the public to access sensitive data under the right conditions can enable the development of innovative machine learning products and services. For example, Corti is a natural language processing start-up that learns from medical conversations in emergency calls to better and sooner detect if someone is having a heart attack by listening to key words, background noises, and emotions.⁶⁹ Copenhagen Emergency Health Services shared audio data from hundreds of thousands of emergency calls with Corti to enable machine learning; in trials, the software reduced the number of undetected out-of-hospital heart attacks in Copenhagen by 43 percent.⁷⁰ Corti was further piloted in Italy and France, and while meeting GDPR posed significant challenges, it was able to work with emergency services to meet European data-protection requirements.⁷¹ This demonstrates that enabling access to sensitive data with the right protections for privacy (see chapter 3.1) can lead to innovations that benefit consumers and society.

There are significant risks in allowing access to sensitive data, but privacy-preserving machine learning techniques are emerging. Once data is released, controlling who can view data and for what purpose becomes extremely difficult to enforce, opening opportunities for misuse.⁷² Data scientists are developing privacy-preserving techniques that can still produce accurate machine learning outputs, ranging from anonymization that removes sensitive entries and pseudonymization that replaces them with made-up ones—though these have been proven to be vulnerable to reidentification⁷³—to more sophisticated decentralized, differential, or encryption techniques.⁷⁴ Applying these privacy-preserving techniques as a condition to access to sensitive public data could provide opportunities for researchers to develop and deploy beneficial machine learning applications while protecting sensitive data.

Recommendations for the European Commission

A. Establish an EU data service under legislation creating common data spaces to facilitate access to public-sector data in consultation with national data-protection authorities, starting with the health dataspace planned for the end of 2022.⁷⁵

Piloting the new system in the healthcare sector would enable the authority to draw lessons from Finland’s permit-granting authority (Findata), before expanding its scope to cover other sectors. The service could be an expansion of the EU’s open data portal (data.europe.eu) and be jointly overseen by the European Data Protection Board, as an independent European body focused on the consistent application of data-protection rules throughout the EU,⁷⁶ and the European Data Innovation Board in the proposed Data Governance Act, given its intended focus on data standardization.⁷⁷ The authority’s functions could include:

Serving as a single window for data-access requests for data sets across multiple jurisdictions and coordinate access to various data sets under single or at least compatible format and license
Assessing requests for data not available for open access, including sensitive data, in consultation with national data-protection authorities and make recommendations to relevant public bodies on whether to allow access and under what conditions, such as the use of specific privacy-preserving techniques and security precautions

The service could operate on a cost-recovery basis, passing expenses on to the businesses and research institutes using the service, which would still help reduce costs arising from administrative and legal complexity for businesses seeking to reuse public-sector data. It would also increase the availability of sensitive data for research and development while ensuring adequate protections laid out in relevant national-level data-protection legislation, for example, through privacy-preserving machine learning techniques.

1.2 Harmonize data standards for interoperability

Machine learning and the Internet of Things (IoT) will bring artificial intelligence to our devices, factories, hospitals, and homes, creating efficiencies and driving productivity.⁷⁸ To achieve this, however, the EU needs to enable the sharing of high-quality data between distributed devices and systems.⁷⁹ The lack of harmonized data standards across the EU⁸⁰ slows the deployment of machine learning compared to China and the United States.⁸¹

To address this, the European Commission should facilitate the development of sector-specific data models, application programming interfaces (APIs) and open source software that enable the seamless exchange and translation of differently structured data across systems and organizations.

EU policy context⁸²

The European Commission’s efforts to encourage the voluntary adoption of data interoperability standards have made slow progress. For example, decadelong efforts by the commission under the European Patient Smart Open Services (epSOS) project to encourage the implementation of interoperable health data standards have been stymied by reluctant healthcare IT providers and ambivalent national governments.⁸³ More recently, the commission enacted the 2019 Recommendation on a Europe Electronic Health Record exchange, but it lacks legal enforcement to incentivize compliance.⁸⁴

Looking forward, the EU’s 2020 Data Strategy included the development of rules for common “data spaces” intended to “set up structures that enable organizations to share data.” The European Commission’s 2020 proposal for a Data Governance Act also includes the establishment of a European Data Innovation Board that would focus on data standardization.⁸⁵ However, it will be very difficult for member states to agree to a standard data format due to constraints by existing IT infrastructure as well as incompatible and inflexible national privacy and security regulations.⁸⁶ To facilitate widespread adoption of machine learning, the commission needs to find a way to enable data interoperability while avoiding costly and complex overhauls of member state regulations and IT systems.

Geopolitical considerations

China uses different digital standards than the EU and is internationalizing those standards through global bodies and bilateral agreements, making its digital exports more interoperable with digital systems in foreign markets. The EU has a strong voice on international data standards through international bodies like the International Standards Organization (ISO) and the International Electrotechnical Committee (IEC).⁸⁷ However, China is rapidly increasing its participation in standard setting and using other strategies to influence standards in export markets.⁸⁸ Björn Fägersten and Tim Rühlig at the Swedish Institute of International Affairs outline how China is rapidly developing industry standards and proposing new work items in ISO and IEC based on nascent technologies to gain first mover advantage in technical standardization.⁸⁹ China also is exporting its domestic standards through foreign investments such as Belt and Road Infrastructure projects, including incorporating clauses on standardization in government-to-government memorandums of understanding.⁹⁰ Given the scale of the EU, standards adopted EU-wide can create market power that sway smaller economies to adopt EU standards.⁹¹ However, if data standards remain fragmented in Europe, they are less appealing to other countries compared to Chinese or other technical standards that are already widespread.

National considerations

Capacity and existing regulations constrain some EU member states from implementing common data standards. Each EU member state has its own approach to public data infrastructure that uses disparate data formats based on legacy systems, as well as national security standards tied to privacy requirements,⁹² to which amendments can be controversial.⁹³ For example, Germany’s mandatory technical specifications for electronic health records (EHR) are different from those encouraged by the European Commission due to security requirements.⁹⁴ For other member states, regulatory constraints may be surmountable but there is a lack of organizational capacity to adopt interoperability standards. For example, while Poland faces fewer legal barriers,⁹⁵ its hospitals and clinics have been slower to digitize health data,⁹⁶ resulting in a Polish law requiring national adoption of EHR by 2014 to be delayed by three years.⁹⁷ In general, studies have found that Central and Eastern European countries continue to face difficulties operationalizing EHR due to fragmented policy frameworks and constraints on financial resources.⁹⁸

Business considerations

Reducing the time and costs involved in preparing data for machine learning will make the technology more accessible. Preparing large data sets for machine learning can be difficult, resource intensive, and costly for business. Currently, “data wrangling” takes up about 80 percent of the time consumed in most artificial intelligence projects, as data from various data sources may need to be manually labeled and structured in a coherent way.⁹⁹ Data also needs semantic interoperability, whereby the labels or categories mean the same thing across databases.¹⁰⁰ There are solutions to this problem in the form of data models like the Observational Medical Outcomes Partnership (OMOP) that can translate different data formats so long as the semantic vocabulary is the same (e.g., using the term “myocardial infarction” rather than “heart attack”).¹⁰¹ These kinds of data models have the potential to enable smoother compilation of large-scale data sets. In doing so, they make large amounts of data more easily available to machine leaning algorithms resulting in accurate outputs, such as a higher likelihood of correctly diagnosing a disease.

Interoperability—the ability for two systems to communicate effectively—is key for machine learning.¹⁰² For instance, patients are increasingly tracking and generating large volumes of personal health data through wearable sensing and mobile health (mHealth) apps.¹⁰³ One example is KardiaMobile, a personal ECG device that can monitor heart rhythms and instantly detect irregularities, such as atrial fibrillation that can lead to a stroke.¹⁰⁴ The Kardia app allows patients to track ECG data over time and share the recordings directly with their physician.¹⁰⁵ However, a European study of health clinicians using mHealth apps with their patients found that they did not have the means to transfer this data to the patient’s electronic health record.¹⁰⁶ This means a patient’s data from a device cannot be compiled with the patient’s other health data for further research and development purposes, which weakens the potential for machine learning to improve patient health outcomes.

Open source data-management software—which are characterized by open licenses, commitment to standards, and collaboration among developers—could offer a pathway to help institutions keep pace with technological advancements and enhance interoperability.¹⁰⁷ Compared to proprietary software, the open architecture of open source software can more easily allow for the integration of various products like medical devices and wearables into a single system, and commercial versions can adapt the software to support the needs of specific users.¹⁰⁸ Open source software can be vulnerable to hacking and data breaches when not maintained by an active community;¹⁰⁹ however, open source projects with an active user base are generally more likely to discover bugs and security vulnerabilities, and fix them more quickly, than most proprietary software (i.e., closed source software).¹¹⁰

Recommendations for the European Commission

A. Mandate and fund the European Data Innovation Board (in the proposed Data Governance Act) to establish an EU technical data standards registry to facilitate the sharing of various technical data standards between national standards organizations, industry bodies, associations, and businesses.

The purpose of the registry would be to increase transparency of the various data standards at the European, national, sectoral, association, and enterprise levels that are in use across the EU. This would assist technical experts identify opportunities and develop ways to enhance interoperability. The registry could draw on the example of the registry maintained by the Standardization Administration of China that includes government, industry regulator, local, association, and enterprise-level standards.¹¹¹ Another example is the comprehensive list of standards developed by the Canadian Data Governance Standardization Collaborative.¹¹²

B. Mandate and fund the European Data Innovation Board (proposed under the Data Governance Act) to establish sector-specific expert committees to recommend data interoperability standards, drawing on technical experts from standardization bodies, industry, academia, and consumer groups.

The expert committees could be established alongside each “common data space” and should specifically focus on:

i. Standard data models that enable the translation of differently structured data across systems and organizations. An example of such a data model is the OMOP model for health data mentioned above.

ii. Standard application programming interfaces (APIs) that enable apps and databases to seamlessly exchange data. For example, the United States requires health information to be made available by the Fast Healthcare Interoperability Resources (FHIR) API standard.¹¹³

iii. Free open source data-management software that embed sector-specific standards, encourage a community of practice, and allow for commercial adaption to serve the needs of specific users.

The widespread adoption of common data models would enable EU member states to implement data interoperability standards without overhauling costly IT infrastructure and amending sensitive privacy and security regulations. Combined with common APIs and free open source data-management software this would reduce costs and increase accessibility to data for businesses to adopt machine learning. The broad adoption of these standards in the EU would also increase the chances of European data interoperability standards being adopted internationally.

Part two: Balance economic autonomy and openness

2.1 Diversify data storage and processing

Cloud service providers increasingly become geopolitical actors whose influence shapes economic growth, international security competition, and global balance of power.¹¹⁴ To ensure redundancy in essential data services and reduce vulnerability to potential coercion, the European Union needs to diversify its dependency on a few non-European cloud service providers. However, in doing so, the EU needs to maintain competitive cloud offerings for European businesses and avoid succumbing to calls for protectionist measures that potentially harm businesses and could negatively impact some EU member states.

The European Commission should consider further enhancing investment into the existing federated cloud project to diversify the supply of data infrastructure services. It also needs to ensure coordination of proposed and existing initiatives in this area, such as Gaia-X, and aim to minimize the compliance costs of new regulatory measures that could disadvantage European cloud providers. It is worth noting that within the EU policy debate there is a strong focus on protecting data from foreign jurisdictions and weaker data- protection regimes, often mentioned in the context of data localization efforts that force data to be hosted in the jurisdiction it was collected.¹¹⁵ Therefore, this chapter should be read in conjunction with chapters 2.2 on simplifying data transfers and with chapter 3.1 on enhancing privacy and trust.

EU policy context

The European Commission, supported by most EU member states, aims to reduce dependencies on external cloud providers by setting new regulations and funding “federated cloud infrastructure.”¹¹⁶ Building on the data strategy proposed by the commission, twenty-five EU member states signed a joint declaration in October 2020 pledging to invest up to €10 billion in the cloud sector and establish the “European Alliance on Industrial Data, Edge and Cloud.”¹¹⁷ To shape these efforts, twenty-seven CEOs of European companies shared priority areas for cloud and edge ecosystem development and investment with European Commissioner for Internal Market Thierry Breton in May 2021.¹¹⁸

In July 2021, the commission officially launched the European Alliance on Industrial Data, Edge and Cloud, which aims to combine resources from existing EU programs, industry and member states, for the creation of a European Federated Cloud in 2021.¹¹⁹ The alliance plans to write a Cloud rulebook that would provide a single European framework of cloud use and interoperability requirements, and it has posed the idea of a European marketplace for cloud offerings: a single portal to cloud services meeting key EU standards and rules.¹²⁰ There also is a strong focus on so-called edge computing, which instead of relying on centralized infrastructure, allows for real-time data sharing and analysis directly between devices. In addition to this, the 2020 Digital Markets Act proposed by the commission aims to mitigate the risk of dependencies on cloud providers through stricter portability and data access obligations.¹²¹

It is yet unclear whether the European Alliance on Industrial Data, Edge and Cloud will be able to coordinate the efforts needed to compete with the leading non-European cloud providers. There is also a question of how specifically the Alliance relates to other initiatives, such as the Franco-German Gaia-X cloud network. Gaia-X aims to serve as a platform and Europe-wide network joining up cloud services providers, high performance computing, and edge system providers from a number of EU-based and international organizations.¹²² It is crucial that these policy and funding initiatives are coherent and that potential overlaps are minimized to enable data flows while maintaining adequate data protections.¹²³

Geopolitical considerations

The concentration of the EU’s cloud services among few US suppliers makes it vulnerable to external supply disruptions. According to Atlantic Council research, some 92 percent of the Western world’s data are stored in the United States.¹²⁴ Just three US providers—Amazon, Microsoft, and Google—account for 66 percent of the European cloud market.¹²⁵ EU-based cloud providers have seen their share of the European market decline from 26 percent in 2017 to 16 percent in 2020 (see figure 3) and the largest EU-based provider—Deutsche Telekom— accounts for only 2 percent of the market.¹²⁶

At the same time, the uptake of cloud among EU businesses has been on the rise, which highlights the urgency of addressing the issue.¹²⁷ As cloud services become part of critical infrastructure, they become increasingly vulnerable to disruption.¹²⁸ For example, a single cloud provider’s supply chain decisions and internal security policies can impact millions of customers.¹²⁹ As the United States has previously threatened to cut European companies off from critical US services for engaging in business activities condoned by European countries,¹³⁰ the growing dependence on a few US providers is increasingly seen as a strategic weakness that could undermine European foreign policies and national interests.¹³¹

Europeans are also concerned about foreign authorities potentially accessing their data and risks to data protection. The emergence of foreign entities holding large troves of data on individuals is a growing concern for Europeans who were alarmed by Edward Snowden’s disclosure of electronic mass surveillance by the US government.¹³² These concerns have intensified since the adoption of the US Clarifying Lawful Overseas Use of Data Act (CLOUD Act) that can require US tech companies to hand data to US authorities, even if it is stored outside the United States.¹³³ For example, in April 2019, Germany’s top data protection office told Politico that Amazon’s cloud hosting services were not suitable for storing German police data due to the risks that US authorities would access the data.¹³⁴ In May 2021, the EU privacy watchdog, the European Data Protection Supervisor, launched an investigation looking into compliance of EU bodies’ using of cloud computing services provided by Amazon and Microsoft due to concerns about the transfer of personal data to the United States.¹³⁵ In addition to this, the EU is concerned about cloud service providers being subject to legislation that would contradict the EU’s General Data Protection Regulation and therefore weaken existing data protection enforcement, in particular in the context of concerns about several Chinese cybersecurity and national intelligence laws (see chapter 2.2).¹³⁶

However, the pursuit of strategic autonomy in cloud infrastructure and related data localization requirements have important limitations. Some EU member states such as France and Germany have already introduced data localization requirements that restrict the storage or transmission of certain data outside their borders.¹³⁷ However, as Atlantic Council research by Trey Herr indicates, data localization policies, especially when they do not adequately capture the different types of data managed by cloud operators, can undermine the public cloud’s economic and technical underpinnings.¹³⁸ The public cloud—where a service provider makes its resources available to the public, enabling organizations and individuals to leverage large volumes of data storage and cutting-edge computing services at lower costs—is made possible by large numbers of global users accessing international networks of storage and computing resources.¹³⁹ Therefore, when data cannot cross borders, much of the flexibility at the core of the public cloud model can be lost.¹⁴⁰ Data localization requirements can force cloud service providers to build infrastructure to serve specific markets, potentially leading to costly changes in core infrastructure design.¹⁴¹ For example, the push for data localization has led to a different model of cloud services provided by Microsoft Azure in Germany.¹⁴² In 2015, Microsoft announced that it would open a cloud offering that would be run through a data trustee, which would control the data stored in Germany and Microsoft’s access to the data.¹⁴³ However, this offering wasn’t sustainable, according to reports, due to higher costs and separation from international business, which highlighted the challenges posed by data localization rules.¹⁴⁴ See chapter 2.2 for more analysis on data transfers and the impact of data localization rules.

National considerations

The EU member states want to address dependency on non-European data infrastructure but are divided on how far to go in pursuit of strategic autonomy.¹⁴⁵ Some countries see both a risk to autonomy and privacy as well as an opportunity to advance European industry and help foster national “champions.”¹⁴⁶ For example, the German and French led Gaia-X project aims to provide a platform to combine cloud-hosting services from several companies. It plans to achieve this by “allowing businesses to move their data freely with all information protected under Europe’s tough data processing rules,”¹⁴⁷ and by doing so, aims to strengthen European digital sovereignty by decreasing dependencies on external digital infrastructure suppliers.¹⁴⁸ Yet some experts argue that it is not clear if the EU has sufficient domestic suppliers of cloud services to substantively replace foreign suppliers to the EU market.¹⁴⁹

However, some EU member states are wary of potential protectionist measures that threaten Europe’s openness to cloud services and favor larger member states.¹⁵⁰ In March 2020, Finland led a coalition of fifteen EU member states that published a joined statement asserting that, “to avoid distortions in the single market and to ensure a global level playing field, operators from third countries must compete under similar conditions and rules as their European counterparts.”¹⁵¹ This is potentially also because some member states are benefitting from US investment. For instance, in 2020 Poland saw major cloud infrastructure investments of up to $3 billion (combined) from Microsoft and Google.¹⁵² Additionally, according to Politico, some European states are concerned that a push for greater European digital autonomy is going to provide the Franco-German industry a new edge at the expense of smaller economies.¹⁵³ In 2020, Finnish Minister for European Affairs Tytti Tuppurainen said: “There is a very strong tendency toward ever-growing protectionism.”¹⁵⁴ The minister said she fears “smaller economies might be in danger of being trampled on.”¹⁵⁵

Business considerations

Businesses need affordable data infrastructure that meets their need for fast analysis and interpretation of data. Cloud services can drive machine learning implementation because they provide organizations with access to high-performance infrastructure that they could not afford on their own.¹⁵⁶ For example, the rise of autonomous vehicles is fueled by the use of cloud service providers as they require large data storage, computing power, and real-time accessibility, with data traffic from autonomous vehicles predicted to increase a thousandfold between 2018 and 2025.¹⁵⁷ Edge computing infrastructure and standards also will be increasingly important for the deployment of IoT systems, for example in smart agriculture. When smart devices can share and process data more directly through edge computing—like when a sensor monitoring crops analyzes air moisture and directly triggers a watering system, rather than doing so via a server located hundreds of miles away—it can reduce the volume of data traffic needed to be transferred long distances, lowering costs, and enabling better service in areas with lower connectivity.¹⁵⁸ At the same time, edge computing presents security risks that require additional safeguards as this decentralized model expands the potential attack surface, making data harder to protect.¹⁵⁹

The customers of cloud providers reap the benefits of economies of scale. The ability to access global networks of data storage and processing through public cloud offerings enables organizations to leverage large volumes of data storage and cutting-edge computing services at lower costs compared to in-house data centers.¹⁶⁰ However, this also leads to strong dependency on cloud services, which can create vulnerabilities for businesses.¹⁶¹ For example, difficulties and costs of switching can exacerbate vendor lock-in, which could potentially lead to businesses having fewer and lower-quality service choices.¹⁶²

Recommendations for the European Commission

A. Enhance the implementation of the European Alliance for Industrial Data, Edge and Cloud by seeking additional investments from the participating cloud computing providers and industrial cloud users.

While the project is still at its early stages, it would diversify the supply of cloud services available to European organizations, mitigating risks stemming from over-reliance on a small number of non-European providers. While the project received political support from the member states, the commission should seek additional incentives to enhance European industry players’ involvement and financial contributions in the Federated Cloud project.

B. Enhance coordination between the European Alliance for Industrial Data, Edge and Cloud with parallel initiatives, such as Gaia-X, and ensure that the European
Cloud Rulebook and other proposals do not impose significant regulatory burdens on cloud service providers that could impact the competitiveness of their offerings.

As argued by Andrea Renda and others from the Centre for European Policy Studies (CEPS), the commission should aim to maximize coordination with existing parallel projects, such as Gaia-X to minimize overlaps in rules and norms that would prevent sufficient interoperability, openness, and transparency of the EU data economy.¹⁶³ While there are not many details yet available regarding the Cloud Rulebook and marketplace proposals mentioned in the EU Data Strategy, the European Commission needs to ensure that the potential regulatory burden will not severely impact the ability of European providers to provide competitive price and service offerings.

2.2 Simplify data transfers

Training algorithms to automate tasks, make predictions, and develop new products and services often requires businesses to share data with machine learning specialists or to combine data sets across multiple locations.¹⁶⁴ Yet only 6 percent of European businesses are using data in this way.¹⁶⁵ With cross-border data flows now contributing more to global gross domestic product (GDP) than the flow of goods,¹⁶⁶ the EU risks being excluded from global value chains due to its relatively slow uptake of data transfers.

To address this, the European Commission should develop new mechanisms to enable EU-based businesses to participate in global data value chains and address legal ambiguity in EU legislation that increase risks and costs for businesses sharing data. One important issue related to data transfers in the EU is data protection regulations to safeguard the fundamental rights of European citizens. This chapter builds on chapter 2.1 and focuses on identifying barriers to data transfers and ways to address them. It is not intended to downplay the importance of maintaining protections, particularly for personal data, and should be read in close conjunction with chapter 3.1 on enhancing privacy and trust.

EU policy context

Within the EU, aspects of European legal frameworks lack clarity on data transfer issues.¹⁶⁷ While the European Commission has funded the Support Center for Data Sharing to help organizations overcome legal barriers,¹⁶⁸ there is a need for greater clarity in relevant legislation.¹⁶⁹ The proposed Data Governance Act introduces a legal framework for “data sharing services” aimed at increasing trust and reducing costs to data sharing,¹⁷⁰ however the current proposal does not address the legal uncertainties that give rise to those costs.¹⁷¹

The Schrems II judgment in the EU Court of Justice invalidated widely used practices for sharing data outside the EU. The GDPR required that personal data transferred outside the EU must not undermine the protection set by the regulation.¹⁷² This condition can be met by a European Commission “adequacy decision” determining that a country provides an adequate level of data protection.W¹⁷³ However, the July 2020 Schrems II judgment invalided such an “adequacy decision” negotiated with the United States under the EU-US Privacy Shield—an agreement used by more than 5,300 companies.¹⁷⁴ The judgment also brought into question the transfer of data under standard contractual clauses, which are commonly used data-protection obligations.¹⁷⁵ As a result, such clauses are subject to onerous new safeguards that can be impractical for data exporters.¹⁷⁶ This has heavily impacted businesses that transfer data, with about 85 percent of EU-based businesses that share data previously using such clauses to transfer data outside the EU.¹⁷⁷ A study commissioned by the European Parliament indicates that renewing the adequacy decision with the United States will likely require fundamental changes to the US data privacy and surveillance regimes.¹⁷⁸

Geopolitical considerations

Cross-border data transfers can expose sensitive data to weaker data protection regimes and make it easier for foreign governments to surveil and influence European citizens. Further to Chapter 2.1, which includes a focus on the risks of surveillance through foreign-owned data infrastructure, data transferred to a non-EU jurisdiction can be subject to foreign surveillance and manipulation. (These risks are in addition to those created by relying on foreign-owned cloud services, which become part of critical infrastructure, and are vulnerable to disruption.) For example, in 2008 the US Congress amended the Foreign Intelligence Surveillance Act to allow US intelligence agencies to conduct surveillance of non-US persons located abroad without individual warrants under Section 702.¹⁷⁹ Once general targeting is authorized by the Foreign Intelligence Surveillance Court, the US government can require electronic communication service providers, like email, telephone, and remote computer service providers, to provide information and assistance, including information in US-based facilities such as data centers.¹⁸⁰ China has put in place similar foreign intelligence surveillance laws, such as the 2017 National Intelligence Law which obliges individuals, organizations, and institutions to assist national security agencies in “intelligence” work.¹⁸¹ Beyond surveillance, inadequate data protections deployed by businesses outside the EU have allowed bad actors, like Russia, to unleash disinformation campaigns and influence elections in Europe.¹⁸² These instances are among some of the drivers of EU policies to implement legal restrictions on cross-border data transfers. They also serve to support calls for protectionist data localization policies intended to shelter domestic industry from international competition.¹⁸³

However, flows of data outside the EU are critical for global collaboration, economic competitiveness, and innovation. The Digital Europe survey found that of the EU-based businesses transferring data to another nation, 94 percent of them send data to the United States, 56 percent to the United Kingdom, and 59 percent to an Asian nation.¹⁸⁴ These data transfers underpin essential business functions and are a driving force behind global- value chains—where business operations are divided across countries for efficiency, lower costs, and faster production.¹⁸⁵ As cross-border data flows increasingly become an integral part of the global economy,¹⁸⁶ countries that participate in a greater share of these value chains will see more economic benefits from the digital economy.¹⁸⁷ While the commission still has in place adequacy agreements with fourteen countries,¹⁸⁸ surveys (conducted before the June 2021 agreement with the United Kingdom) indicate these agreements are only used by 5 percent of EU businesses transferring data.¹⁸⁹ To secure its place in future global value chains, the EU needs mechanisms in place that seamlessly facilitate data transfers with global innovation hubs, particularly the United States.

National considerations

EU member states that are more concerned about security and data privacy are in favor of restricting cross-border data flows. As detailed in chapter 2.1, there are calls in some EU member states for data localization rules that restrict the storage or transmission of certain data outside their borders. The proposal for France’s Digital Republic Bill (2016) included a provision to completely restrict data transfers outside the EU.¹⁹⁰ Although the provision was later removed, it is representative of the desire among many French and European leaders to defend Europe’s “digital sovereignty.” This includes both the protection of data from foreign governments and businesses, as well as maintaining the ability for European law enforcement to access European data when needed.¹⁹¹ For example, European law enforcement agencies have been given some leeway under the 2020 Quadrature du Net and Others decision of the Court of Justice of the European Union to intercept and retain bulk data indiscriminately when faced with “serious threats to national security,” such as terrorist activities.¹⁹²

But smaller, more open and trade-dependent member states want fewer barriers to cross-border data flows. What had been dubbed the Digital Nine (D9) group of EU member states wants the EU to be more active in pursuing more open cross-border data flows.¹⁹³ In a 2017 letter addressed to the Vice President of the European Commission, the group urged the EU to include ambitious rules on open data flows in trade agreements, arguing that the absence of those rules “limits [their] countries, citizens, and entrepreneurs from participating in global value chains.”¹⁹⁴ In March 2021, a similar group of eight smaller EU member states co-signed a letter underlining the need to “eliminate barriers to cross-border online services, and to ensure free data flows.”¹⁹⁵ However, the European Data Protection Supervisor has emphasized in relation to negotiations with the United Kingdom (UK), that data protection is nonnegotiable in international trade agreements, which must preserve individuals’ fundamental rights to data protection and privacy.¹⁹⁶ While the commission did reach an adequacy decision with the UK, for the first time it included a sunset clause which means the decision will automatically expire in four years.¹⁹⁷

Business considerations

The lack of legal clarity on data ownership, liability, and reuse creates costs and risks to businesses sharing data. Companies rely on contracts when sharing data. For example, about 60 percent of the EU-based businesses that share data do so with subcontracted data analytics services.¹⁹⁸ However, legal uncertainties mean data-sharing contracts often need to be tailored to address specific issues and examined on a case-by-case basis,¹⁹⁹ requiring complex and expensive negotiations. Liability issues are one example, given the risks when sharing data with a third-party whose misuse of the data could result in damages.²⁰⁰ Similarly, selling a product or service based on someone else’s data could lead to damages and harm if the data is incorrect or significantly biased (see chapter 3.2 for more on bias and data-driven discrimination).

Transferring data outside the EU is even more complex, costly, and risky. While the Schrems II judgment applies to personal data, most data sets contain both personal and non-personal data,²⁰¹ and it is often impractical, if not impossible, to split mixed data sets.²⁰² This means that the GDPR and the Schrems II judgment apply to a large proportion of data sets in the EU. Following the judgment, Digital Europe conducted a business survey that showed that 91 percent of EU-based businesses that share data do so with entities outside the EU and 90 percent reported facing moderate-to-high costs in reassessing their data-sharing contracts to comply with the judgment.²⁰³ These costs mean smaller businesses are more likely to avoid cross-border data transfers at all.²⁰⁴

Recommendations for the European Commission

A. Give greater legal clarity on data ownership and on who is responsible for damages that occur due to externally provided data.²⁰⁵ This includes in relation to:

i. The General Product Safety Directive not containing any provisions on the liability of damages caused by data where products are based on third-party data providers, which creates a barrier for the development of such business models.²⁰⁶

ii. The Product Liability Directive’s ambiguous scope in relation to digital products and defects resulting from errors in externally provided data.²⁰⁷

iii. The need for legal instruments that define data ownership and data access/use rights for nonpersonal data (see recommendation B in chapter 2.3).

Clarifying liability and ownership issues in EU-wide legal frameworks reduces the need for these issues to be addressed in contractual arrangements when businesses share data and lowers the uncertainty and risk in legal proceedings. This work would particularly support innovative business models among small and medium-sized enterprises (SMEs) that cannot afford the potential legal costs of these business models under the current framework.²⁰⁸ This also would help support a high-growth sector in the EU that currently feels neglected by policy makers in Brussels.²⁰⁹

B. Prioritize negotiations on the recognition of data protection adequacy with global innovation hubs that share European values on data protection to increase the competitiveness of EU businesses in global value chains. This should involve exploring solutions with the United States to address the risks highlighted in Schrems II, while enabling cross-border economic activity.²¹⁰

Data transfers with the United States and other innovation hubs are embedded in the value chains of many EU businesses, and value chains globally. Establishing mechanism to seamlessly transfer data with adequate protections would strengthen the position of European companies in the global economy. Adequacy determinations are also often preceded by substantial discussions with the third country, which in some cases have resulted in changes to data protection practices or even legislation in third countries to reflect European standards more closely.²¹¹ This adoption of European standards increases the odds that emerging global digital norms more closely align with European systems and values.

Finding ways to reconcile EU data protection and privacy requirements with US surveillance laws and its lack of a national data privacy law will continue to pose significant challenges to an adequacy decision and likely require significant changes to US data privacy and surveillance regimes.²¹² That said, there are ideas being put forward that could potentially offer solutions to the impasse, such as the hybrid model offered by Rubinstein and Margulies, which is based on US export control law.²¹³

2.3 Foster inclusive data ecosystems

Like public-sector data (see chapter 1.1), making private data sets more widely available allows other organizations and individuals to leverage them for machine learning. However, a 2018 study by Deloitte found that European companies often cannot access the data they need from other companies, and when they do, face strict contractual limitations on using it.²¹⁴ It also found that smaller companies and those in a weaker position in the value chain had unequal bargaining power when seeking to access data.²¹⁵ This lack of data access can put businesses at a disadvantage in the market, stop them from developing new products and services, and slow productivity and efficiency growth.²¹⁶ While the problem varies across sectors, the lack of competition linked to the unwillingness of companies to share data—alongside other impediments to data access—has negative implications for the EU’s digital competitiveness and innovation, freedom of choice for consumers, and digital inclusion.²¹⁷ The EU needs to encourage data sharing and remove anticompetitive data barriers while maintaining incentives for businesses to invest in data.

To achieve this, the European Commission should adopt sector-specific data portability rights and introduce rules on when businesses should allow access to proprietary data sets to promote competition in European data ecosystems. The commission should also establish sector-specific consultative mechanisms to review these as technologies and market structures change over time.

This chapter should be read in conjunction with chapter 3.1 on privacy and trust for its greater focus on protecting the rights of individuals, which is fundamental when sharing data between organizations.

EU policy context

The commission has been developing ways to incentivize businesses to make data more widely available. This includes data portability rights for customers—the ability to extract your data from one service and use it for another—such as the 2016 General Data Protection Regulation, which gives individuals the right to port personal data.²¹⁸ The commission is also implementing sector-specific rules on access to proprietary data sets—the right for a business to access specific data held by others—through the proposed Digital Markets Act, which puts obligations on large platforms to give businesses real-time access to certain data related to the businesses’ products and services.²¹⁹ The commission has further foreshadowed sectorial legislation to create “common data spaces” that encourage, and possibly even force, entities to make their data more widely available.²²⁰ It is also preparing a Data Act, which aims to create a “fair data economy,” including by clarifying rights related to nonpersonal data.²²¹

Geopolitical considerations

Major powers want access to data that helps their industries compete internationally. European leaders are concerned that the strategic acquisition of companies at the center of data ecosystems could be leveraged for competitive advantage over EU-based industries.²²² For example, China’s “Made in China 2025” policy to develop high-tech manufacturing encouraged Chinese firms to strategically acquire foreign companies with advanced technologies and data.²²³ In a conversation with digital policy expert Jörn Fleck from the Atlantic Council, he observed that European leaders may fear a repeat of the mid-2000s when the EU “missed the boat” on digital platforms, resulting in US firms dominating this space; however, he noted that this time it would be worse. With traditional industries like manufacturing turning to machine learning to boost productivity, a lack of control over data in the value chain could weaken Europe’s industrial base, he added.

Policies designed to redistribute data in favor of European firms can pose risks to the EU. Comments by European Internal Markets Commissioner Thierry Breton—“ European data will be used for European companies in priority, for us to create value in Europe”—suggest plans to create data ecosystems that treat European companies more favorably.²²⁴ However, in a conversation with Andrea Renda, a digital economy expert at CEPS, he warned that while Europe’s desire to develop its own solutions is legitimate and desirable, policies designed to insulate European tech companies from outside competition would weaken the EU’s competitiveness in the digital economy, and consequently, the EU’s share of the global economy as it becomes increasingly digitized. Furthermore, such policies could invite retaliation from trading partners. Although unrelated to data, US threats to impose tariffs if EU member states enact a digital services tax on platforms such as Facebook, Google, and, Amazon demonstrate its readiness to impose costs on EU businesses when the interests of its big tech companies are harmed.²²⁵

National considerations

EU rules encouraging or obliging businesses to share data need to consider the varying positions of member states. France has called on regulation that encourages platforms to share data to be “agile and flexible” to adapt to changing business models, while Germany seeks stronger rules to prevent anticompetitive behavior before it occurs.²²⁶This includes Germany’s recent amendments to its national competition law to prevent dominant companies from using data to make market entry more difficult for other companies and to stop them from hampering data portability and interoperability.²²⁷ Poland broadly agrees with the German approach, though it emphasizes the need for only a narrow catalog of black-listed practices, covering only those most harmful to competition that should be determined by member states.²²⁸ Finland stresses that any new rules obliging companies to share data should be “justified, proportionate, and clear,” avoid “unnecessary administrative burden,” and provide legal protection for platforms that control data.²²⁹

Business considerations

Incumbent businesses seek to protect their data to give them a competitive advantage. Research shows that many incumbents in the market are not willing to share data due to a fear of new competition.²³⁰ One reason for this is the ability to maintain a dominant position in the market through data network effects and machine learning.²³¹ For example, as technology analyst Benedict Evans explains,²³² autonomous vehicle (AV) manufacturers can use exclusive access to data to outperform competitors. This is because AVs capture images as they drive, comparing them with prebuilt 3D models of the vehicles’ surroundings to understand points of difference using machine learning. As the vehicle drives, it is updating the model so that the next time it drives down the same road, there are fewer points of difference for it to process and hence a lower chance of it becoming confused and reacting dangerously. When this data is shared between vehicles, each one is able to drive more safely as the models are updated more frequently. By sharing this mapping data, AV manufacturers can help all AVs on the road to perform better and safer for the public. However, companies in a dominant position with more cars on the road are incentivized to withhold the data from competitors to distinguish themselves by offering a superior product.²³³

Data sharing obligations help create data ecosystems but can also have unintentional consequences. Data portability rights can create opportunities for upstream businesses and value for consumers including for vehicles repair services,²³⁴ and pay-how-you-drive insurance.²³⁵ Supplier access to in-vehicle data can also help downstream businesses such as in-built car component manufacturers improve their products to develop innovations like machine learning systems that forecast parts failures and recommend repairs.²³⁶ However, competition experts warn that excessive obligations to share data might create disincentives for companies to invest in high-quality data sets.²³⁷ For instance, there may be fewer opportunities to make a return on the investment,²³⁸ or it could result in anticompetitive information exchange that puts them at a disadvantage.²³⁹ The 2018 study by Deloitte found that these issues are unique to each sector due to the different business models, public interests, and economic sensitivities.²⁴⁰ The case outlined above is one example where the public could potentially be better served by autonomous vehicle companies sharing mapping data to improve the safety of all vehicles on the road; however, in other cases, sharing data could disincentivize innovation or benefit certain consumers at the expense of others,²⁴¹ with little or no public benefit. The commission needs to carefully calibrate measures sector-by-sector to avoid outcomes that disproportionally impact certain groups, create barriers to competition, or hamper innovation.

Recommendations for the European Commission

In forthcoming sectorial legislation to create common data spaces in manufacturing, environment protection, mobility, health, financial, energy, agriculture, public administration, and skills,²⁴² the commission should:

A. Adopt sector-specific data portability rights and standards for customer data (personal and nonpersonal) that consider the distinct data-access problems at an
industry level to balance the benefits and costs in different contexts.

The commission could draw lessons from the Australian Consumer Data Right, which is a series of sector-specific legislation giving consumers and small businesses control over their personal and nonpersonal consumer data in phases, starting with banking (2019), energy (2020), and telecommunications (forthcoming).²⁴³

B. Adopt sector-specific data-access obligations for platforms and dominant companies to share data with other businesses that consider the business models, public interest, and economic sensitivities of each sector.

This would limit companies from withholding access to data needed by other companies to innovate and compete, without unfairly discriminating against incumbent players.

C. Establish sector-specific consultative mechanisms involving industry, the proposed European Data Innovation Board, national standards organizations, and competition authorities to periodically review the suitability of sectoral data-access rules as technologies and market structures change.

For example, the Japanese Fair Trade Commission has created the Study Group on Competition Policy in Digital Markets, which includes experts from government, industry, and academia and meets monthly to investigate issues related to Japan’s Antimonopoly Act.²⁴⁴

Implementing sector-specific portability and data-access rules would help balance European economic autonomy and openness by enabling a greater diversity of suppliers, particularly in downstream services, making Europe more resilient against geographic supply disruptions, and increasing opportunities for SMEs to participate in the value chain. Periodically reviewing these rules mitigates against “set and forget” policies that over time risk disincentivizing innovation and disadvantaging certain players owing to the changing nature of technology and digital markets.

Part three: Protect fundamental rights

The European Commission anchors its approach to AI policies around the need to ensure the protection of fundamental rights while boosting research and industrial capacity.²⁴⁵ Data rules for machine learning have the potential to directly infringe on citizens’ private lives and the right to the protection of personal data (Articles 7 and 8 of the EU Charter of Fundamental Rights) and on nondiscrimination and equality between women and men (Articles 21 and 23).²⁴⁶ Therefore, Chapters 3.1 and 3.2 focus on issues related to risks to privacy and discrimination enabled by data for machine learning.²⁴⁷

3.1 Enhance privacy and trust

To enable sustainable development and adoption of machine learning, the EU needs to reconcile privacy considerations with the benefits of access and reuse of data. The costs and legal difficulties of meeting strict privacy restrictions reduces the amount of data available to businesses and researchers for machine learning and can stymie European innovation.²⁴⁸ At the same time, EU decision makers face important societal considerations as many Europeans lack confidence in organizations’ and governments’ ability and willingness to protect their data, despite the fact that the EU has one of the world’s strictest data privacy regulations—the GDPR.²⁴⁹ Failure to protect citizens’ fundamental rights, including privacy, while enabling wider data use could further undermine citizens’ trust, European innovation and international competitiveness, and, consequently, the credibility of the EU’s global values-
based leadership.

To address this, the European Commission should clarify guidance on applying the GDPR to machine learning techniques to diminish the costs of compliance and legal uncertainty while enhancing data-protection enforcement. The commission should also invest in emerging privacy-preserving machine learning techniques that can still produce accurate machine learning outputs while reducing the reliance of organizations on large, centralized sets of personal data to innovate. It should enhance research collaboration with like-minded countries in developing these techniques.²⁵⁰

EU policy context

The EU is struggling to balance implementing strict data protection with enabling data-driven innovation. The GDPR harmonized data-privacy laws across EU member states and overhauled how organizations process and handle data. It introduced limits to the full deployment of machine learning, for example by putting restrictions on certain automated decisions.²⁵¹ At the same time, the 2020 Data Governance Act (DGA) proposal aims to foster the availability of data for use and data-sharing mechanisms across the EU.²⁵² For example, the act seems to imply that public-sector bodies will be obliged to fulfill requests for access to personal data that has been anonymized.²⁵³ However, this has been criticized by digital-rights organizations as anonymization techniques may not be sufficient to guarantee that data will not be traced back and reidentified.²⁵⁴ These two policy examples show the difficulty of balancing the complex and sometimes competing considerations of privacy and innovation.

Geopolotical considerations

Easier access to personal data gives countries such as China a data advantage in machine learning. Technology and security experts are concerned by the apparent systemic advantage political regimes like China’s have in machine learning.²⁵⁵ This includes the Chinese Communist Party’s ability to access citizens’ data without consent and its motivation to invest heavily in machine learning that enhances surveillance and social control, which has enabled the rapid development of artificial intelligence.²⁵⁶ While China’s Personal Information Protection Law (which comes into effect in November 2021) is modeled on GDPR,²⁵⁷ experts note that China’s approach to data-protection enforcement has focused more on investigating foreign companies and accessing their data than preventing the unauthorized transfer and use of personal data, which is widespread in China.²⁵⁸

Democracies such as EU member states face an existential challenge of protecting European fundamental rights while developing better performing AI systems based on larger and more diverse data sets.²⁵⁹ Ethics and artificial intelligence expert Tim Hwang argues that “democracies may be in the unenviable position of having to compromise on core values like privacy to preserve their technological lead.”²⁶⁰ To overcome that challenge, Hwang suggests that democracies should invest in a set of technological solutions that would reduce or replace the reliance of machine learning systems on large data sets.²⁶¹

Privacy-preserving machine learning could become the EU’s competitive advantage. Differences in the emphasis on an individual’s right to privacy and control of personal data is one of the key issues highlighting divergence in philosophical and regulatory approaches among key global actors such as the United States, China, and the EU.²⁶² The EU sees its emphasis on privacy, transparency, and fairness as its competitive advantage when technologies can increasingly undermine human rights.²⁶³ The GDPR is a central element to this because it sets the EU apart from the United States, which still lacks any federal data-privacy legislation.²⁶⁴ However, senior US officials have recently highlighted the importance of data security and privacy for the competitiveness of the United States, including specific mention of the potential of technologies like privacy preserving machine learning (PPML) by President Biden’s National Security Advisor Jake Sullivan.²⁶⁵ This may open opportunities for the EU to cooperate with the United States in accelerating the development and deployment of these technologies. PPML techniques such as federated learning could make it possible for EU-based companies to benefit from large and diverse data pools without compromising privacy (chapter 1.1 provides further details on PPML techniques).²⁶⁶

National considerations

Citizens of EU member states expect government to ensure The protection of their personal data and some populations are less willing to share their data than others. Europeans share a high level of concern about the security of their personal data.²⁶⁷ Approximately half of the respondents in a recent Munich Security Conference survey think that their national government is not acting aggressively enough on a range of security and privacy issues, such as how technology companies protect data.²⁶⁸ Only about one-quarter of respondents had confidence in companies from other EU member states protecting their data, less than one-fifth had confidence in US companies and one-tenth in Chinese companies (see figure 4).²⁶⁹ “A 2019 survey showed that on average people in Finland were more likely to share their personal data to improve public services like medical research and care (66 percent), people in Germany and France were less likely (44 percent and 40 percent), and people in Poland unlikely (30 percent).²⁷⁰ The European Commission needs to consider how to close this gap when developing policies to enable machine learning across the EU.

However, some member states are also concerned that overregulation has the potential to stifle innovation. In October 2020, Denmark drafted a document co-signed by France, Finland, and Poland alongside eleven other EU member states advocating to “ensure that the European Commission sees innovative artificial intelligence as equally important as managing risks.”²⁷¹ Therefore, the commission needs to carefully balance measures that will increase trust in privacy protection as well as enhance the competitiveness of European businesses.

Business considerations

Large volumes of training data, which often include personal data, are essential for the development of machine learning systems. For example, to launch safe autonomous vehicles, companies rely on vast amounts of training data to teach and test vehicles on how to navigate various complex driving situations safely.²⁷² Training data availability is therefore at the core of the advancement of autonomous vehicles companies’ programs.²⁷³ Since those vehicles interact with many stakeholders, such as car owners and passengers, pedestrians, or other drivers, they also generate unprecedented volumes of personal data of which GDPR restricts usage.²⁷⁴

Businesses can face high GDPR compliance costs or legal uncertainty that can deter the development of machine learning. Early evidence from the GDPR suggests that for some businesses, compliance costs can be significant and the rules complex to implement.²⁷⁵ For example, the GDPR requires consent from the subject before collecting or using personal data, but this becomes extremely complex to implement in cases like autonomous vehicles where there are a high number of stakeholders and large volumes and categories of data being collected.²⁷⁶ Potential steep fines also can make businesses hesitant about using and sharing personal data, even if adequate protections are in place. The European Commission should introduce additional measures to diminish the costs of compliance and legal uncertainty while enhancing data protection enforcement.

Recommendations for the European Commission

A. Coordinate with the European Data Protection Board and national data protection authorities to provide more detailed guidance and assistance on implementing the GDPR to machine learning applications to enhance compliance and mitigate business costs.

This could include work on establishing potential use cases, guidelines, and technical standards,²⁷⁷ which would decrease the costs of compliance, risks from legal uncertainty, and allay concerns that stem from the possibility of high fines.²⁷⁸

B. Enhance investment in research in privacy-preserving machine learning techniques that can adequately protect personal data while enabling machine learning development, including in partnership with likeminded countries such as the United States.

The development of techniques that can viably replace the need to compile large pools of personal data for machine learning could enable the EU to better innovate using machine learning while upholding liberal democratic values.²⁷⁹ The commission should consider enhancing its cooperation with the United States in developing these techniques. For example, the commission could propose incorporating them in one of the working groups of the EU-US Trade and Technology Council, which was launched in June 2021.²⁸⁰ It could also expand the scientific cooperation between the European Commission’s Joint Research Centre (JRC) and the US National Institute of Standards and Technology (NIST).

3.2 Mitigate data-driven discrimination

Machine learning systems leverage training data sets to observe patterns and automate prediction and decision-making. However, without safeguards, machine learning may reproduce and scale existing discriminatory systems as well as biases that can violate human rights.²⁸¹ While there are several ways this can happen, data is one of the key contributing factors leading to discriminatory machine learning predictions and outcomes.²⁸² Currently, there are gaps in the EU’s legal and technical measures available to address the issue, which could lead to societal and reputational costs for the EU, member states, and organizations. The European Commission needs to address the gaps in current rules and guidelines on machine learning-enabled discrimination while avoiding negatively impacting the competitiveness of EU businesses and the EU’s potential to shape global AI standards.

To address this, the European Commission should clarify the distribution of responsibilities between the enforcement authorities of existing rules and newly emerging authorities proposed in upcoming AI regulation to avoid overlaps or loopholes in implementation. The commission should provide sufficient funding across the member states for data protection authorities and other relevant bodies that enforce the existing rules to minimize gaps in implementation. It should also increase cooperation on reducing discrimination in machine learning with the United States.

EU policy context

Existing data protection and nondiscrimination legislation provides a robust framework but fails to fully capture the problems emerging from discrimination enabled by machine learning. For example, while the GDPR helps mitigate unfair and illegal discrimination based on personal data, it does not address those risks in nonpersonal data.²⁸³ The GDPR introduced a mandatory Data Protection Impact Assessment (DPIA) to examine the likely impact of data processing on individuals’ rights and fundamental freedoms, but it only applies in certain “high-risk” cases, for example when organizations conduct profiling of individuals.²⁸⁴ The currently available legal tools to deter discrimination are also not always easily transferrable to machine learning, for example, because discriminatory outputs can be hard to detect.²⁸⁵ The European Commission has also provided voluntary “Trustworthy AI assessment tools” to fill this gap.²⁸⁶ However, there remains a lack of oversight mechanisms in place to incentivize compliance.²⁸⁷

The new EU AI regulation proposed by the European Commission in April 2021 outlines data-governance and -management requirements to protect fundamental rights but lacks clarity on implementation.²⁸⁸ The proposal aims to complement existing nondiscrimination laws with additional requirements to minimize the risk of algorithmic discrimination, especially in relation to the design and the quality of data sets used for the development of AI systems. It includes requirements for training, validation, and testing data sets to be “sufficiently relevant, representative and free of errors, and complete.”²⁸⁹ Article 10 specifies the “quality criteria” for data sets used for the development of “high-risk” AI systems, such as remote biometric identification, including design choices, data collection, processing (e.g., labeling and cleaning), assessment of the data sets, examination of possible biases, or identification of any possible data gaps or shortcomings.²⁹⁰ While providing data requirements to mitigate potential for discrimination is a step in the right direction, the requirements appear to lack specificity and clarity on how they would be implemented, potentially relying on self-assessment of AI system providers, whose interest primarily rests with ensuring compliance and who may lack necessary expertise.²⁹¹

Furthermore, the proposed measures to reduce discrimination may have limited effect and will require significant investment in human resources and expertise by member states as well as coordination among European regulatory bodies. Technology and law experts Kenneth Propp and Mark MacCarthy argue that the proposal’s specific mechanism designed to monitor, detect, and correct bias in high-risk AI systems may have limited effect on addressing discrimination given disparities in required impact assessments. They also highlight the limited impact of the proposed documentation requirements of AI systems since the documentation does not need to be provided to those potentially affected by the systems, only to regulators upon request.²⁹² Furthermore, the new rules will require complex implementation. Member states would need to create new AI regulatory authorities to implement and ensure compliance of these rules, including coordination with other regulatory bodies with intersecting responsibilities.

Geopolitical considerations

China is increasingly influential in exporting and shaping AI surveillance technologies and norms globally, which creates risks for upholding human rights, including the right to nondiscrimination. Experts are concerned that China’s deployment of machine learning includes leveraging surveillance technologies to target specific groups.²⁹³ For example, Chinese data scientists have trained a facial recognition algorithm on ethnographic data to recognize the faces of Uighur people, a predominantly Muslim ethnic group in China.²⁹⁴. The Chinese government is alleged to be leveraging this technology to track people in the Uighur community, according to a news report citing five people with direct knowledge of the systems, who were granted anonymity by the New York Times.²⁹⁵ In reaction to these developments, a resolution adopted by the European Parliament in December 2020 stated: “The European Parliament strongly condemns the extensive use of digital surveillance technologies to monitor and control the population in Xinjiang,” and expressed concern that China is exporting surveillance technologies to authoritarian regimes around the world.²⁹⁶ An example of these exports is Chinese company Huawei Technologies, with research showing that the company is the leading vendor of AI surveillance technology worldwide and that it is shaping standards by embedding them in products, such as facial recognition systems, deployed across at least fifty countries.²⁹⁷

Western democracies can set international standards that limit data-driven discrimination in machine learning, but they lack an aligned approach. Despite their shared values, the United States and the EU are not closely aligned in many aspects of governing technology. For example, the White House Office of Science and Technology issued a statement in January 2020 arguing that “Europe and our allies should avoid heavy-handed innovation-killing models.” The statement continued by saying that to shape the evolution of technology and “counter authoritarian uses of AI,” the United States and its partners need to remain the global hubs of innovation.²⁹⁸ There is some acknowledgment of this view in Europe—in October 2020, twenty-six EU member states agreed that the EU approach needs to provide “sufficient flexibility in research and development of these [AI] systems.”²⁹⁹ However, EU member states put a heavy emphasis on the need for high ethical standards.³⁰⁰ To effectively influence international standards, the European Commission should intensify cooperation with US authorities to seek areas of cooperation, and the newly established EU-US Trade and Technology Council (TTC) is therefore a step in the right direction.³⁰¹

National considerations

There is a growing awareness of the risks of discrimination in AI systems, causing more concerns in some European countries than others. A 2019 survey showed that citizens of France (46 percent) and Germany (43 percent) are more likely to be concerned that AI could lead to discrimination than those of Finland (35 percent) and Poland (28 percent).³⁰² Additionally, EU member states provide varying levels of resources to institutions investigating data-protection and antidiscrimination cases (see figure 5). For example, Germany significantly leads both in terms of funding and the number of tech specialists that work for its Data Protection Authorities.³⁰³ The European Commission will have to navigate and address differences in priorities and funding for protections against data-driven discrimination that could undermine a unified European approach.

Business considerations

Discrimination caused by machine learning can harm customers or product quality, posing significant risks for businesses. Biased systems result in inaccurate predictions and outputs that can be discriminatory. They can unfairly allocate opportunities or negatively impact people’s well-being.³⁰⁴ Biased systems can consequently lead to infringements of human rights and consumer protections, resulting in costly system overhauls and liability disputes for damages caused.³⁰⁵ They also can negatively impact a business’s reputation, harming consumer trust and future market opportunities.³⁰⁶ For example, in 2019, a Microsoft filing to the US Securities and Exchange Commission flagged the potential for reputational harm due to “flawed” AI algorithms or data sets with “biased information.”³⁰⁷

Businesses are aware of the importance of responsible AI but are unsure how to address the issue in practice. For example, a 2021 survey by PwC found that over two-thirds of US businesses were not taking action to reduce bias from AI.³⁰⁸ A 2020 report by the UK Centre for Data Ethics and Innovation concluded that many organizations are unsure of how to address bias in practice, including choosing the right mitigation methods and embedding them into the operations of the organization.³⁰⁹

While business can benefit from clear guidelines, they are wary of the economic cost of strict regulations and their impact on innovation.³¹⁰ In 2019, the executive vice-president of the European Commission, Margrethe Vestager, said that the ethics guidelines established by the EU’s High-Level Expert Group saw “a very high number of businesses sign up,” showing a demand for more guidance.³¹¹ However, similar to regulatory measures to protect data privacy (see Chapter 3.1), while businesses can benefit from guidance, overly strict rules aimed at limiting data-driven discrimination could pose compliance costs that prohibit the development of new products and services, particularly for smaller businesses.³¹² The European Commission needs to introduce rules that would increase companies’ uptake of the ethical principles without unnecessarily stifling innovation and hampering start-ups.³¹³

Recommendations for the European Commission

This section addresses the issue of discrimination specifically in the context of data and machine learning and offers the following recommendations.³¹⁴

A. Work with the European Parliament and member states on providing more specific requirements for data-management standards to mitigate risks for discrimination.

This could include adding additional mechanisms that would diversify the reliance on self-assessment by system providers as well as requiring more transparency and broader public access to the bias assessment documentation of AI systems. It could also include ensuring that the proposal addresses discrimination that can stem from data sets that might be of high quality but reflect existing societal biases, and therefore represent risk for the AI systems to embed and amplify discrimination.

B. Provide additional financial and human resources to national data-protection authorities and other relevant bodies that investigate antidiscrimination violations, and carefully define and coordinate the establishment of new enforcement authorities with the existing ones.

Since the implementation of the upcoming EU AI regulation will rely on supervisory authorities of member states, this would enable the relevant authorities to better enforce existing laws by addressing disparities in the budgets of those authorities across the EU. Additionally, the newly established enforcement authorities, such as the EU Artificial Intelligence Board, and the national authorities need to be well integrated with the existing data-protection authorities. The European Commission should carefully define the roles and responsibilities of the newly established authorities to prevent potential overlaps or loopholes in the implementation of the regulation and to tap into existing resources and expertise.

C. Increase transatlantic cooperation to close the gap between the US and the EU approaches to mitigating risks of bias and discrimination enabled by data in machine learning.

While the United States is likely to maintain a decentralized approach to regulating AI, the EU-US TTC, launched in June 2021, provides an opportunity to narrow the gaps in the EU and US regulatory approaches. The TTC should consider incorporating the issues of bias and discrimination enabled by data in machine learning in the agenda of two of its working groups (the data governance and technology platforms and the misuse of technology threatening security and human rights group).³¹⁵ The issue of addressing the potential for replicating or scaling discrimination could also be addressed in the Transatlantic AI agreement that the European Commission and the high representative of the Union for foreign affairs and security policy proposed to start working on in a joint December 2020 communication, “A New EU-US Agenda for Global Change.”³¹⁶

Conclusion

Machine learning will have a transformative effect on industries and societies. The rules on who can access data and how it can be used for machine learning will have significant bearing on an economy’s ability to take advantage of the technology. To compete with Chinese and US industries, the EU will need data policies that spur innovation through machine learning. However, to mitigate the negative consequences on fundamental rights and European autonomy, it will equally need policies that address these risks. This report provides European decision makers with a set of recommended actions designed to unlock the potential of machine learning, while mitigating the risks (see summary of recommendations). These recommendations are based on a close examination of the geopolitical considerations of the EU, domestic considerations of EU member states, and commercial considerations of businesses. Using this framework, the following are the key policy objectives that our recommendations intend to achieve. They represent what we believe the European Commission should pursue in relation to data rules for machine learning.

First, increase access to public-sector data by:

addressing fragmented open data rules and procedures across Europe to reduce the administrative and legal costs for businesses seeking to reuse public-sector data
allowing access to sensitive data with appropriate privacy preserving techniques and protections in place to accelerate public and private research and development

Second, harmonize data standards for interoperability by:

facilitating the development of tools and protocols that enable the seamless exchange and translation of differently structured data across systems and organizations in Europe.

Third, diversify data storage and processing by:

ensuring coordination of proposed and existing cloud initiatives and minimizing compliance costs of new regulatory measures
encouraging additional investment in data infrastructure services to better ensure competitive cloud offerings for EU-based businesses
refraining from protectionist data processing and storage measures that have a more negative impact on smaller and more trade-dependent EU member states

Fourth, simplify data transfers by:

developing new cross-border data mechanisms that ensure data protection while enabling EU-based businesses to participate in global value chains
reducing legal ambiguity in EU legislation that increases risks and costs for businesses sharing data

Fifth, foster inclusive data ecosystems by:

adopting sector-specific data portability rights and introducing rules on when businesses should allow access to proprietary data sets

Sixth, enhance privacy and trust by:

clarifying guidance on applying the GDPR to machine learning techniques to reduce the costs of compliance and legal uncertainty while enhancing data protection
investing in research in privacy-preserving machine learning techniques and enhancing cooperation with like-minded countries like the United States

Seventh, mitigate data-driven discrimination by:

clarifying responsibilities of existing and proposed authorities to enforce non-discrimination laws to avoid overlaps or loopholes in the implementation
ensuring national data-protection authorities another enforcement agencies are adequately funded
increasing transatlantic cooperation on mitigating the risks of bias and discrimination enabled by data in machine learning

The EU is falling behind China and the United States in its ability to harness machine learning. This has the potential to weaken the ability of EU-based businesses to compete in a global market, reinforce the EU’s dependencies on foreign digital technologies, and undermine its ability to shape global rules at the intersection of technology and society. However, if the EU manages to strike the right balance in its regulatory approach by unlocking the opportunities for businesses while preserving robust protections for fundamental rights, it could put the EU on a more sustainable footing to leverage machine learning technologies. If the commission succeeds, not only will it enhance European economic prosperity and deliver tools that can help address society’s most pressing challenges, Europe will also serve as a model for other countries to follow in achieving innovative policies that benefit and deliver value to society.

About the authors

Blanka Soulava is a Master in Public Policy Candidate at the Harvard Kennedy School. Previously, she worked as a Trainee at the Office of the Vice-President of the European Parliament and as a campaign assistant and advisor during the 2018 Czech presidential elections. Blanka’s policy interests include digital and tech policy and negotiation. She has a bachelor’s degree in International Area Studies from Charles University in Prague, a diploma in Economics from the University of London, and studied Chinese at Fudan University in Shanghai.

Hamish Cameron was a Graham T. Allison, Jr. Student Fellow at the Harvard Belfer Center and completed a Master in Public Policy at the Harvard Kennedy School in 2019-21. He previously served as a diplomat in Beijing with the Australian foreign service. He has a background in strategic economic issues including digital governance, energy, and climate change, and he developed the Australian Government’s 2021 Services Exports Action Plan. He has a Bachelor of Arts and a Bachelor of Science from the University of Melbourne.

Victoria Ying is currently a Master of Public Policy candidate at Harvard Kennedy School (HKS). At HKS, her academic interests include machine learning, privacy and security, and misinformation and disinformation campaign. Outside of HKS, she is building organizational partnerships to advance tech-enabled negotiation solutions.

Acknowledgements

This report would not have been possible without the generous help and support of Harvard Kennedy School Professors Bruce Schneier and Thomas Patterson, as well as Trey Herr and Safa Shahwan Edwards at the Atlantic Council’s Cyber Statecraft Initiative. We would also like to thank our families and loved ones who supported us in our work throughout the challenging pandemic year.

Additionally, this report benefitted significantly from the time and insights shared by experts from across government, industry, academia and civil society. For this, we would like to acknowledge the following people: Alex Cooke, Andrea Renda, Antonio Capobianco, Christie Lawrence, Daniel Leufer, Eline Chivot, Grace Abuhamad, Gretchen Greene, Henrique Choer Maraes, Iga Kozlowska, Jaana Sinipuro, James Mancini, Jörn Fleck, Kenneth Propp, Kurt John, Luc Nicolas, Marc Lange, Markus Kalliola, Mayank Prakash Jain, Nicolai van Gorp, Raluca Csernatoni, Valentina Pavel, Werner Stengg, Yushi Wang.

The Atlantic Council’s Cyber Statecraft Initiative, part of the Atlantic Council Technology Programs, works at the nexus of geopolitics and cybersecurity to craft strategies to help shape the conduct of statecraft and to better inform and secure users of technology.

learn more

Image: Abstract background image, perfectly usable for a wide range of topics related to computer networks or technology in general.

Share

Issue briefs and reports

Fast Thinking

New Atlanticist

Inflection Points

UkraineAlert

MENASource

Econographics