The Fourth Industrial Revolution is characterized by the interconnection of devices and sensors through the internet. The computing and communication capabilities of these devices allow roughly 2.5 quintillion bytes of data to be produced, stored, and analyzed daily. Healthcare is a leading example: large volumes of healthcare data are generated every second and mined for valuable insights. Today, approximately 30% of the world’s data volume is generated by the healthcare industry, and by 2025 the compound annual growth rate of healthcare data is projected to reach 36%. These data are fed into machine learning (ML) and artificial intelligence (AI) models that strongly influence multiple healthcare domains and could affect the socioeconomic status of billions of people across the world.
Entities that have functional access to data capital have more options than those that do not. The data divide is the gap between individuals who have access, agency, and control with respect to data, and can therefore reap the most benefits from data-driven technologies, and those who do not. The data divide can be reduced only if the data process is optimized, the policies and programs of major stakeholders are monitored and evaluated, and public-private partnerships are aligned for social good.
Key steps in closing the data divide include
- understanding who or what is not represented in legacy datasets or during data generation to ensure mitigation of bias in ML/AI training datasets;
- recording data provenance that confirms authenticity of the data and builds trust and credibility in the reproducibility of the results from ML/AI training sets;
- requiring balanced statistical representation in the datasets used in modeling processes, to reduce ML/AI statistical bias; and
- ensuring ethical data stewardship for access and privacy concerns in ML/AI-based data operations.
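The first and third steps above can be made concrete with a simple representation audit. The following is a minimal sketch, not a prescribed method: it compares each group's share of a dataset with a reference share and flags groups that fall short. The function names, the `region` attribute, the reference shares, and the 5-point tolerance are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def representation_gaps(records, attribute, reference_shares):
    """Return (observed share - reference share) for each reference group.

    records: iterable of dicts, one per data subject (hypothetical schema).
    attribute: key whose values identify the group, e.g. "region".
    reference_shares: dict mapping group -> expected share of the population.
    """
    # Count only records that actually carry the attribute.
    counts = Counter(
        r.get(attribute) for r in records if r.get(attribute) is not None
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records carry the requested attribute")
    return {
        group: counts.get(group, 0) / total - expected
        for group, expected in reference_shares.items()
    }


def underrepresented(gaps, tolerance=0.05):
    """List groups whose observed share trails the reference by more than tolerance."""
    return sorted(group for group, gap in gaps.items() if gap < -tolerance)


if __name__ == "__main__":
    # Illustrative data only: 8 urban and 2 rural records.
    sample = [{"region": "urban"}] * 8 + [{"region": "rural"}] * 2
    # Assumed reference shares, not real statistics.
    reference = {"urban": 0.55, "rural": 0.45}
    gaps = representation_gaps(sample, "region", reference)
    print(underrepresented(gaps))
```

A check of this kind only reveals gaps relative to the chosen reference population. Selecting that reference, and deciding what tolerance is acceptable, remains a policy judgment rather than a technical one.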
Three stakeholder groups (private-sector firms, governments, and civil-society organizations) have important roles to play vis-à-vis the data divide.
- Private-sector firms capture and process copious amounts of data that are both valuable for their shareholders and socially valuable. When private-sector firms consider the needs of stakeholders aside from their shareholders, these data can be shared with governments and civil-society organizations, and used for social good.
- Governments have a dual role: they are the sole arbiters of data policy, and they are also major data capturers and processors. Policies that incentivize corporations to share socially valuable data, practices that make government-owned data more readily available, and efforts to reduce bias in government-collected datasets are all ways that governments can contribute to bridging the data divide.
- Civil-society organizations of all types have a key role to play regarding the data divide. They can train a new, more inclusive generation of data professionals, create new data-governance structures, and advocate for legislation that will positively affect the distribution of access and control over data across society.