In Search of Truth: From Human Learning to Machine Learning

Truth: What is it? Will societies ever reach a consensus on how to define it? If we cannot, how can we ever design machines that learn from it? A central concept in machine learning is that of “ground truth”. The Oxford Dictionary defines ground truth as “A fundamental truth. Also: the real or underlying facts; information that has been checked or facts that have been collected at source.” The word “fundamental” in this definition immediately sets the stage for controversy over what “truth” means: if truth were pure and absolute, why would it need qualifiers such as “fundamental” or “ground”? However, this article does not set out to philosophize about the semantics of “truth”. Instead, it explores the potential implications of two questions: i) who provides the ground truth data that machines learn from, and ii) should “majority rule” be the policy that decides what is considered “truth”?
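To make the “majority rule” question concrete: when several people label the same item, a common way to settle on a single ground-truth label is to keep the label most of them chose. The Python sketch below is a minimal illustration under our own assumptions (the function name `majority_label` and the five annotator labels are hypothetical). It returns the winning label together with the fraction of annotators who agreed, so the size of the overruled minority stays visible.

```python
from collections import Counter


def majority_label(annotations):
    """Return (most common label, fraction of annotators who chose it).

    Ties go to whichever tied label appeared first, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not annotations:
        raise ValueError("need at least one annotation")
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(annotations)


# Hypothetical labels from five annotators for the same image.
labels = ["cat", "cat", "dog", "cat", "fox"]
label, agreement = majority_label(labels)
print(label, agreement)  # cat 0.6
```

Here “cat” wins with 60% agreement, and the “dog” and “fox” labels are simply discarded. Whether that discarded 40% should count for anything is exactly the policy question raised above.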

Broadly speaking, let us assume that both humans and machines learn from experience, and that an experience can be described, at least in part, by quantifiable factors that we will simply call “data”. To illustrate the challenge of defining “truth”, consider the following example. Imagine you are given the task of teaching your student (be it human or machine) what an object is. For simplicity, let us assume that the process involves three steps: i) observation (actually sensing the object), ii) classification (determining what the object is, based on its features), and iii) feedback (internal or external guidance on whether the classification was correct).
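The three steps can be made concrete with a toy learner. The Python sketch below is a hypothetical illustration, not the method of any particular system: each object is reduced to two made-up numeric features (roundness and elongation), the student classifies by the nearest class centroid, and feedback is a label supplied by a teacher. Note that `feedback` accepts whatever label it is given as true, so a mistaken or contested label would be learned just as readily as a correct one.

```python
import math


class Student:
    """Toy learner that classifies objects by their nearest class centroid."""

    def __init__(self):
        self.sums = {}    # label -> running sum of feature vectors
        self.counts = {}  # label -> number of examples seen for that label

    def classify(self, features):
        """Step ii: guess a label from the features (None if nothing learned yet)."""
        best, best_dist = None, math.inf
        for label, total in self.sums.items():
            centroid = [t / self.counts[label] for t in total]
            dist = math.dist(features, centroid)
            if dist < best_dist:
                best, best_dist = label, dist
        return best

    def feedback(self, features, true_label):
        """Step iii: the teacher's label is taken as truth and updates that class."""
        total = self.sums.setdefault(true_label, [0.0] * len(features))
        for i, x in enumerate(features):
            total[i] += x
        self.counts[true_label] = self.counts.get(true_label, 0) + 1


def teach(student, examples):
    """Run observe -> classify -> feedback; return the number of correct guesses."""
    correct = 0
    for features, true_label in examples:  # step i: observation
        guess = student.classify(features)
        correct += guess == true_label
        student.feedback(features, true_label)
    return correct


# Hypothetical (roundness, elongation) features with teacher-supplied labels.
examples = [
    ((0.9, 0.2), "ball"),
    ((0.1, 0.9), "pencil"),
    ((0.8, 0.3), "ball"),
    ((0.2, 0.8), "pencil"),
]
print(teach(Student(), examples))  # 2
```

The student misses the first two objects (it has nothing, then too little, to compare against) and gets the last two right. Everything it knows, however, comes from the labels in `examples`: whoever writes those labels defines what the student treats as “truth”.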

To make sure you are up to the task, use the following simple test to evaluate your own knowledge of objects:

Step 1. Observation: Click “Next” to observe the object.