Much Ado About Data

Ransbotham, Sam. “The Subtle Sources of Sampling Bias Hiding in Your Data.” MIT Sloan Management Review (2017).


Courtney Bir, Purdue Graduate Student Research Assistant


Data has become an integral part of any business. However, you must exercise caution when employing data – especially survey data – to extract insights used in business decisions. Ransbotham outlined five potential pitfalls when collecting and analyzing data, including: (1) the history behind your data, (2) more data may not mean better data, (3) old data sets were imperfect too, (4) intuition is still important, and (5) who is analyzing the data matters. By carefully understanding the weak points or hidden biases in your data, you can use data more effectively.

What this means for food and agribusiness

Data is everywhere in agribusiness. Take, for example, yield and production data, client surveys, and retail consumer buying behavior. In the quest to make sound business decisions, analyzing more data seems like a tantalizing solution. However, according to Ransbotham, you need to consider five pitfalls when analyzing data.

The first potential source of bias he presented in the paper was the history behind your data. How you collected the data is an important consideration. If you send out surveys via email, only those who have access to the internet can respond. Most people have internet access today, but if you are comparing your results to a data set from 30 years ago, that older data may tell a different story. It is important to consider the population you are actually reaching when determining surveying methods. Another potential collection hazard, particularly in agriculture, is the season during which the data is collected. Many agricultural products have seasonal demand, and producers allocate their time differently throughout the seasons.

Simply collecting more data may not solve the problem of inherent biases. Larger data sets are not too big to fail; they might just give you more data with the same biases, only this time with the sampling errors hidden in the increased volume. Larger data sets also take more time and resources to evaluate, so it is important to consider whether the potential additional insights are worth the investment and added complexity. For example, you may be able to garner the same insights on a new product from a well-balanced sample of a few hundred people as from a far-from-representative sample of thousands.
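To make the point about sample size versus sample balance concrete, here is a minimal simulation sketch. Everything in it is a made-up assumption for illustration and does not come from Ransbotham's article: the two customer groups, their 70% and 30% interest rates, the sample sizes, and the function name `simulate`.

```python
import random


def simulate(seed=0):
    """Compare a small representative sample with a large biased one."""
    rng = random.Random(seed)

    # Hypothetical population: two equal-sized customer groups with different
    # rates of interest in a new product (all numbers are invented).
    group_a = [1 if rng.random() < 0.70 else 0 for _ in range(50_000)]
    group_b = [1 if rng.random() < 0.30 else 0 for _ in range(50_000)]
    population = group_a + group_b
    true_rate = sum(population) / len(population)

    # Small but representative: 300 people drawn at random from everyone.
    small_sample = rng.sample(population, 300)

    # Large but biased: 5,000 people, 90% of whom come from group A
    # (for example, because the survey only reached one kind of customer).
    large_sample = rng.sample(group_a, 4_500) + rng.sample(group_b, 500)

    return {
        "true_rate": true_rate,
        "small_representative": sum(small_sample) / len(small_sample),
        "large_biased": sum(large_sample) / len(large_sample),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions, the estimate from the 300-person random sample should land close to the true rate, while the 5,000-person sample stays well above it no matter how many more biased responses are added.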

Although you may be feeling nostalgic for the good old days, old data sources were imperfect too. Take care when benchmarking new data sets against older ones. Were long-distance phone calls an upcharge at the time? Are you using new equipment that significantly changes your production process? It took time to understand your old data sets, so it’s important to take time to understand the new ones as well. Learn from the flaws in previous data collection attempts, and strive for a better data set in the future. If you benchmark your data against data collected by others, such as the USDA, it is also important to determine how that data was collected and whether it carries any relevant inherent biases. No data collection process is perfect, so results need to be compared across time and across sampling methods.

Intuition is still important. Understanding potential sampling bias cannot be done by a machine. Human expertise is needed to be aware of ongoing trends, to know what has changed throughout time, and to recognize what may be missing from a sample. For example, if there is a new law in place that affects how much water or other inputs you can apply to your field, the data you collect today may differ from data collected in previous years. For animal producers, new feed additives, rations, and supplements are continuously being developed. If you are analyzing the benefits of a new housing system, can you attribute an increase in production to your new facility, or are there other differences in the data?

Finally, who analyzes the data matters. It is difficult to recognize potential sample bias and to work out how to minimize it in future collection efforts, so it takes a savvy person to collect and interpret the data. Analyzing data takes an eye for detail, patience, and a good understanding of how the data was collected. If multiple people in the organization will be involved in the data-generating and analysis processes, it is important they all understand the history and limitations of the data set. Data can be a powerful tool when trying to make business decisions. Careful planning and thought when collecting and analyzing data can help you maximize the potential benefit of properly employing data analytics.