In 2020, we talked on Consumer Corner about how you were a hypocrite. And, you still are (sorry to be the one to tell you). Then in 2021, we talked about your YouTube-worthy temper tantrums and why they need to stop (stop doing it and stop embarrassing yourself). Then, we welcomed in 2022 with Well, You’re Also a Liar. In fairness, the title implies that it isn’t just you; it’s us too.
We know that most of us keep everything on the Internet; we live on the Internet, and probably have an Apple Watch strapped to our wrists to complement the functionality of the iPhone clutched in our hands. I mean, Have You Hugged Your iPhone Today was one of our favorite posts from 2021, after all.
Trying to understand human behavior, how people make decisions, and why, is what we do on Consumer Corner, and we use a variety of data sources to attempt it. Surveys are valuable because we can design questions about specific topics, such as when we delved into preferences for public versus private control for things like military services and education. But, surveys do have a variety of challenges, from ensuring adequate sample size to conduct the analyses desired, to worrying about response bias, enumerator bias, and a whole slew of other biases when we’re asking questions of people and respondents know their answers are being received (even if anonymously) by other people for analysis.
Online and social media data is potentially (at least in my opinion) the new frontier of data, and it exists in a variety of forms, from talking on Twitter about holiday plans to images and even data coming from smart devices in your home. Social media data has been analyzed to explore public understanding of public health crises, like Zika Virus, and to question whether natural disasters with more social and online media posts receive more aid or funding (answer: they do not). But, social media has its challenges too: you post your best life on Twitter, you might say things that do not accurately reflect how you really feel, and not everyone is represented. In short, there are challenges for any dataset, and social media data has its fair share.
But, there is an online data set that knows more about us than social media could ever hope to. It’s that little bar into which we (apparently) type our deepest secrets, even the really really unflattering stuff, the stuff that would (and should) make other people recoil if they saw you type it. What is this little bar that we’re talking about, the one in the middle of your screen, just begging to hear the questions you wouldn’t dare ask another human? It’s the Google search bar. And, the resulting Google search data.
Seth Stephens-Davidowitz’s book Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are delves into a variety of topics using Google search data. Seth has used Google Trends for much of his work, augmenting it with Google AdWords data and his own Trends-based algorithm, which is described in detail in his dissertation and journal article The cost of racial animus on a black presidential candidate: Using Google search data to find what surveys miss. The bottom line is people tell Google things they might not tell anyone else: not their friends, their spouse, or their doctor. Seth also posits, and I think I agree with him, that search data may reveal lies we even tell ourselves. So, it isn’t just that we lie to others, we lie to ourselves too. But, given how much better Google search data seems to reflect reality in the topics delved into in the book than survey data or other more traditional data sources, it seems that we do not lie to Google (or at least not nearly as much).
Seth Stephens-Davidowitz worked at Google as a data scientist after they became aware of his research on racism using Google search data. His features in the New York Times include pieces such as How Racist Are We? Ask Google, The Data of Hate, and Searching for Sex. His book explores topics from mental illness to human sexuality to child abuse to religion.
Not surprisingly, these highly sensitive and emotional topics are difficult to delve into using survey data. Also not surprisingly, these topics receive a lot of attention in Google searches. On page 14, Seth states, “I am now convinced that Google searches are the most important dataset ever collected on the human psyche.”
Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are is presented in three parts. Part I is entitled “Data, Big and Small”, Part II “The Powers of Big Data”, and Part III “Big Data: Handle with Care”. The Conclusion is aptly titled “How Many People Finish Books” and suggests the answer is not very many. But, the Conclusion is well done in my opinion, and worth the read.
Chapter 8 is entitled “Mo Data, Mo Problems? What We Shouldn’t Do” and starts with “Sometimes, the power of Big Data is so impressive it’s scary. It raises ethical questions.” Seth argues there is danger in corporations, empowered by big data, learning who is likely to do what, or what customers can or will pay. Gambling is highlighted as a space in which gaining such an in-depth and specific understanding of a customer is potentially dangerous, as one tries to extract maximum profit for the house. Gambling is, after all, a treasure trove for studying human behavior, which we’ve delved into before in the context of data-driven decision making (or lack thereof). Seth says on page 265, “Data on the internet, in other words, can tell businesses which customers to avoid and which they can exploit. It can also tell customers the businesses they should avoid and who is trying to exploit them. Big Data to date has helped both sides in the struggle between consumers and corporations. We have to make sure it remains a fair fight.”
While there is a lot to be gained from understanding ourselves and our societies through new data sources, data use remains a question that is being grappled with in many (if not most, or even all) industries. We recently argued that big data is a challenge, but the use of that data is a wicked problem. As Seth tells us on page 270, “So we have to be really cautious about allowing the government to intervene at the individual level based on search data. This is not just for ethical or legal reasons. It’s also, at least for now, for data science reasons.” Really, taken in the context of the whole book, it isn’t just the government, but any entity, that faces ethical, legal, and technical reasons to question data use, especially at the individual level. Given the power in online data, and especially search data, I suspect that the question of use will only grow more complicated as the possibilities within these datasets continue to expand.
ConsumerCorner.2022.Letter.37