Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are by Seth Stephens-Davidowitz

There is a data set ‘out there’ that knows more about us than surveys or social media (or even our friends!?) could ever hope to know. It’s that little bar into which we (apparently) type our deepest secrets, even the really really unflattering ones … the stuff that would (and should) make other people recoil if they saw you type it. And they certainly wouldn’t hear you say it out loud, because you know better than that. What is this little bar we’re talking about, the one in the middle of your screen just begging to receive the questions you wouldn’t dare ask another human? It’s the Google search bar and the resulting Google search data.

Seth Stephens-Davidowitz’s book, Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are, delves into a variety of topics using Google search data. Seth has used Google Trends for much of his work, augmenting it with Google AdWords data and his own Trends-based algorithm, which is described in detail in his dissertation and journal article, The cost of racial animus on a black presidential candidate: Using Google search data to find what surveys miss. The bottom line is that people tell Google things they might not tell anyone else: not their friends, not their spouse and not their doctor. Seth also posits (and I think I agree with him) that search data may reveal lies we even tell ourselves. So, it isn’t just that we lie to others; we lie to ourselves too. But given how much better Google search data seems to reflect reality, on the topics the book covers, than survey data or other more traditional data sources, it seems we do not lie to Google (or at least not nearly as much).

Stephens-Davidowitz worked at Google as a data scientist after the company became aware of his research on racism using Google search data. His features in the New York Times include pieces such as How Racist Are We? Ask Google, The Data of Hate and Searching for Sex. His book explores topics from mental illness to human sexuality to child abuse to religion. Not surprisingly, these highly sensitive and emotional topics are difficult to study using survey data. Also not surprisingly, they receive a lot of attention in Google searches. On page 14, he writes, “I am now convinced that Google searches are the most important dataset ever collected on the human psyche.”

Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are is presented in three parts. Part I is entitled “Data, Big and Small”, Part II is “The Powers of Big Data” and Part III is “Big Data: Handle with Care”. The conclusion is aptly titled “How Many People Finish Books” and suggests the answer is not very many; in my opinion, however, the conclusion is well done and worth the read.

In 2020, we talked on Consumer Corner about how you were a hypocrite (and you still are … sorry to be the one to tell you). Then, we welcomed 2022 by calling you a liar. In fairness, the title was Well, You’re Also a Liar, implying that it isn’t just you, it’s us too. We know that most of us keep everything on the Internet; we live on the Internet, and probably have an Apple Watch strapped to our wrists to complement the functionality of the iPhone clutched in our hands. I mean, Have You Hugged Your iPhone Today was one of our favorite posts from 2021, after all.

Chapter 8 is entitled “Mo Data, Mo Problems? What We Shouldn’t Do”, and starts with, “Sometimes, the power of Big Data is so impressive it’s scary. It raises ethical questions.” Seth argues there is danger in corporations empowered by big data being able to predict who is likely to do what, or to learn what customers can or will pay. Gambling is highlighted as a space in which gaining such an in-depth and specific understanding of a customer is potentially dangerous, because the house is trying to extract maximum profit. Gambling is, after all, a treasure trove for studying human behavior, which we’ve delved into before in the context of data-driven decision making (or lack thereof). Seth says on page 265, “Data on the internet, in other words, can tell businesses which customers to avoid and which they can exploit. It can also tell customers the businesses they should avoid and who is trying to exploit them. Big Data to date has helped both sides in the struggle between consumers and corporations. We have to make sure it remains a fair fight.”

While there is a lot to be gained from understanding ourselves and our societies through new data sources, data use remains a question that is being grappled with in many (if not most, or even all) industries. We recently argued that big data is a challenge, but the use of that data is a wicked problem. As Seth tells us on page 270, “So we have to be really cautious about allowing the government to intervene at the individual level based on search data. This is not just for ethical or legal reasons. It’s also, at least for now, for data science reasons.” Really, taken in the context of the whole book, it isn’t just the government but any entity that faces ethical, legal and technical reasons to question data use, especially at the individual level. Given the power in online data, and especially search data, I suspect that the question of use will only grow more complicated as the possibilities within these datasets continue to expand.

A version of this appeared recently as ConsumerCorner.2022.Letter.37.