Everybody lies. People lie about how many drinks they had on the way home. They lie about how often they go to the gym, how much those new shoes cost, whether they read that book. They call in sick when they’re not. They say they’ll be in touch when they won’t. They say it’s not about you when it is. They say they love you when they don’t. They say they’re happy while in the dumps. They say they like women when they really like men. People lie to friends. They lie to bosses. They lie to kids. They lie to parents. They lie to doctors. They lie to husbands. They lie to wives. They lie to themselves. And they damn sure lie to surveys. Here’s my brief survey for you:

Have you ever cheated in an exam?

Have you ever fantasised about killing someone?

Were you tempted to lie?

Many people underreport embarrassing behaviours and thoughts on surveys. They want to look good, even though most surveys are anonymous. This is called social desirability bias. An important paper in 1950 provided powerful evidence of how surveys can fall victim to such bias. Researchers collected data, from official sources, on the residents of Denver: what percentage of them voted, gave to charity, and owned a library card. They then surveyed the residents to see if the percentages would match. The results were, at the time, shocking. What the residents reported to the surveys was very different from the data the researchers had gathered. Even though nobody gave their names, people, in large numbers, exaggerated their voter registration status, voting behaviour, and charitable giving.

Has anything changed in 65 years? In the age of the internet, not owning a library card is no longer embarrassing. But, while what’s embarrassing or desirable may have changed, people’s tendency to deceive pollsters remains strong. A recent survey asked University of Maryland graduates various questions about their college experience. The answers were compared with official records. People consistently gave wrong information, in ways that made them look good. Fewer than 2% reported that they graduated with lower than a 2.5 GPA (grade point average). In reality, about 11% did. And 44% said they had donated to the university in the past year. In reality, about 28% did.
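The size of the misreporting is easier to see as a gap between what people said and what the records showed. The following is a minimal sketch in Python using only the Maryland figures quoted above. The names `claims` and `reporting_gap` are illustrative, and "fewer than 2%" is treated as an upper bound of 2.0, so the GPA gap it computes is a minimum.

```python
# Figures quoted in the text, in percent.
# Assumption: "fewer than 2%" is entered as its upper bound, 2.0.
claims = {
    "graduated with a GPA below 2.5": {"reported": 2.0, "actual": 11.0},
    "donated in the past year": {"reported": 44.0, "actual": 28.0},
}


def reporting_gap(reported, actual):
    """Return reported minus actual, in percentage points."""
    return reported - actual


gaps = {
    question: reporting_gap(values["reported"], values["actual"])
    for question, values in claims.items()
}

for question, gap in gaps.items():
    direction = "over-reported" if gap > 0 else "under-reported"
    print(f"{question}: {direction} by {abs(gap):.0f} percentage points")
```

Both gaps point the same way: the unflattering fact is under-reported and the flattering one is over-reported.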

Then there’s that odd habit we sometimes have of lying to ourselves. Lying to oneself may explain why so many people say they are above average. How big is this problem? More than 40% of one company’s engineers said they are in the top 5%. More than 90% of college professors say they do above-average work. One-quarter of high school seniors think they are in the top 1% in their ability to get along with other people. If you are deluding yourself, you can’t be honest in a survey.

The more impersonal the conditions, the more honest people will be. For eliciting truthful answers, internet surveys are better than phone surveys, which are better than in-person surveys. People will admit more if they are alone than if others are in the room with them. However, on sensitive topics, every survey method will elicit substantial misreporting. People have no incentive to tell surveys the truth.

How, therefore, can we learn what our fellow humans are really thinking and doing? Big data. Certain online sources get people to admit things they would not admit anywhere else. They serve as a digital truth serum. Think of Google searches. Remember the conditions that make people more honest. Online? Check. Alone? Check. No person administering a survey? Check.

The power in Google data is that people tell the giant search engine things they might not tell anyone else. Google was invented so that people could learn about the world, not so researchers could learn about people, but it turns out the trails we leave as we seek knowledge on the internet are tremendously revealing.

I have spent the past four years analysing anonymous Google data. The revelations have kept coming. Mental illness, human sexuality, abortion, religion, health. Not exactly small topics, and this dataset, which didn’t exist a couple of decades ago, offered surprising new perspectives on all of them. I am now convinced that Google searches are the most important dataset ever collected on the human psyche.

The Truth About Sex

How many American men are gay? This is a regular question in sexuality research. Yet it has been among the toughest questions for social scientists to answer. Psychologists no longer believe Alfred Kinsey’s famous estimate – based on surveys that oversampled prisoners and prostitutes – that 10% of American men are gay. Representative surveys now tell us about 2% to 3% are. But sexual preference has long been among the subjects upon which people have tended to lie. I think I can use big data to give a better answer to this question than we have ever had.

First, more on that survey data. Surveys tell us there are far more gay men in tolerant states than in intolerant states. For example, according to a Gallup survey, the proportion of the population that is gay is almost twice as high in Rhode Island, the state with the highest support for gay marriage, as in Mississippi, the state with the lowest support for gay marriage. There are two likely explanations for this. First, gay men born in intolerant states may move to tolerant states. Second, gay men in intolerant states may not divulge that they are gay. Some insight into explanation number one – gay mobility – can be gleaned from another big data source: Facebook, which allows users to list what gender they are interested in. About 2.5% of male Facebook users who list a gender of interest say they are interested in men; that corresponds roughly with what the surveys indicate.

And Facebook too shows big differences in the gay population in states with high versus low tolerance: Facebook has the gay population more than twice as high in Rhode Island as in Mississippi. Facebook also can provide information on how people move around. I was able to code the home town of a sample of openly gay Facebook users. This allowed me to directly estimate how many gay men move out of intolerant states into more tolerant parts of the country. The answer? There is clearly some mobility – from Oklahoma City to San Francisco, for example. But I estimate that men moving to someplace more open-minded can explain less than half of the difference in the openly gay population in tolerant versus intolerant states.

If mobility cannot fully explain why some states have so many more openly gay men, the closet must be playing a big role. Which brings us back to Google, with which so many people have proved willing to share so much.

Countrywide, I estimate – using data from Google searches and Google AdWords – that about 5% of male porn searches are for gay-male porn. Overall, there are more gay porn searches in tolerant states than in intolerant states, but the difference is small. In Mississippi, I estimate that 4.8% of male porn searches are for gay porn, far higher than the numbers suggested by either surveys or Facebook and reasonably close to the 5.2% of male porn searches that are for gay porn in Rhode Island.
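The contrast between the two data sources comes down to a ratio between states. The following is a minimal sketch in Python using only the figures quoted above. The names `state_ratio` and `facebook_ratio_lower_bound` are illustrative and not part of the original analysis, and the Facebook figure is entered as a lower bound because the text says only "more than twice as high".

```python
# Share of male porn searches that are for gay porn, in percent (from the text).
gay_porn_share = {"Mississippi": 4.8, "Rhode Island": 5.2}

# Assumption: the text gives Facebook's Rhode Island/Mississippi ratio only as
# "more than twice as high", so 2.0 is a lower bound, not a measured value.
facebook_ratio_lower_bound = 2.0


def state_ratio(shares, high, low):
    """Return how many times larger the share is in `high` than in `low`."""
    return shares[high] / shares[low]


search_ratio = state_ratio(gay_porn_share, "Rhode Island", "Mississippi")

print(f"Search ratio, Rhode Island to Mississippi: {search_ratio:.2f}")
print(f"Facebook ratio, same states: more than {facebook_ratio_lower_bound:.1f}")
```

On these numbers the search ratio is close to 1, while the openly declared ratio is above 2. The searches differ far less between the two states than the public declarations do.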

So how many American men are gay? This measure of pornography searches by men – roughly 5% are same-sex – seems a reasonable estimate of the true size of the gay population in the United States. Five per cent of American men being gay is an estimate, of course. Some men are bisexual; some – especially when young – are not sure what they are. Obviously, you can’t count this as precisely as you might the number of people who vote or attend a movie. But one consequence of my estimate is clear: an awful lot of men in the United States, particularly in intolerant states, are still in the closet. They don’t reveal their sexual preferences on Facebook. They don’t admit it on surveys. And, in many cases, they may even be married to women.

It turns out that wives suspect their husbands of being gay rather frequently. They demonstrate that suspicion in the surprisingly common search: “Is my husband gay?” The word “gay” is 10% more likely to complete searches that begin “Is my husband…” than the second-place word, “cheating”. It is eight times more common than “an alcoholic” and 10 times more common than “depressed”.

Most tellingly perhaps, searches questioning a husband’s sexuality are far more prevalent in the least tolerant regions. The states with the highest percentage of women asking this question are South Carolina and Louisiana. In fact, in 21 of the 25 states where this question is most frequently asked, support for gay marriage is lower than the national average.