Social media has revolutionized the behavioral sciences. Social scientists can now mine the large volumes of data that users generate on social networks to uncover how groups of people think and behave.
According to a press release from McGill University, “A growing number of academic researchers are mining social media data to learn about both online and offline human behavior. In recent years, studies have claimed the ability to predict everything from summer blockbusters to fluctuations in the stock market.”
However, two computer scientists, Derek Ruths and Juergen Pfeffer, at McGill University and Carnegie Mellon University respectively, have recently published an article in the journal Science warning that these datasets may be misleading researchers. The scientists write that there are biases inherent in gathering information on various social media platforms, and these shortcomings must be corrected or acknowledged when publishing studies that use this data.
As Pfeffer tells Phys.org, “Not everything that can be labeled as ‘Big Data’ is automatically great…But the old adage of behavioral research still applies: Know Your Data.” Still, Pfeffer acknowledges that the attraction of using this data, however flawed, can be very strong. He continues, “People want to say something about what’s happening in the world and social media is a quick way to tap into that…You get the behavior of millions of people—for free.”
Pfeffer and Ruths directly address many of the challenges of interpreting this data in their article. For example, different social media platforms (e.g., Pinterest vs. Facebook) attract different user bases. Pinterest's users are mostly women aged 25-34, so generalizing about human behavior from data collected on that platform is likely to produce biased results.
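The effect of such a demographic skew can be sketched with a toy simulation. The numbers below are purely illustrative assumptions (not from the Science article): a population evenly split between a demographic group and everyone else, where the group exhibits some behavior at a different rate, sampled through a platform that over-represents that group.

```python
import random

random.seed(0)

# Hypothetical numbers, for illustration only: half the overall
# population belongs to one demographic group, which exhibits some
# behavior at a higher rate than everyone else.
POP_GROUP_SHARE = 0.50
RATE_GROUP = 0.60   # assumed behavior rate inside the group
RATE_OTHER = 0.30   # assumed behavior rate outside the group

def true_population_rate():
    # Exact population-wide rate, weighted by true group shares.
    return POP_GROUP_SHARE * RATE_GROUP + (1 - POP_GROUP_SHARE) * RATE_OTHER

def platform_estimate(platform_group_share, n=100_000):
    # Estimate the rate from a platform whose user base
    # over-represents the group (a Pinterest-like skew).
    hits = 0
    for _ in range(n):
        in_group = random.random() < platform_group_share
        rate = RATE_GROUP if in_group else RATE_OTHER
        hits += random.random() < rate
    return hits / n

print(f"true population rate:   {true_population_rate():.2f}")
print(f"skewed-platform sample: {platform_estimate(0.85):.2f}")
```

The skewed sample systematically overshoots the true rate, not because of noise but because the platform's users are not the population.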
There are a number of other serious challenges when interpreting social media datasets. For example, as described in the McGill press release, “Large numbers of spammers and bots, which masquerade as normal users on social media, get mistakenly incorporated into many measurements and predictions of human behavior.” Similarly, “Researchers often report results for groups of easy-to-classify users, topics and events, making new methods seem more accurate than they actually are. For instance, efforts to infer political orientation of Twitter users achieve barely 65% accuracy for typical users—even though studies (focusing on politically active users) have claimed 90% accuracy.”
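The gap between those two accuracy figures can be reproduced in a minimal sketch. Assuming (hypothetically) that a classifier is right 90% of the time on politically active users but only 65% of the time on typical users, and that active users are a small minority, evaluating only on the easy, active subset inflates the reported accuracy well above what a sample of all users would show.

```python
import random

random.seed(1)

# Accuracies motivated by the figures quoted above; the fraction of
# politically active users is an assumption made for illustration.
ACC_ACTIVE = 0.90
ACC_TYPICAL = 0.65
ACTIVE_FRACTION = 0.10

def measured_accuracy(only_active_users, n=100_000):
    # Simulate per-user correctness and report the accuracy measured
    # on (a) an easy test set of active users only, or (b) a sample
    # drawn from all users.
    correct = 0
    for _ in range(n):
        if only_active_users:
            acc = ACC_ACTIVE
        else:
            active = random.random() < ACTIVE_FRACTION
            acc = ACC_ACTIVE if active else ACC_TYPICAL
        correct += random.random() < acc
    return correct / n

print(f"evaluated on active users only: {measured_accuracy(True):.2f}")
print(f"evaluated on all users:         {measured_accuracy(False):.2f}")
```

Under these assumptions the first figure comes out near 0.90 while the second sits much closer to the 65% typical-user accuracy, which is the reporting bias the authors describe.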
These challenges and shortcomings are not limited to interpreting social media data; they are well known in other fields, such as statistics and epidemiology. As Ruths tells Chris Chipello of the McGill newsroom, “The common thread in all these issues is the need for researchers to be more acutely aware of what they’re actually analyzing when working with social media data.”