The Data Driving Democracy

Barriers and Challenges

Christina Couch
Commission on the Practice of Democratic Citizenship

Experts who were interviewed for this report highlighted three major obstacles that hinder digital civic engagement research: 1) data access and legal concerns, 2) ethical issues, and 3) insufficient research infrastructure. This section dives briefly into each of these problems and outlines a few proposed solutions in various stages of execution. It is far from exhaustive: all fifteen experts interviewed detailed an array of very specific concerns facing the field, and this section touches only on the problems that continually cropped up in the interviews.

Data Access

Experts interviewed almost unanimously agreed that data access is one of the biggest obstacles that prevent researchers from getting a comprehensive understanding of how search engines, social media platforms, and other online spaces are changing democracy. Experts were clear and adamant: It is impossible to fully understand how activity on online platforms influences democracy without having more data from those platforms, as well as a better understanding of what data those platforms actually collect. The current research landscape is patchy at best, the experts said. David Lazer from Northeastern University’s NULab for Texts, Maps, and Networks compared it to “looking up at the sky with no instrumentation. We can’t see the stars because we don’t have equipment to do that and that’s a real shame.” The lack of data access not only prevents researchers from getting a full picture; it also prevents studies from being replicated and results from being repeatedly confirmed over time.

Platforms limit data access through API restrictions, rate limiting, and anti-scraping technologies as noted earlier in this report. Experts added that changes in algorithms, platform design, and company policies can also affect data access. Many pointed to the myriad changes that Facebook made in the wake of a highly controversial 2014 study on mood manipulation90 and the 2018 Cambridge Analytica scandal91 as examples of platform policy changes that radically disrupted data flow and interrupted ongoing research. Both of those events are further outlined in the Ethical Challenges section of this report. These types of changes that affect data access not only prevent researchers from pulling data from a specific platform, they can also render both software and research libraries that rely on specific APIs functionally inoperative.

Christian Sandvig, director of ESC: the Center for Ethics, Society, and Computing and professor of digital media at the University of Michigan, said that data can also be limited through restrictive terms of service agreements, some provisions of which have been found not to be legally enforceable.92

“Researchers are really in a bind because they tend not to have any legal expertise and their university or their [institutional review board] might be conservative about it, so that they might say, ‘Oh, you have to follow all the rules,’ but the rules are just ridiculous,” he said.

One expert who did not want to be identified described some terms of service agreements as “utterly Orwellian” and said that it’s often unclear what exactly researchers are allowed to do with data even if they are publicly available. For example, researchers “are totally simpatico with the idea that we shouldn’t download a bunch of Twitter data and then put out an estimate that says, ‘Hey, we think Jane Doe is a Neo-Nazi.’ We should not do that. We totally understand that,” the expert said. “We certainly think we should be able to form our own internal estimate and then put out aggregate data that’s not identifying anyone saying, ‘Hey, we noticed that everyone who tweeted saying that they approve this policy is far-right or liberal.’ By Twitter’s own terms of service, we literally are not able to know that, which makes no sense. If we publish aggregate data, is Twitter going to cancel our accounts? We don’t know with certainty.”

Experts took a variety of paths to get around these issues and several praised independent projects that help researchers better access and analyze data within current platform restrictions. Some experts specifically cited Jason Baumgartner’s efforts to increase access to Reddit data through the Pushshift API as one example. Other researchers discussed creative ways that they’ve built their own makeshift APIs within legal boundaries by compiling publicly available data from resources like RSS feeds. They noted that these methods were generally more difficult and time-consuming than pulling data straight from an API. Several experts said that it’s obvious from reading emerging research in the field that some teams (none of whom were interviewed for this report) ignore legality entirely and simply scrape sites without regard to terms of service. One expert said that the entire research field “lives in a little bit of fear” of the day that platform executives decide to crack down on academics who violate their terms of service.


Ethical Challenges

Even if data access problems vanished overnight, researchers would still be confronted with ethical challenges of working in this space. Privacy and consent are primary issues here. Experts said that for certain projects, they already face challenges with trying to ensure that datasets are free of identifying information. Deen Freelon said that for his own media monitoring research, getting public-facing information while excluding content like health, financial, and educational data points that are “prohibited for us to know by law” was tough, even when using machine learning tools, blacklists, and domain filtering to eliminate data that users have not consented to share.

Privacy and consent become even more complicated as research increasingly shows that accurate, and often invasive, inferences about a specific individual can be drawn even if that person’s data are not available. One analysis of 30.8 million tweets found that it’s possible to accurately predict what someone will post online just by analyzing social media posts from eight or nine of their contacts,93 meaning that an individual’s privacy and ability to consent rely on a network of people. Some scholars feel that platforms’ gestures toward differential privacy (publicly sharing some information about a dataset in order to examine patterns of use while concealing data that would identify individuals) are presented as a technological fix that does not address the underlying concerns about data availability.

Experts also said that it’s not always clear what, and who, is considered a public versus a private entity. While political campaigns and figures create social media posts with the broader public in mind, ethics get muddier when using data from individuals who may only expect their public posts to be seen by a few hundred followers. Some in the field who were not interviewed for this report have also noted that for large-scale studies, getting informed consent is not always practical.94

Francesca Tripodi said that open data access won’t solve the field’s ethical issues. Tripodi brought up a now infamous 2014 study conducted by researchers at Facebook that looked at how emotions are transferred between users through the platform’s news feeds.95 Dubbed “the Facebook mood experiment,” the study was widely criticized by the research community because the experiment was run without informing users that their feeds were being manipulated or that they were being studied. “When researchers are afforded the opportunity to get inside the system, are ethical boundaries being followed?” Tripodi asked.

Many experts said that the need for best ethical practices for researchers is even more pressing in the wake of the Facebook–Cambridge Analytica scandal, the genesis of which was rooted in academic research. To provide a quick recap, the scandal originated when University of Cambridge professor Aleksandr Kogan collected Facebook data through a personality quiz app called thisisyourdigitallife, which harvested data from users who gave consent to have their data used for academic purposes and from their friends and contacts who did not give consent.96 As was widely reported in the media, data from up to 87 million users97 were sold to Cambridge Analytica and used to develop highly controversial profiling tools and to deploy targeted political ads, most notably to support Donald Trump’s98 and Ted Cruz’s99 respective presidential campaigns. Facebook maintained that Kogan’s original data collection methods were legitimate, but transferring data to a third party violated the platform’s terms of service.100 Since then, Facebook has pivoted toward privacy and rolled out a number of privacy-enhancing features for users, including message encryption and proposed time limits on how long posts are saved,101 and it has further restricted data access for both researchers and developers.102

To create better ethical frameworks for the field, some experts said that they want better communication channels between the academic community and technology companies and better systems for holding organizations accountable for clear ethical violations. A few mentioned the need for academics to develop ethical standards for platform data use in research contexts. Deen Freelon said that institutional review boards could provide valuable resources for researchers trying to navigate ethically murky waters, but as of now, many aren’t educated about the issues involved in social media research. If brought up to speed, these entities could “help researchers construct their studies in ways that balance ethics with the most effective methods.”


Research Infrastructure Insufficiencies

Experts also said that academic research infrastructure is severely underfunded and unsophisticated compared to advertising and marketing infrastructure. Researchers in this field spoke extensively about the difficulties they face in studying subjects like media manipulation, misinformation campaigns, and surveillance advertising because research tools available to academics lag far behind analytics systems used by large organizations.

“If you’re a large company, you’re going to have apparatus for understanding client feedback that is so well-developed that you’re going to be able to act on information almost instantaneously when it hits the market,” Joan Donovan from the Shorenstein Center on Media, Politics and Public Policy said. “University researchers don’t have access to tech on that scale because it costs millions to make.”

Experts said that funding problems hit them from multiple angles: research teams are expensive, hardware that can support sufficient computational power is expensive, and data are often expensive. Sam Gill, vice president of Communities and Impact and senior advisor to the president at the John S. and James L. Knight Foundation, said that there also needs to be more investment in laying a strong pipeline for young talent to break into the field.103

“You need really good graduate research assistants and postdocs in order to do the research, and then you need them going out into the world to academic jobs to propagate the research and the methods and the advancements in the field, and then you need them going out to other sectors, policy-making and applied work,” he said. “We’re missing a lot of that.”

A few experts who work on qualitative and interdisciplinary research also expressed concern about a dearth of funding and grant-making resources for basic research in the area of digital politics and civic engagement. Jesse Baldwin-Philippi, associate professor in the Communication and Media Studies Department at Fordham University who studies political communication and campaigns, noted that there are many grants that fund research around interventions aimed at improving specific metrics of civic engagement and solving specific problems, such as the spread of misinformation, but there are fewer funding options for work that examines more fundamental questions about how political campaigns and advocacy groups operate online.104

Baldwin-Philippi added that some political and social scientists who are focused on this type of basic civic engagement research, whether in or outside of online contexts, are especially concerned about funding in the wake of recent changes that the National Science Foundation (NSF) made to its Social and Economic Sciences Division. The changes, which went into effect on October 1, 2019, “repositioned” several NSF programs, including transforming the Political Science Program into two separate initiatives: one focused on funding basic research around security and preparedness and the other centered on “issues broadly related to attitudes, behavior, and institutions connected to decision-making processes, the provision of essential services, and accountability mechanisms.”105 While NSF representatives have reported that they believe that the changes will ultimately increase funding for political science research, the American Political Science Association issued a statement expressing concern that the move could further limit the types of projects that NSF supports.106

Funding is only part of the infrastructure problem. Experts also said that the academic research infrastructure is often set up in ways that silo researchers and inhibit interdisciplinary work. A few experts said that their teams largely work in isolation within their institutions and they spoke of limited opportunities for collaboration. Some experts called for better partnerships and communication channels between academic groups and tech companies. Several experts interviewed for this report cited independent research institutes that have tech company backing as viable ways to move the field forward. The Data & Society Research Institute, which was originally supported by funding from Microsoft, was mentioned by multiple experts as an example of best partnership practices.

There are projects in the works that are aimed at solving some of these issues. Several experts pointed to Social Science One’s data sharing initiative (more on that project and its challenges in the next section) and to the Knight Foundation’s $39 million investment in grants for cross-disciplinary research aimed at understanding how technology is transforming democracy107 as examples. Knight Foundation funds are being distributed to eleven American research institutions and think tanks that will support the creation of five new interdisciplinary centers of study. The Knight Foundation is also supporting an additional $11 million in research that looks at Internet and digital platform governance.108 The goal, Sam Gill said, is to provide insights on pressing policy questions and to help accelerate this emerging research field for the long-term future.

Some researchers are attacking academic infrastructure issues by creating shared resources that can be used to better understand this research landscape. At the Shorenstein Center, for example, Joan Donovan’s team is compiling one hundred case studies that document how misinformation travels across the web. This shared digital research infrastructure, called the Global Media Manipulation Case Book, is designed to teach those who contend with media manipulation, including researchers, policy-makers, and journalists, how to spot and debunk organized campaigns.109