The Internet and Engaged Citizenship

Why Understanding the Digital Citizen Proves So Difficult

David Karpf
Commission on the Practice of Democratic Citizenship

Somehow, the Internet has managed to remain new for three decades. The Internet was on the cusp of transforming civil society in 1995 when Nicholas Negroponte wrote Being Digital,1 and in 2001 when Cass Sunstein wrote Republic.com,2 and in 2004 when Joe Trippi wrote The Revolution Will Not Be Televised,3 and in 2008 when Clay Shirky wrote Here Comes Everybody.4 It remains new today, even as it has been integrated into the rhythms of daily life. Every U.S. presidential election since 1996 has been dubbed “The Internet Election.”5 The Internet has repeatedly promised to transform government, to open up a new era of transparency and accountability, and to disrupt journalism (for better or for worse). Even as the medium has achieved near-universal adoption among the American public, there is a pervasive sense that the Internet remains confounding for everyday citizens.

There are some striking similarities between the uncertainties, hopes, and fears that were expressed decades earlier and those that are still voiced today. The first generation of “digital citizens” are now in their thirties and forties. Traditional news media organizations have been in crisis for over a decade. Political polarization and the coarsening of civic discourse have been looming threats since at least the turn of the millennium. Why do we keep returning to these same concerns? Why has it been so hard to generate a robust, stable understanding of the Internet’s role in civic life?


The Pace of Internet Time

One consistent quality of the Internet is how it continues to change. The Internet of 2019 is a different medium than the Internet of 2009 or 1999. We moved from desktop portals, onramps to the “infobahn,” to wifi-connected laptops, producing blogs and wikis, and then moved further still to mobile devices and a social sharing economy that is dominated by a few quasi-monopolistic platforms that algorithmically shape what we see and how we interact. This pace of change renders the Internet substantively different from previous innovations in communications technology. When the telephone, the radio, and the television were diffusing through society, they were stable technologies—a television, telephone, or radio purchased in 1950 functions in much the same way as a television, telephone, or radio purchased in 2000. But the Internet of 2019 bears only a faint resemblance to the Internet of the 1990s.

The sheer pace of “Internet Time” frustrates our attempts to assess conclusively the Internet’s impact on civic and political engagement. By the time researchers believe we have a handle on a digitally enabled social phenomenon, the digital environment has changed, and social phenomena have changed as well. The ceteris paribus assumption (all else being equal) undergirding virtually all research methods is routinely violated by the Internet’s constant redevelopment.6 Kevin Munger terms this a problem of “temporal validity,” in which the findings of online social science research can be rendered suspect solely by the passage of time.7

As one example, let’s think back to Robert Putnam’s warning in Bowling Alone that the Internet was contributing to a sharp decline in social capital.8 To paraphrase, Putnam was concerned that citizens were increasingly anchored to their desktop monitors instead of going outside and interacting with their communities. Later researchers gathered evidence running contrary to this claim. Multiple studies showed that the Internet often augments offline social ties rather than replacing offline relationships. We cannot state with confidence that these findings refute Putnam—he might have been empirically correct at the time his book was published. The Internet of the 1990s, after all, was accessed through clunky desktop computers that tied up the phone lines. The Internet of the 2000s was more portable. You could talk on the phone or attend a community event while staying online. Instead of directly refuting Putnam’s thesis, these later studies temporally bound his claims.

This issue is further exacerbated by the plodding pace of academic publishing. It still takes years for academic research to move through the stages of research design, institutional approval, funding, data collection, data analysis, and multiple rounds of rigorous peer review. While there have been marginal improvements (many academic journals now make preprint articles available as soon as final revisions are submitted, rather than embargoing for months until the print version is published), our traditional system of academic knowledge production is fundamentally slow-moving in nature. Internet Time and the problem of temporal validity leave social scientists continually questioning prior research findings, preventing the smooth aggregation of theories and hypotheses that typically takes place during the formation of a new research paradigm.

Meta-analyses of peer-reviewed research on the Internet and civic/political engagement have painted an unstable, conflicting picture. Shelley Boulianne has conducted a series of meta-analyses, revealing that Internet use is usually found to have a small-but-positive effect on traditional, offline forms of participation.9 Bruce Bimber and Lauren Copeland, however, looked through data from the American National Election Studies from 1996, 1998, 2000, 2004, and 2008 and found no evidence of a robust relationship over time.10 The Internet’s effect on civic and political behavior depends, it seems, on how the terms are defined, how they are measured, and what year the study is conducted.

A Brief History of Internet Time

It is helpful to demarcate the history of the Internet into four periods. Over the past few decades, we have moved sequentially through four dominant metaphors that have shaped public understanding of the Internet’s role in civic life. Each of these metaphors is, necessarily, incomplete. The medium has always been more complex than any simple story might convey. Yet each has nonetheless exerted a type of force, defining how we collectively view and speak about the Internet. As our metaphors have changed, so too have the problems and solutions that the Internet is associated with.

First is the metaphor of the “Virtual Community.” This originated in the 1980s, prior to the creation of the World Wide Web. It was promoted and disseminated by communitarian journalist Howard Rheingold in his influential book, The Virtual Community: Homesteading on the Electronic Frontier,11 and further documented in journalist Katie Hafner’s book The Well: A Story of Love, Death & Real Life in the Seminal Online Community.12 The Internet of the 1980s was not heavily populated, but it did feature robust Bulletin Board Systems (BBSs), most famous among them being the Whole Earth ’Lectronic Link (The WELL). These BBS communities featured both inspiring community behavior and rational critical debate as well as troubling flame wars and trolling behavior. They demonstrated, at much smaller scale, many of the same civic behaviors we witness on social media today. They also helped to define the civic potential of digital networks, inspiring many of the early technologists, public intellectuals, and investors who would go on to popularize the increasingly mass medium.13

The second period is the infobahn, or the “information superhighway.” This metaphor emerged in the early 1990s, alongside early plans for the National Information Infrastructure and the commercialization of the Internet. The Internet of the 1990s was defined by static web pages (GeoCities), early web browsers (Mosaic and Netscape), and walled-garden Internet portals like AOL. Early government websites were conceived as informational resources: “brochureware” or online billboards that could serve as tools for early Netizens to learn more about public policy and public affairs. The infobahn metaphor was often coupled with buoyant optimism about the potential of digital citizenship, empowering an engaged public that could become better-informed than ever before.14

The third era is that of Web 2.0 and online collaboration. After the dotcom bubble burst, renewed excitement about digital media clustered around the social web. The Internet of the 2000s was defined by what Yochai Benkler terms “commons-based peer production”15 and what Henry Jenkins terms “convergence culture” or “participatory culture.”16 Websites like Wikipedia, Craigslist, and the early blogosphere all demonstrated the complex, collaborative endeavors that citizens could potentially co-produce online. As online publishing platforms became more user-friendly, connection speeds got faster, and data storage became cheaper, citizens appeared to be taking a much more active role in civic, cultural, and political affairs. The Web 2.0 metaphor is also frequently paired with Clayton Christensen’s theory of disruptive innovation.17 Wikipedia disrupted the encyclopedia industry; online file-sharing disrupted the music business; Craigslist and the blogosphere disrupted journalism; the Howard Dean, Ron Paul, and Barack Obama presidential campaigns disrupted the political parties. Thus Web 2.0 as a metaphor was not just focused on what online communities could collectively produce, but also gestured toward what they might soon replace.

Finally, there is the notion of the platform society.18 The Internet of the 2010s has been increasingly defined by smartphone usage and the growth of social media. This has led to increased attention paid to the major platforms themselves. Facebook, Amazon, Twitter, and Google are now no longer treated as the neutral intermediaries for public expression, innovation, and collaboration that they were during the Web 2.0 era. They instead have emerged as powerful gatekeepers, invested with both our hopes and our fears for civil society. Talk of disruption has been replaced by talk of regulation and monopoly.

These four perspectives on the role of the Internet in civic life have been layered atop one another. You can still find virtual communities today, and major collaborative sites like Reddit have much of the spirit of the old Bulletin Board Systems. Every company, campaign, and civic organization has a website. There is a wealth of information online (rendered accessible through Google search) for those who are motivated to find it. Peer production and collaboration still abound, particularly among young people and within cultural industries. It is not the case that the eras of virtual communities, or brochureware, or Web 2.0 ended. Rather, those eras faded, replaced by different tools, different behaviors, and different problems.

The challenge for producing stable public knowledge, then, is that the Internet has both seeped into so much of public life and been continually redefined. It manages to be so many things, all at once.


The Proprietary Data Gap

A second problem that has plagued public scholarship on the Internet and civic/political life is the substantial gap between public and private data.

If there is one thing that we have undoubtedly learned from the Cambridge Analytica scandal in 2017–2018, it is that the major social media platforms collect an overwhelming amount of data on public behavior. Facebook has a record of every click, every view, every share, every like. Google, Amazon, Netflix, and every other major platform collect extraordinary amounts of data as well.

The sheer amount of data that is collected by the major platforms has helped to fuel enthusiasm for “big data” analysis. In 2009, researchers working for Google published a paper in Nature on the Google Flu Trends study.19 These researchers had analyzed Google search data and used it to predict flu outbreaks more quickly and accurately than the Centers for Disease Control and Prevention. Civic-minded and politically focused researchers have pointed to this study as evidence of all the social behavior that can now be more effectively assessed through online trace data. In their book, Political Turbulence, Helen Margetts and coauthors write, “Every participatory act, however small, carried out on social media leaves a digital imprint. So mobilizations produce digital trails that can be harvested to generate large-scale data, which can be retrieved and analysed with software, text- and data-mining tools, and network analysis.”20

The promise of online data abundance has turned out to be something of a mirage, however. Every participatory act may leave a digital imprint, but that imprint is only visible to a select set of companies. It is heavily guarded, thinly regulated, and protected with the force of law.21 There is a substantial gap between the proprietary data that Facebook, Google, and the other major platforms hold and the public data that researchers have access to.

The aftermath of the Cambridge Analytica scandal serves as a helpful example. Jonathan Albright, Research Director at Columbia University’s Tow Center for Digital Journalism, is among the most prominent researchers to study that issue. Albright’s approach to studying digital influence operations has revolved around attempting to reverse-engineer online propaganda networks based on the limitations of publicly available data. Another researcher, David Carroll, Associate Professor at Parsons School of Design at The New School, sought to gain insights into Cambridge Analytica’s influence operations through strategic lawsuits aimed at forcing the company to reveal its data practices. A third researcher, Emma Briant, Senior Lecturer in Journalism at the University of Essex, primarily focused on interviewing former Cambridge Analytica staffers. All three researchers have approached the topic of digital propaganda by crafting indirect approaches that can partially bridge the knowledge gap between public and proprietary data. Facebook and Google’s extensive data are only available to select academics, under select circumstances. Public data are always far more limited.

At the same time, the major tech platforms have increasingly begun to employ social science researchers. Both Facebook and Google employ political science and communication Ph.D.s, along with lawyers and policy analysts. These social scientists gain access to proprietary data, but the tradeoff is that they primarily pursue applied research questions that are of benefit to the companies, and that they can only publish their research in exceptionally rare circumstances. Incidents like the 2014 “Facebook emotional contagion” study by Kramer and colleagues have only served to dampen the companies’ enthusiasm for open collaboration with academic researchers. Kramer and coauthors collaborated with Facebook on an experimental tweak to the newsfeed algorithm. Some users received a higher dosage of negatively valenced Facebook posts from their friend networks; other users received a higher dosage of positively valenced Facebook posts. They discovered a minuscule but statistically significant effect on users’ posting behavior. If you see sad posts in your newsfeed, you become slightly more likely to perform sadness in your own postings; if you see happy posts in your newsfeed, you become slightly more likely to perform happiness. There was immense public backlash when this study was published. The company had secretly manipulated its users’ emotions without asking for their informed consent. The academic researchers who had partnered with the company had skirted traditional research ethics protocols. The irony in this case is that, since Facebook’s newsfeed is algorithmically generated, the company is in effect always slightly manipulating its users’ emotions. The company is constantly refining its algorithms based on proprietarily held user data. The difference in this case was that the company had made its experimental findings public.

There are two natural consequences of the proprietary data gap. First, the research community has habitually fallen into a modified version of the parable of the drunkard’s search. (A drunkard is frantically searching for his keys under a lamppost. “Did you lose your keys here?” you ask. “No, I lost them across the street,” he mumbles. “Then why are you searching under this lamppost?” you reply. “Well, the light is much better over here.”) We produce mountains of Twitter and website research. We produce molehills of Facebook research. We produce practically no research on email, Reddit, or the algorithmic choices of the major platforms themselves. And this is entirely because Twitter has, for several years, made its data more easily accessible to researchers than Facebook. Websites can be crawled and scraped, while email lists are closely guarded by civic and political organizations. In the era of big data, most of the research community has flocked to the types of big data that are most accessible.

The second natural consequence is that, at least in the areas of civic and political behavior, the gap between proprietary and public data is immense and practically unbridgeable. Social scientists at Facebook, Google, and the major political campaigns have access to information that the broader research community can never analyze. It is particularly difficult to produce stable public knowledge about the Internet’s impact on civic and political behavior because the data that would form the foundation of such research are proprietarily held.


