Preserving Intellectual Legacies in the Digital Age

“A vivid glow NOOKed in her face, lighting up both her sorrow and her joy. . . .” Leo Tolstoy did not write this, but such writing was attributed to him when a company called Superior Formatting Publishing reformatted War and Peace from Amazon’s Kindle platform to Barnes & Noble’s Nook platform. The reformatting went haywire, and suddenly “NOOK” replaced the word “kindle” in copies of the literary classic transmitted to unwitting readers.

In the case of War and Peace, the result was slightly irritating and largely comedic. Sadly, medical scholarship, financial forecasting, and a range of public decision-making are also susceptible to such vagaries of transmission. In 2013, a team of researchers at UMass Amherst discovered that economists who were advising the EU to undertake austerity measures at the height of the recession had accidentally excluded several key data points in their Excel spreadsheet. A coding error in the age of self-driving cars or medical big data could easily lead to life-threatening decisions.

Learning to cope with the transitory nature of information storage and transmission will eventually become a normal feature of twenty-first-century scholarship. In the worst cases, one wrong click of a mouse button and weeks of research, years of written text, and decades (or, in the case of War and Peace, centuries) of preservation can be undermined, effectively making the written word as transitory as the spoken one.

A group of Academy members decided that learned societies, along with libraries, publishers, software companies, information engineers, and lawyers, need to make a more coordinated effort to help scholars navigate this new terrain. This concern led to a symposium on “Preserving Intellectual Legacies in the Digital Age,” which was held at the House of the Academy on September 23, 2016, under the auspices of the Academy’s Exploratory Fund. The conference, convened by Academy members Carla Hesse (Dean of Social Sciences at the University of California, Berkeley) and Pamela Samuelson (Richard M. Sherman Distinguished Professor of Law and Information at the University of California, Berkeley), brought together librarians, legal scholars, poets, computer and cognitive scientists, publishers, sociologists, historians, and classicists.

The symposium began with a keynote presentation by Brewster Kahle (Founder of the Internet Archive), who stressed the values of equity in digital access, the reformulation of current copyright policies (particularly with regard to securing more rights for authors and distributors), and the importance of innovation.

The conference focused on five themes: the role of libraries and access to knowledge; sustainable infrastructure for knowledge creation; archiving challenges; epistemic integrity; and policies to avoid oblivion. The sessions were animated by the realization that as more and more scholarship is digitized and scholars become increasingly dependent on digital technology to preserve and archive their findings, librarians, archivists, and curators need to partner with the worlds of technology, philanthropy, policy, and publishing to ensure that intellectual legacies survive for future generations of scholars. The participants discussed the interactions between copyright law and obsolescence, the authentication of authorship, financial models that would allow libraries and archives to catalog and preserve digital copies of books and journals as easily as physical copies, academic mentorship for the twenty-first century, the preservation of data sets and algorithms, and the integrity of a digital manuscript. The conversations also focused on the more technical problems of access, user interface, and the mechanics of hardware and software.

Throughout the conference, the participants highlighted several challenges and cited sobering statistics. Digital humanities scholar Abby Smith Rumsey noted that nearly 80 percent of all silent films produced in the early decades of cinema have been lost entirely. Jonathan Zittrain (Professor of Law and Librarian at the Harvard Law School) commented that 75 percent of the links cited in the Harvard Law Review are inaccessible online because the links to those articles are no longer accurate (a phenomenon known as “link rot”).

Former Harvard University Librarian Robert Darnton pointed out that three publishing houses control 42 percent of all scholarly articles that are published each year, and can thus exercise an outsized influence over what knowledge is and is not accessible. Carla Hesse and her Berkeley colleague Molly Shaffer Van Houweling noted that the vast majority of scholarly works produced in the twentieth century are effectively invisible: these works are not commercially viable for their publishers to reprint, but they are still under copyright protection, and thus cannot be made available digitally. In addition to a failure to preserve and maintain access to older cultural materials, new bodies of content are being created without sufficient attention to how that content will be preserved. Dan Cohen (Executive Director of the Digital Public Library of America) observed that Facebook produces more data than any other company in the world, but it does little to preserve these data (at least not in a way that would make them available to future scholars).

Several participants also pointed out that even if the data that are being created by social media companies and others were made available, the data are organized through algorithms that are and will remain proprietary, posing additional challenges. Access to both new and old data is complicated by the need for software that will enable that access. Mahadev Satyanarayanan (Professor of Computer Science at Carnegie Mellon University) remarked that there is virtually no effort made to preserve what he referred to as “software executability,” or technology that will ensure that future users of preserved software will actually have the same user experience that original users did.

The transition to digital scholarship and digital preservation also highlights the emerging challenges of up-to-date digital libraries. As Paul Courant (Harold T. Shapiro Collegiate Professor of Public Policy, Arthur F. Thurnau Professor, Professor of Economics, and Professor of Information at the University of Michigan) mentioned during the conference, physical libraries benefitted from the structural existence of what could have been a guiding invisible hand: “It is by total dumb luck, of the way that printing and publishing works technologically combined with the missions of the academy and approximately rational behavior on the part of the university administrations, who were competing in a space for quality, that no one had to do anything very special in order for the great bulk of the published academic literature to be organized in ways that made it fairly durable and easy to find.”

There is very little reason to believe that digital libraries will function in the same way. Things are not as easy to find on Google Scholar and there is not yet an intuitive way to organize such findings or even secure funding for their organization. Both the need for early investment and the scale of contemporary preservation mean that digital libraries will have to engage in a large coordinating practice if material is not to be lost.

Rather than merely highlighting challenges, participants also began to identify things that universities, libraries, publishers, authors, and learned societies can do to enable continued access to scholarship in the digital age. Many of these solutions focused on steps that can be taken to increase authors’ control over the fate of the texts that they produce. Authors often transfer their copyright to publishers, who thereafter control how articles and books are disseminated. This control can last for the entire term of the copyright (which continues for the life of the author plus seventy years), even though publishers’ interests in commercial dissemination typically last only a few years. While authors’ interests in reaching readers and spreading knowledge continue, the authors’ ability to pursue those interests can be hampered by their lack of copyright control. Many participants felt that graduate education should help scholars understand their options for managing their copyrights, so that their rights remain aligned with their interests. (This is work that is promoted by the Authors Alliance, a group with which several of the meeting participants are involved.)

While there is reason to believe that the age of the printed scholarly book may be coming to an end, it is not clear what will replace it. Several participants stressed the need to ensure that the scholarly record does not disappear by neglect when this shift takes place. Dan Cohen proposed encouraging libraries, universities, and learned societies to devote 1 percent of their annual budgets to a collective effort at digital preservation, and to invest in technologies and user interfaces that will ensure preservation by default rather than by accident.

While increased federal funding for such a sustainable infrastructure to protect scholarship would be valuable, Don Waters (Senior Program Officer for Scholarly Communications at the Andrew W. Mellon Foundation), among others, suggested that a more realistic first step would be to coordinate “micro-preservation” at the campus level, which would connect scholars and Academy members on university campuses with their local archivists and librarians to ensure that their legacies are preserved. This type of bottom-up approach would help scholars preserve their own work and test techniques that might eventually be deployed by larger-scale efforts.