Analyzing Ebooks in the Age of Digital Locks: Challenges and Strategies

The following post was written for the forthcoming final report of the Books.Files project, led by Matthew Kirschenbaum and funded by the Andrew W. Mellon Foundation. The project’s final report is publicly available here, and you can also read Matt’s description of the Books.Files project’s rationale in Archive Journal. I presented an early version of this material at the 2019 conference of the Society for the History of Authorship, Reading, and Publishing (SHARP). For helpful feedback and conversation on these questions, I am grateful to the audience at SHARP, Matt Kirschenbaum, Victoria Owen, and Simon Stern.

What can an ebook reveal about the history and social contexts of its making, or the collaborative nature of its construction? This is the kind of question that bibliographers have been asking—and answering—with regard to printed books for many years, and it is a viable question for ebooks as well. However, ebooks are made of code organized into files, and it is nearly impossible to answer a question like this if those files are not accessible, along with digital publishers’ records generally. Scholars in fields ranging from analytical bibliography to book history to video game studies have emphasized the importance of first-hand analysis of digital objects at the level of code, not just what we see on the screen. If we wish to understand the relationships between an ebook’s form and functionality, or if we need to account for an apparent error in an ebook’s construction, or if we are curious about plans for an ebook’s design that may have been abandoned but left vestigial traces in the code, we will need to look for evidence that can only be found within files that are increasingly walled off behind digital locks. The idea that the code, and not just the visible interface, of ebooks and other digital objects can yield insights into their nature is foundational to fields that care about materiality, and an important avenue of potential discovery. In this post, I’ll consider the challenges facing the code-level study of ebooks in a world of Digital Rights Management (DRM) systems, in which they are increasingly published with digital locks (known formally as Technical Protection Measures, or TPM) that impede direct access to them as primary evidence.
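To make the idea of "vestigial traces in the code" concrete, here is a minimal sketch in Python. It builds a miniature, DRM-free ebook-like archive (an EPUB is, at bottom, a ZIP file containing XHTML and other files), then inspects it at the level of code to surface a leftover comment that would be invisible on screen. The file name, text, and comment are all invented for illustration; they stand in for the kind of first-hand evidence a digital bibliographer looks for.

```python
import io
import zipfile

# Hypothetical miniature example: a DRM-free "ebook" archive containing one
# XHTML content file. The names and content are invented for illustration;
# a real EPUB holds many such files alongside packaging metadata.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "OEBPS/chapter1.xhtml",
        "<html><body>\n"
        "<!-- TODO: restore pop-up footnotes dropped from this edition -->\n"
        "<p>Call me Ishmael.</p>\n"
        "</body></html>",
    )

# Code-level inspection: walk the archive's members and scan the markup for
# traces (here, an HTML comment) that a reader of the rendered page never sees.
traces = []
with zipfile.ZipFile(buf) as zf:
    for name in zf.namelist():
        text = zf.read(name).decode("utf-8")
        traces.extend(
            f"{name}: {line.strip()}"
            for line in text.splitlines()
            if "<!--" in line
        )

print(traces)
```

None of this is possible, of course, when a digital lock prevents the archive from being opened in the first place.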

Ironically, digital locks themselves may be easily broken with tools that are not difficult to find on the web; the greater challenge, which is my focus here, is that the act of breaking TPM—or sharing the tools to do so—may fall within a grey area of copyright policy and law. Dan Burk, who works at the intersection of copyright law and digital materiality, articulates the crux of the problem: “Lacking the deliberative nuance of human agency, DRM lacks the flexibility to accommodate access or usage that is unforeseen, unexpected, or unanticipated.”[1] For the most part, DRM and digital locks are deployed under a narrowly positivist paradigm that assumes all possible uses of a text are knowable and codifiable in advance. Those who study books and reading, in all their forms, know that’s not true and never has been. Breaking digital locks for unauthorized resale or distribution on torrent sites is not something I would defend—it’s happened to my own books—but can there be a legitimate rationale for breaking digital locks in the service of scholarly research on digital artifacts?

In the United States, the Digital Millennium Copyright Act (DMCA) broadly prohibits the circumvention of digital locks on copyrighted materials, regardless of intention, and prohibits trafficking in technologies that facilitate circumvention. Section 1201 of the DMCA provides for exemptions to these anti-circumvention prohibitions, which are revised every three years through a rulemaking process overseen by the Librarian of Congress. The European Union’s Information Society Directive contains similar prohibitions, and it also has a mechanism for member states to establish valid exceptions (e.g. breaking digital locks on an ebook to enable screen-reading software to work, often necessary for readers with visual disabilities). However, in both the EU and the United States, there has been widespread concern that even with these mechanisms, DRM nonetheless inhibits uses of digital objects that should be—and in many cases are—legal and protected by the doctrines of fair use and fair dealing.

As a digital bibliographer based in Canada, I do my work in a country where these issues are far from settled. From 2017 to 2019, the Canadian Copyright Act underwent a statutory review, and a Parliamentary committee travelled the country to receive feedback from stakeholders. My own 2012 study of an ebook’s source code was one of many examples presented to the committee to support the idea that there are valid reasons for TPM circumvention. Remarkably, the committee’s final report (released in June 2019) recommends a balanced approach to TPM circumvention, including a non-exhaustive (“such as…”) approach to enumerating reasonable exceptions to copyright. Even fierce defenders of the public domain such as Michael Geist received the report optimistically, but whether its recommendations will become Canadian law is another question—and, even so, that may be cold comfort outside the borders of my home country.

So where does this leave someone who wants to sit down and dig into the code of an ebook right now, to see what they can learn? A university-based digital bibliographer wishing to examine the source code of an ebook may know precisely how to access its code, but may be more uncertain as to whether she can do so without violating copyright law, policies of universities or funding agencies, or the terms of End User License Agreements. The stakes are even higher for those in positions of precarity, and the chilling effects of uncertainty about DRM circumvention for scholarly purposes are very real. I’ll conclude with an outline of five possible responses to this scenario that I’ve identified (and named in the spirit of tvtropes.org), though none of them may be adequate on their own. I also hasten to add that these are descriptions of practices, not recommendations or legal advice (which I’m not qualified to offer). Scholars contemplating these kinds of strategies should always seek advice from someone qualified and authorized to provide it, such as a university copyright librarian.

  1. The “what happens in Vegas…” approach: breaking digital locks in the course of one’s research, but omitting any discussion of how one broke them, or any acknowledgement that one broke them at all. This has the advantage of being a genuine path to knowledge about the artifact under study, and it provides the researcher with evidence that can answer many bibliographical questions. The downside is that one can’t be fully transparent about one’s methods, one can’t do this with students (or with peer workshops like those at the SHARP conference or Rare Book School), and one might still be breaking the law.
  2. The “Thor Heyerdahl” approach: instead of breaking digital locks on the object under study, building a replica as a kind of manipulable model, and hoping it behaves analogously to one’s real object of study. Working in the spirit of experimental archaeology, a researcher can create an ebook using an open standard like EPUB far more easily than Thor Heyerdahl could build and sail his experimental ship, the Kon-Tiki, and can test hypotheses about ebooks in safer environments than the waters of the South Pacific. This approach can work quite well in an educational context, but only with relatively simple digital objects using open standards like EPUB, and conclusions based upon it must rely on probability and conjecture rather than empirical evidence.
  3. The “Spotify Teardown” approach: modelling the algorithms that govern a digital system by manipulating its inputs and examining the results. This strategy takes its name from the recent book Spotify Teardown: Inside the Black Box of Streaming Music, written by a group of researchers who wanted to understand the algorithms that govern Spotify’s behavior as a music distribution platform. Their multi-pronged set of methods included creating their own music label for research purposes, and using it to upload files that tested Spotify’s behaviors in various ways. This approach can work for those interested not just in a single digital thing, like an ebook, but in systems that circulate many digital things. Disadvantages include those for the “Thor Heyerdahl” approach mentioned above, and the possibility of legal pressure from the company under study (which the Spotify Teardown authors—and their funding agency in Sweden—successfully resisted). 
  4. The “grateful lurker” approach: documenting how online communities who care about certain kinds of digital artifacts are discussing them, curating them, and sometimes breaking them open to understand how they work—and how they share their evidence online. This strategy works especially well for video games, many of which have thriving online communities of modders: people whose work to repurpose video game engines to create new games often leads to discoveries about the original game’s development process. The evidence for those discoveries often comes in the form of abandoned design features or digital assets that the developers neglected to remove from the source code—both forms of evidence that usually require digital lock-breaking to access first-hand. I have also adapted this approach to the study of digitally curated musical recordings, though ebooks may not benefit from the same levels of dedicated online communities. The main advantage, of course, is that someone else is doing the digital lock-breaking—and they may document their methods and analysis to a reasonably high standard of evidence, sometimes even with informal community peer-review. However, second-hand lock-breaking also means second-hand evidence, which may not meet the empirical standards of scholarly researchers. Nevertheless, the “grateful lurker” approach does harmonize nicely with book history’s emphasis on reception and what D.F. McKenzie called the “sociology of texts,” and can shine a light on the valuable cultural heritage work done by online pro-am communities (i.e. amateurs whose work approaches or reaches a professional standard). Yet not all online communities may want that kind of light shone on them; following research ethics protocols for studying online communities is therefore essential. 
  5. The “Tom Petty” approach (cf. his lyrics to “I Won’t Back Down”): breaking digital locks openly and unapologetically on the understanding that one is acting reasonably within the limitations to copyright, with no intention of infringement or piracy, and then standing one’s ground. Disadvantages are obvious, as are advantages. Less obvious, but no less real, are the networks of support and advocacy for those whose scholarship sometimes requires the protection of the law.[2] In ideal circumstances, this is not so much a challenge to copyright law as an opportunity to clarify its purpose and limits. Not an approach to try alone, but then again neither is most digital scholarship.
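How easy is the “Thor Heyerdahl” approach in practice? The sketch below builds a bare-bones EPUB replica using only Python’s standard library. Every file name and piece of metadata here is invented, and the function name (build_replica) is mine; real reading systems also expect more than this skeleton provides (EPUB 3, for instance, requires a navigation document). The point is simply that the packaging format is open and experimentation is cheap: no lock-breaking required.

```python
import zipfile

# A minimal sketch of the "Thor Heyerdahl" approach: building a bare-bones
# EPUB replica to experiment on, rather than unlocking a published ebook.
# All names and metadata below are invented for illustration.

def build_replica(path: str) -> None:
    with zipfile.ZipFile(path, "w") as zf:
        # Per the EPUB container spec, 'mimetype' must be the first entry
        # in the archive and must be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # container.xml tells a reading system where the package file lives.
        zf.writestr("META-INF/container.xml", """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>""")
        # The package file: metadata, a manifest of files, and reading order.
        zf.writestr("OEBPS/content.opf", """<?xml version="1.0"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:replica-0001</dc:identifier>
    <dc:title>Replica</dc:title>
    <dc:language>en</dc:language>
  </metadata>
  <manifest>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>""")
        # A single content document: the "text" of the replica.
        zf.writestr("OEBPS/chapter1.xhtml",
                    "<html xmlns='http://www.w3.org/1999/xhtml'>"
                    "<body><p>An experimental text.</p></body></html>")

build_replica("replica.epub")
```

A replica like this can then be altered one variable at a time (a mangled manifest entry, a mis-ordered spine) to test hypotheses about how reading systems respond, in the spirit of experimental archaeology.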

[1] Dan L. Burk, “Materiality and Textuality in Digital Rights Management,” Computers and Composition 27 (2010): 231.

[2] A good place to start is Patricia Aufderheide and Peter Jaszi’s book, Reclaiming Fair Use: How to Put Balance Back in Copyright, 2nd ed. (University of Chicago Press, 2018), especially their chapter “The Culture of Fear and Doubt, and How to Leave It,” 1–16.