In Irving Bacheller's 1900 novel Eben Holden, one character tells another that the protagonist is so honest that he "Never ketched a fish bigger'n 't was."
The phrase is one of thousands that Craig Messner has been collecting since his undergraduate days. Now a postdoc in Johns Hopkins' Center for Digital Humanities, or CDH, Messner remains interested in what is known as "nonstandard orthography"—systems of alternate spelling and punctuation that often denote certain character traits, which are frequently racialized, regionalized, and classed.
In particular, Messner wants to understand how consistent these systems are within an author's work, across multiple authors, across regions, and through time. For his English literature dissertation at UCLA, he developed a computer program that characterized the phrases he fed into it. He then used the results to go back over his texts manually, looking for the surprises that often point to promising areas of inquiry: for example, the same alternate spellings used for both a Black southern character and a white character in rural New England.
But such work is time-consuming, hard to scale up, and vulnerable to human subjectivity, Messner says. At the CDH, he is working toward using what is known as a large language model to encode a vast number of pieces of dialogue in ways that allow them to be compared computationally across the specific areas he's interested in. If successful, he will be able to classify the nonstandard orthography in huge chunks of 19th- and early 20th-century American literature, finding both patterns and exceptions that may suggest new avenues of inquiry.
"Coming here offers the opportunity to be exposed to all the possibilities that exist in the cutting edge of computational linguistics and computer science, and find something that might do this on a scalable and provably accurate level," Messner says.
Decoding digital humanities
The CDH was launched in 2021 to help scholars combine the powers of the human brain with the powers of computation, opening up possible new areas of research, says director Tom Lippincott. In Messner's research, for example, computers can point him toward patterns or surprises that he would not likely otherwise discern through his thicket of data, and then he can apply his expertise and human intelligence to probe what those findings might mean.
"There's a whole space to explore of what humans can do that computers can't, and vice versa, and how they complement each other," Lippincott says. Humans have one set of mechanisms for insights and reasoning and inference, and computers have another. What happens, he asks, when you combine those perspectives and direct them toward questions researchers have never imagined pursuing?
The field of digital humanities is broad and vague, says Lippincott, with different people often understanding it differently. For some, it means accessing, synthesizing, and analyzing data relevant to humanities disciplines. The Sheridan Libraries' Data Services department specializes in this approach, not only for the humanities but across disciplines at Hopkins, providing a resource for faculty, students, and classes interested in querying data, working with GIS and geospatial data, and using data visualization and other tools.
For some, the digital humanities field is about public humanities, or making those archives widely accessible, often allowing communities to learn about themselves in ways not otherwise possible. Jessica Marie Johnson, associate professor in the Department of History and senior research faculty at the CDH, often works to connect Black communities with archives related to their histories in meaningful ways.
The CDH's mission is more focused, Lippincott explains. He and the rest of the CDH staff—three other postdocs in addition to Messner—are working with humanities faculty to understand the various ways that researchers would be interested in using computational assistance to reveal previously undetected relationships among enormous sets of data points. Meanwhile, they are creating an overarching system—architecture, in the lingo—that scholars in fields ranging from art history to classics to English will be able to use to explore their data in new ways.
Learning the ropes
Since the lean center won't have the capacity to craft customized systems for each interested researcher, Lippincott plans instead to guide researchers to arrange their data into an outlined format—a kind of sophisticated spreadsheet—that the overarching system will be able to read. Often, scholars hoping to explore how computation can enhance their research already use something like a spreadsheet to organize their materials. The CDH aims to extend this by gathering richer domain descriptions, such as numbers corresponding to human ages or a group of values to describe a particular person. Such information will help train and use tailored machine learning models, and eventually allow for cross-disciplinary insights; for example, identifying authors who also appear in historical records through political or economic activities.
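As an illustration only, the "sophisticated spreadsheet" idea described above can be sketched as a table whose columns each declare what kind of thing their values describe. Everything in this sketch is an assumption made for the example, not the CDH's actual format: the names `FieldSpec`, `check_value`, and `validate_row`, the three domain kinds ("text", "age", "person"), and the placeholder rows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One column of the table plus the kind of thing its values describe."""
    name: str
    kind: str  # hypothetical domain kinds: "text", "age", "person"


def check_value(kind, value):
    """Return True if value is plausible for the given (hypothetical) kind."""
    if kind == "text":
        return isinstance(value, str)
    if kind == "age":
        # A human age: a whole number in a plausible range (bools excluded).
        return (isinstance(value, int) and not isinstance(value, bool)
                and 0 <= value <= 130)
    if kind == "person":
        # A group of values describing one person; only "name" is required here.
        return isinstance(value, dict) and isinstance(value.get("name"), str)
    return False  # unknown kinds never validate


def validate_row(row, schema):
    """Return a list of human-readable problems found in one row."""
    problems = []
    for spec in schema:
        if spec.name not in row:
            problems.append(f"{spec.name}: missing")
        elif not check_value(spec.kind, row[spec.name]):
            problems.append(f"{spec.name}: not a valid {spec.kind}")
    return problems


SCHEMA = [
    FieldSpec("quote", "text"),
    FieldSpec("speaker", "person"),
    FieldSpec("speaker_age", "age"),
]

# Placeholder rows: the speaker details are invented for illustration.
good_row = {
    "quote": "Never ketched a fish bigger'n 't was.",
    "speaker": {"name": "example speaker"},
    "speaker_age": 60,
}
bad_row = {"quote": "some dialogue", "speaker": "unknown", "speaker_age": -4}

print(validate_row(good_row, SCHEMA))
print(validate_row(bad_row, SCHEMA))
```

Declaring the kind of each column up front is what would let a shared system treat an "age" or a "person" the same way whether the row comes from an English scholar or a historian.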
Users will enter digital information, such as text, numbers, images, or video, that corresponds to something tangible, such as people, locations, or events. Messner, for example, feeds in a piece of text with nonstandard language. What users receive back is a sequence of numbers that represents the various elements of the input; in Messner's case, each number might represent a word, a piece of a word, or a single character. The sequences can then be used to compare one text to another, flag surprising text, or ask questions about missing information.
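To make the idea of turning text into numbers concrete, here is a minimal sketch. It is a deliberately simple stand-in, not the large language model the CDH is working toward: it encodes each character as an integer, then compares texts by counting three-character pieces and measuring how much two sets of counts overlap (cosine similarity). The function names and the two comparison sentences are assumptions made for the example; only the dialect quote comes from the novel.

```python
import math
from collections import Counter


def encode_chars(text):
    """Turn text into a sequence of numbers, one Unicode code point per character."""
    return [ord(ch) for ch in text]


def ngram_counts(text, n=3):
    """Count every run of n consecutive characters (lowercased) in the text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity of two count tables: 1.0 is identical, 0.0 is no overlap."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


dialect = "Never ketched a fish bigger'n 't was."
standard = "Never caught a fish bigger than it was."      # invented paraphrase
unrelated = "The tablets were inventoried last spring."   # invented contrast

print(encode_chars("ketched"))
print(cosine(ngram_counts(dialect), ngram_counts(standard)))
print(cosine(ngram_counts(dialect), ngram_counts(unrelated)))
```

In this sketch the dialect line scores closer to its standard-spelling paraphrase than to the unrelated sentence, which is the kind of comparison the number sequences make possible. A real system would use learned representations rather than raw character counts.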
A course, the first iteration of which the CDH staff is teaching this semester, will help graduate and undergraduate students become familiar with the data organizing principles that lend themselves to the CDH system, skills they can then use throughout their research careers.
"This course is the primary entry point for collaboration with the CDH, while also giving a broad overview of computational methods that might generally prove useful for humanistic scholarship," Lippincott says. "I'm hopeful that we're going to be able to create a small, embedded generation of people who are able to engage with us directly, and then that will grow."
Saying something new
Lael Ensor-Bennett, curator of the Visual Resources Collection, provides images and teaching support to faculty and students. She manages Johns Hopkins Digital Collections, and in addition to her digital projects, she's currently working with a large collection of 35mm teaching slides that H. Alan Shapiro, W. H. Collins Vickers Professor of Archaeology Emeritus and Academy Professor, asked her to evaluate.
Students are inventorying the slides and helping to determine which should be digitized, but once that's accomplished, Ensor-Bennett has a loftier goal in mind. Because the slides represent the sweep of Shapiro's teaching career, Ensor-Bennett suspects that computational intelligence may be able to help find and tell a story, even though she doesn't know yet what that story might be. It might be about pedagogy, she says, or the canon, or the history of slides and image-making.
"There's a growing interest in defunct technology. I'm thinking about how to leverage that to explore how the history of art was taught throughout time, and how that's changed, and what we can learn from our existing slide collection," she says.
That promise of exploring not-yet-imagined areas of inquiry is at the heart of digital humanities, Messner says. The CDH is working with scholars from Near Eastern Studies, for example, who have amassed large inventories of cuneiform tablets and are eager to explore what can be done with the data.
"That's where this collaborative moment can happen," Messner says. Once the center helps to make the data machine readable, a surprising element may become visible. Maybe that element is itself of interest in the scholar's field, or maybe it sparks a connection to an open research question.
"The idea is that we're building a system comprehensive enough to allow for any field's field-specific data to be something that can be inspected computationally," Messner says.
Where does ChatGPT fit in?
Just a few months ago, Lippincott would have said that even with all the recent advances in computational intelligence, there is still a clear dividing line between its abilities and those of humans. Humans are very good at things like close reading; careful, stepwise reasoning; and precise, deep delving into documents, he would have said, while computers are bad at those things, but very good at reading massive amounts of material, and going broad to find patterns beyond the physical limitations of what humans are able to see.
"I think that's been upended by ChatGPT much faster than I would have expected," he says.
But far from feeling threatened by that development, Lippincott sees an opportunity for the kind of work he's always wanted the CDH to tackle: identifying what distinguishes a human mind from a computer, where the limitations of each lie, and what can be accomplished by mining the untapped synergies between them.
That intersection is prime territory for deep interdisciplinary exploration, for which Hopkins is especially well suited, Lippincott says. The CDH partners closely with the Whiting School of Engineering's computer science department and its Center for Language and Speech Processing, whose more than 60 researchers study the science and technology of language and speech. Lippincott sees enormous potential for explicitly bridging the fields through shared graduate students and other initiatives.
Developments like ChatGPT are creating opportunities in areas that demand high levels of human expertise, from medical treatment to legal reasoning. Lippincott sees the humanities as another frontier at the extremes of human cognition that is being redrawn.
"There's a whole world of experimentation and inspection that needs to go on to really trace out those boundaries," he says.
The CDH will host Keystone DH, a conference for institutions and practitioners committed to advancing collaborative scholarship in digital humanities research and pedagogy across the Mid-Atlantic, from June 16 to 17. Ted Underwood, English professor at the University of Illinois Urbana-Champaign, and Andre Brock, Media Studies professor at Georgia Tech, will be the keynote speakers.