I spent May 16 & 17 talking with computer scientists, faculty, and other archivists about the potential application of technology to archival description. The venue was a Radcliffe Institute-sponsored workshop, “Technology and Archival Processing,” organized by Marilyn Dunn and Barbara Grosz, respectively the Executive Director of the Schlesinger Library on the History of Women in America and the Institute’s Dean. Over the course of the two-day event, participants – who included Clifford Lynch (Coalition for Networked Information), representatives of OCLC, Microsoft, Google, and the Internet Archive, Jeffrey Schnapp (Harvard’s metaLAB), Stuart Shieber (DASH, Harvard’s open access repository), and archivists from MIT, Tufts, the Kennedy Presidential Library, and Harvard repositories, among many others – discussed the uses of technology to improve discovery and access by users and to increase the speed with which collections are described, preserved, and opened for research. Participants found these issues to be intertwined, and discussions ranged across topics with some recurring themes. As an archivist with a deep commitment to access, I can’t imagine a more productive or interesting way to spend two days.
“First you must recognize when you are bored,” admonished Chung-chieh Shan (Rutgers), indicating that it is the archivists’ responsibility to identify tasks that are better executed by machines. Repetitive intellectual tasks, such as recognizing patterns of numbers or letters, matching, searching databases, and so on, can be more effectively accomplished by computers, leaving the more demanding problems that require assessment, judgment, or interpretation to humans. Archivists were pleased to learn of existing tools, currently used in data archives and other data-intensive environments, that might be applied to archival processing. They were also surprised at how rapidly engineers understood archival practice and pointed to process elements that computers cannot address: the “squishy human bits” like relationships with donors, policy development, and institutional culture, as well as collection “problem children” that require extensive human judgment and handling.
Summary documents and action items will be forthcoming from the Institute. However, it seemed to me that there were some common questions and several areas of general agreement. A few of these include:
– If innovation is to occur in the tools used by archivists, it must also occur in archival practice. The potential changes include crafting more uniform donor-generated restrictions (so they are machine-implementable), altering ideas about acceptable levels of risk (more risk is OK), and revising the archivist’s job description (less focus on routine tasks, more focus on auditing, editing, moderating, and approving work carried out by machines and through crowdsourcing);
– Scanning is a part of any machine-enhanced improvement of processing. This could take several forms: scanning to process (small percentage of the whole), scanning for access (low resolution scans of the full collection used to generate data for analysis, linking, and identification of restricted materials), and/or scanning for preservation/reuse (high resolution scans that could also represent the originals);
– New thinking and tools can be applied to paper-based collections, digitized collections, and born digital collections. In fact, most agreed that born digital collections require this revised workflow – and that the massive volume of such collections will make the paper-based backlog look petite;
– ‘Single stream’ accessioning and processing of born digital collections, now being explored by some repositories, may influence workflow in paper-based collections. Is single stream processing and scanning in our future?
– In addition to re-thinking the archival role, we need to find new ways to leverage the interest and skills of scholars. Traditionally, archivists have had a symbiotic relationship with scholars, alerting scholars to collections of potential interest and learning from scholars’ use of collections. Can technology enable deeper collaborations?
– On the topic of users, digitized and born digital collections will change the user experience; how will we present these collections, where, to what audiences, and with what tools?
Attendees pledged to consider how to continue this useful dialog and plan for next steps. The Center is already engaged in some related projects; I am looking forward to more opportunities for discussion and collaboration. For a non-archivist’s take on the workshop, see Jon Kolko’s blog.