Naming digital objects in a digital library sounds straightforward, but it is not. We know what names should be: unique, persistent, extensible, and easy to use. But the best way to meet these requirements is not always clear. In fact, there are plenty of proposed naming schemes: Uniform Resource Name (URN), Digital Object Identifier (DOI), Archival Resource Key (ARK), Persistent Uniform Resource Locator (PURL), Serial Item and Contribution Identifier (SICI), Handles, and Buckets.
This session of the Metadata Working Group will focus on how the CUL establishes names, or identifiers, for digital library objects. Adam Chandler, George Kozak, David Ruddy, and Elaine Westbrooks will lead a discussion about the naming problem, existing proposals, what we are currently doing, and what the CUL might want to consider doing with naming digital objects.
"doorstep" (such as URNs). There are still a number of questions about how we could handle identifiers within Cornell and CUL. One issue is the distinction between semantic identifiers (intelligible to people; designed for people and machines) and non-semantic identifiers (not meaningful to people; designed for machines).
George reviewed the current methods used for naming and identifying digital objects in a number of CUL collections. He noted that identifiers are needed for files, metadata, images, and full text. An important question to answer is what kind of digital object is being described. Is the baseline object a journal, an issue, or an article? Also, do we want the identifiers to be random, sequential, or derivative? "Smart" identifiers tell you something about the file simply from the name. George gave overviews of the different identification methods used by Making of America, Math Books, International Women's Periodicals, Euclid, the Ezra Cornell Papers, and the May Anti-slavery Pamphlet Collection. All of these collections present difficulties in handling hierarchical information and relationships among images, OCR, and metadata files. George indicated that Euclid was most like a URN scheme -- articles are given identification numbers that are not smart in any way and rely on a file for resolving information. George said that each approach has been system-specific and that there is a sense that a global approach would be valuable.
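The contrast between "smart" identifiers and identifiers that rely on a file for resolution can be sketched roughly as follows. This is only an illustration, not the scheme used by any of the collections named above: the identifier pattern, the function names, and the OpaqueRegistry class are all hypothetical.

```python
import itertools


def make_smart_id(journal: str, volume: int, issue: int, page: int) -> str:
    """Build a hypothetical "smart" identifier that encodes its own context.

    Assumes the journal code contains no hyphen, since hyphens separate fields.
    """
    return f"{journal}-v{volume:03d}-i{issue:02d}-p{page:04d}"


def parse_smart_id(identifier: str) -> dict:
    """Recover journal, volume, issue, and page from the name alone."""
    journal, volume, issue, page = identifier.split("-")
    return {
        "journal": journal,
        "volume": int(volume[1:]),  # drop the leading "v"
        "issue": int(issue[1:]),    # drop the leading "i"
        "page": int(page[1:]),      # drop the leading "p"
    }


class OpaqueRegistry:
    """Hypothetical registry of sequential identifiers that carry no meaning.

    Everything known about an object lives in the lookup table, so the
    identifier stays valid when the object moves.
    """

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._table: dict[str, str] = {}

    def register(self, location: str) -> str:
        identifier = f"{next(self._counter):08d}"
        self._table[identifier] = location
        return identifier

    def relocate(self, identifier: str, new_location: str) -> None:
        self._table[identifier] = new_location

    def resolve(self, identifier: str) -> str:
        return self._table[identifier]
```

The trade-off shows in the two halves: a smart name can be read without any lookup but breaks if the hierarchy it encodes changes, while an opaque name says nothing by itself but survives reorganization as long as the table is maintained.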
Elaine reviewed some limitations of the initial implementation of naming schemes and identifiers in the Cornell University Geospatial Information Repository (CUGIR). These stem from "smart" file naming conventions that were not flexible enough to adapt to the asymmetrical growth of the CUGIR collection. A particularly difficult issue was dealing with multiple versions of the same data. A CUL internal grant supported a rethinking of access issues in order to create MARC records. The resulting system created "buckets" to access all versions of the files at once. These buckets rely on URLs and a code that refers to a metadata record rather than a geospatial data file. The URLs could easily be adapted to serve as URNs.
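As a rough sketch of the bucket idea described here, the code below keys a bucket on a metadata-record code and lets it hold every version of the data files. It does not reflect CUGIR's actual implementation: the Bucket class, its fields, the record code, and the example.org base URL used in the note below are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """Hypothetical bucket: one metadata-record code, many file versions."""

    record_code: str
    # Maps a version label to a data file name, in the order versions were added.
    versions: dict = field(default_factory=dict)

    def add_version(self, label: str, filename: str) -> None:
        self.versions[label] = filename

    def url(self, base: str) -> str:
        """The URL names the bucket (the record), not any single data file."""
        return f"{base}/{self.record_code}"

    def latest(self) -> tuple:
        """Return the most recently added (label, filename) pair."""
        label = list(self.versions)[-1]
        return label, self.versions[label]
```

With a bucket created as Bucket("0042") and versions added for two different years, url("https://example.org/bucket") stays the same no matter how many versions accumulate, which is what makes such a URL a plausible starting point for a URN.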
Adam described a very recent suggestion by John Kunze called the "Archival Resource Key." Kunze argues that persistence of objects is purely a matter of service and not a function of an object or naming scheme. He proposes three requirements for ARK: a link from an object to a promise of stewardship, a link from an object to a description of it (based on an identifier from an institution naming authority), and a link to the object itself or a copy of it. The identifier is an association between a string and an information resource, made manifest by a record that binds the identifier string to a set of identifying resource characteristics.
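The three links can be pictured as a single record that binds an identifier string to a stewardship promise, a description, and a location. The sketch below is a loose illustration under that reading and does not reproduce the ARK proposal's actual syntax or resolution mechanism: the ArkRecord class, the resolve function, and the "want" parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArkRecord:
    """Hypothetical record binding an identifier string to three links."""

    identifier: str   # string issued under an institution's naming authority
    stewardship: str  # the provider's promise of stewardship
    description: str  # a description (metadata) of the object
    location: str     # where the object, or a copy of it, can be found


def resolve(records: dict, identifier: str, want: str = "object") -> str:
    """Follow one of the three links for a given identifier string.

    `want` is an assumed parameter: "object", "description", or "stewardship".
    """
    record = records[identifier]
    links = {
        "object": record.location,
        "description": record.description,
        "stewardship": record.stewardship,
    }
    return links[want]
```

Keeping the stewardship statement in the same record as the location reflects the argument that persistence comes from the service behind the name rather than from the form of the name.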