Naming, Identifying, and Persistence of CUL Digital Objects

David Ruddy
George Kozak
Adam Chandler
Elaine Westbrooks
October 25, 2002, 10:30am - 12pm, Olin Library 106

Description

Naming digital objects in a digital library sounds straightforward, but it is not. We know what names should be: unique, persistent, extensible, and easy to use. But the best way to meet these requirements is not always clear. In fact, there is no shortage of naming scheme proposals: Uniform Resource Names (URN), Digital Object Identifiers (DOI), the Archival Resource Key (ARK), Persistent Uniform Resource Locators (PURL), the Serial Item and Contribution Identifier (SICI), Handles, and Buckets.

This session of the Metadata Working Group will focus on how CUL establishes names, or identifiers, for digital library objects. Adam Chandler, George Kozak, David Ruddy, and Elaine Westbrooks will lead a discussion of the naming problem, existing proposals, what we are currently doing, and what CUL might consider doing when naming digital objects.

Links

Ruddy, David. Why identifiers? (2002-10-25) PowerPoint. An overview of naming and identifiers.
Kozak, George. Naming and identifying digital objects. (2002-10-25?) PowerPoint
Corson-Rikert, Jon; Elaine Westbrooks; Adam Chandler. Making and identifying digital objects : the CUGIR 2.0 approach. (2002-10-25) PowerPoint
Chandler, Adam. The ARK persistent identifier scheme. (2002-10-25) PowerPoint. Overview of ARK.
Kunze, John; R.P.C. Rodgers. The ARK persistent identifier scheme. (2004-07-31) text/plain
Daigle, Leslie L. Uniform Resource Names (URNs) : next generation Internet identifiers. (1998-04-14) PDF
Digital Object Identifier System (DOI). Website
Persistent Uniform Resource Locator (PURL). Website
SICI : Serial Item and Contribution Identifier standard. Website
Arms, William; Christophe Blanchi; Edward Overly. "An architecture for information in digital libraries". D-Lib Magazine, February 1997.
Maly, Kurt; Michael L. Nelson; Mohammad Zubair. "Smart Objects, Dumb Archives : a user-centric, layered digital library framework". D-Lib Magazine, 5(3), March 1999.

Minutes

Why Identifiers (David Ruddy)

"doorstep" (such as URNs). There are still a number of open questions about how Cornell and CUL should handle identifiers. One issue is whether to use semantic identifiers (intelligible to people; designed for both people and machines) or non-semantic identifiers (not meaningful to people; designed for machines).
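To make the distinction concrete, here is a minimal sketch of each kind. The naming pattern in `semantic_id` (collection, volume, page) and the character set in `opaque_id` are hypothetical choices for illustration, not conventions used by any CUL collection.

```python
import secrets


def semantic_id(collection: str, volume: int, page: int) -> str:
    """Semantic ("smart") identifier: a person can read collection, volume, and page from it."""
    return f"{collection}-v{volume:03d}-p{page:04d}"


# Hypothetical alphabet: digits plus consonants. Vowels and "y" are left out so random
# strings cannot spell words, and "l" is left out because it is easily confused with "1".
_OPAQUE_ALPHABET = "0123456789bcdfghjkmnpqrstvwxz"


def opaque_id(length: int = 10) -> str:
    """Non-semantic identifier: random characters that carry no meaning for people."""
    return "".join(secrets.choice(_OPAQUE_ALPHABET) for _ in range(length))
```

`semantic_id("mathbook", 12, 34)` yields `"mathbook-v012-p0034"`, which is readable but breaks if the object is later renumbered or moved to another collection. `opaque_id()` yields a random string that never needs to change, at the cost of requiring a lookup to learn anything about the object.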

Naming And Identifying Digital Objects (George Kozak)

George reviewed the current methods used for naming and identifying digital objects in a number of CUL collections. He noted that identifiers are needed for files, metadata, images, and full text. An important question to answer is what kind of digital object is being described: is the baseline object a journal, an issue, or an article? Also, do we want identifiers to be random, sequential, or derivative? "Smart" identifiers tell you something about the file from the name alone. George gave overviews of the identification methods used by Making of America, Math Books, International Women's Periodicals, Euclid, the Ezra Cornell Papers, and the May Anti-slavery Pamphlet Collection. All of these collections have difficulty handling hierarchical information and the relationships among images, OCR, and metadata files. George indicated that Euclid's approach was closest to a URN scheme: articles are given identification numbers that are not smart in any way and that rely on a separate file to resolve them to descriptive information. George said that each approach has been system-specific and that there is a sense that a global approach would be valuable.
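The sequential and derivative options, and the Euclid-style reliance on a lookup file, can be sketched together. Everything here is hypothetical (the `Registry` class, the `"obj"` prefix, the eight-digit padding, and the `.img0001`-style suffixes); it illustrates the ideas discussed, not the scheme of Euclid or any other CUL collection.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Registry:
    """Hypothetical registry: mints opaque sequential IDs and resolves them via a lookup table."""

    prefix: str
    # Sequential counter: each mint() call takes the next number.
    counter: itertools.count = field(default_factory=lambda: itertools.count(1))
    # Stands in for the resolution "file": identifier -> what it names.
    table: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def mint(self, **info: str) -> str:
        """Assign the next sequential identifier and record what it names."""
        identifier = f"{self.prefix}{next(self.counter):08d}"
        self.table[identifier] = dict(info)
        return identifier

    def resolve(self, identifier: str) -> Dict[str, str]:
        """Look up descriptive information; the identifier itself carries none."""
        return self.table[identifier]


def derivative_id(base: str, kind: str, seq: Optional[int] = None) -> str:
    """Derive a related file's name (page image, OCR, metadata) from its parent's identifier."""
    suffix = kind if seq is None else f"{kind}{seq:04d}"
    return f"{base}.{suffix}"
```

A usage example: `reg = Registry("obj")`, then `reg.mint(title="Sample article", volume="12")` returns `"obj00000001"`, and `derivative_id("obj00000001", "img", 3)` returns `"obj00000001.img0003"`. Derivative names keep images, OCR, and metadata visibly tied to one parent object, which is one way to express the hierarchical relationships mentioned above.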

Making and Identifying Digital Objects: The CUGIR 2.0 Approach (Elaine Westbrooks)

Elaine reviewed some limitations of the initial implementation of naming schemes and identifiers in the Cornell University Geospatial Information Repository (CUGIR). These stem from "smart" file-naming conventions that were not flexible enough to adapt to the asymmetrical growth of the CUGIR collection. A particularly difficult issue was handling multiple versions of the same data. An internal CUL grant made it possible to rethink access issues in order to create MARC records. The resulting system created "buckets" that give access to all versions of the files at once. These buckets rely on URLs and a code that refers to a metadata record rather than to a geospatial data file. The URLs could easily be adapted to serve as URNs.
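The key design point is that the bucket's identifier names a metadata record, so adding a new version of the data never changes it. A minimal sketch under that assumption follows; the `Bucket` and `DataVersion` classes, the `/bucket/` URL path, and the `example.org` host are hypothetical and do not describe the actual CUGIR 2.0 implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DataVersion:
    label: str  # e.g. "1995 edition"
    url: str    # where this version of the data file can be fetched


@dataclass
class Bucket:
    """Hypothetical bucket: identified by its metadata record, not by any single data file."""

    record_id: str
    versions: List[DataVersion] = field(default_factory=list)

    def add_version(self, version: DataVersion) -> None:
        # New versions join the bucket; record_id (and so the bucket's URL) stays the same.
        self.versions.append(version)

    def latest(self) -> DataVersion:
        # Most recently added version; assumes the bucket holds at least one.
        return self.versions[-1]


def bucket_url(base: str, record_id: str) -> str:
    """Build a URL that points at the metadata record, so it is stable across data versions."""
    return f"{base.rstrip('/')}/bucket/{record_id}"
```

Because the URL is built only from the record identifier, it could later be registered with a resolver and used as a URN without changing how the bucket is organized.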

ARK Persistent Identifier Scheme (Adam Chandler)

Adam described a very recent proposal by John Kunze called the Archival Resource Key (ARK). Kunze argues that the persistence of objects is purely a matter of service, not a property of an object or of a naming scheme. He proposes three requirements for ARK: a link from an object to a promise of stewardship, a link from an object to a description of it (based on an identifier from an institutional naming authority), and a link to the object itself or a copy of it. An identifier, in this view, is an association between a string and an information resource, made manifest by a record that binds the identifier string to a set of identifying resource characteristics.
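The three links can be sketched as three URLs derived from one ARK. This assumes the general ARK shape `[http://host/]ark:/NAAN/Name`, where the NAAN is a numeric Name Assigning Authority Number, and the convention from early ARK drafts that appending `?` requests the description and `??` requests the commitment (stewardship) statement. Later revisions of the specification may differ, so treat the suffixes as assumptions. The host `ark.example.org` and NAAN `12345` are made up.

```python
import re
from typing import NamedTuple

# Simplified ARK pattern: optional "http://host/" resolver prefix, then "ark:/",
# a numeric NAAN, "/", and the name assigned by that authority (plus any qualifier).
_ARK_RE = re.compile(r"^(?:http://([^/]+)/)?ark:/(\d+)/(\S+)$")


class ARK(NamedTuple):
    host: str  # resolver host; "" when the ARK is given without one
    naan: str  # Name Assigning Authority Number
    name: str  # name assigned by that authority

    def base(self) -> str:
        core = f"ark:/{self.naan}/{self.name}"
        return f"http://{self.host}/{core}" if self.host else core

    def object_link(self) -> str:
        return self.base()         # the object itself, or a copy of it

    def metadata_link(self) -> str:
        return self.base() + "?"   # assumed convention: a description of the object

    def commitment_link(self) -> str:
        return self.base() + "??"  # assumed convention: the steward's promise


def parse_ark(text: str) -> ARK:
    """Split an ARK (optionally prefixed by a resolver host) into its parts."""
    match = _ARK_RE.match(text)
    if match is None:
        raise ValueError(f"not an ARK: {text!r}")
    host, naan, name = match.groups()
    return ARK(host or "", naan, name)
```

Note that the host is only a current point of access: the identifier proper starts at `ark:/`, so the same ARK can be served by a different host if stewardship moves, which matches Kunze's view that persistence is a matter of service.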