Professor Liddy, of the School of Information Studies at Syracuse University and director of the Center for Natural Language Processing, began her discussion by defining the term "digital library" with the first known definition, by Michael Lesk: digital libraries are based on the principles and practices of librarians. Library science emerged from the combination of acquiring and organizing information. Computer science emerged with the digital representation of information and information retrieval.
Metadata in the library community began with the standardization of MARC and the introduction of SGML and Dublin Core.
Liddy's work with the NSDL is essentially a metadata retrieval project whose purpose is to break the metadata generation bottleneck, standardize connection, and test the metadata from the NSDL. The project partners are:
The goal of number 1, Metadata Generation Bottleneck, is to demonstrate the feasibility of high-quality, automatically generated metadata for digital libraries through natural language processing (NLP). Metadata for the NSDL uses the 15 elements of Dublin Core plus 8 from GEM (Gateway to Education Metadata): pedagogy, standards, quality, cataloging, audience, duration, and essential resources.
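The element set described above can be sketched as a simple record type. This is only an illustration, not the NSDL project's actual schema: the 15 names are the standard Dublin Core elements, the GEM names are copied from the list above (which names only seven), and `MetadataRecord`, `set`, and `missing` are hypothetical names introduced here.

```python
from dataclasses import dataclass, field

# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# GEM elements exactly as named in the text above (only seven are listed).
GEM_ELEMENTS = (
    "pedagogy", "standards", "quality", "cataloging",
    "audience", "duration", "essential_resources",
)

ALL_ELEMENTS = DUBLIN_CORE_ELEMENTS + GEM_ELEMENTS


@dataclass
class MetadataRecord:
    """Hypothetical container for one resource's metadata."""
    values: dict = field(default_factory=dict)

    def set(self, element: str, value: str) -> None:
        # Reject element names outside the combined element set.
        if element not in ALL_ELEMENTS:
            raise KeyError(f"unknown metadata element: {element}")
        # Elements are repeatable, so store a list of values per element.
        self.values.setdefault(element, []).append(value)

    def missing(self) -> list:
        # Elements that a generator (automatic or human) has not filled yet.
        return [e for e in ALL_ELEMENTS if e not in self.values]


record = MetadataRecord()
record.set("title", "Plant Growth Lesson Plan")
record.set("audience", "Grade 5 teachers")
print(record.missing())
```

A record like this makes it easy to see which elements an automatic generator left empty, which is one way to compare it against a manually cataloged record.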
Natural language processing is a division of artificial intelligence and is being used as a method of information extraction. NLP is a technology that enables systems to achieve human-like understanding of document contents by extracting implicit and explicit information.
Sub-language analysis, which is a component of NLP, utilizes domain- and genre-specific regularities rather than full-fledged linguistic analysis, as well as discourse model development. Transformation-based learning is equated with linguistically based machine learning.
Discourse model development, a component of NLP, extracts information specialized for the communication goals of the document type and the activities under discussion. NLP works on two types of features: non-linguistic (document length) and linguistic (semantics, syntax, morphology, parts of speech, categories, root forms of verbs, discourse-level components).
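As a rough illustration of the two feature types, the sketch below computes one non-linguistic feature (document length in tokens) and one very crude linguistic feature (approximate root forms via suffix stripping). It is a toy under stated assumptions, not the system Liddy described: `crude_root`, `extract_features`, and the suffix list are hypothetical, and real morphological analysis is far more careful.

```python
import re

# Hypothetical suffix list for a toy stemmer; order matters (longest first).
SUFFIXES = ("ing", "ed", "es", "s")


def crude_root(word: str) -> str:
    """Strip one common suffix, keeping at least three leading characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_features(text: str) -> dict:
    """Return one non-linguistic and one (rough) linguistic feature."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {
        # Non-linguistic feature: document length in tokens.
        "document_length": len(tokens),
        # Linguistic feature: approximate root forms of the tokens.
        "root_forms": sorted({crude_root(t) for t in tokens}),
    }


print(extract_features("Students measured plants"))
```

The suffix stripper will produce non-words for some inputs (for example, it does not restore a dropped final "e"), which is one reason production systems rely on proper morphological analysis instead.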
Liddy demonstrated natural language processing applied to a lesson plan: a sentence was input, and morphological and lexical analysis were completed. The first step of processing includes:
Using a blind test to compare automatically and manually created metadata, Liddy's group found that the difference was only 26%, which is borderline between significant and insignificant.
For Standard Connection Information (number 2 above), it is best to view the PowerPoint presentation to see the graphs and diagrams (slides 28-38).
The goal of the third and final stage, Metadata Testing, is to measure the quality and usefulness of metadata. This involved evaluating information-seeking behaviors: How do users search and browse the digital library? Do search attempts reflect the available metadata? Which metadata elements are the most important to users? Which metadata elements are used most consistently with the best results? In addition, human-computer interaction was tested using eye tracking with think-aloud protocols and individual subject data.
Liddy's research led her to conclude that NLP can: