150 Years of Semantic Technology

Evan Sandhaus
November 13, 2009, 2:30-4:00pm, Mann Library, Room 102

Description

The first semantic search system for The New York Times was released in 1913 and was available bound in either paper ($6) or cloth ($8). In the 96 years since the advent of The Historical Index to The New York Times, semantic technology has become central to The New York Times' daily operations and the focus of much internal research and development. In this talk, Evan Sandhaus, Semantic Technologist, will review the long history of semantic technology at The New York Times; describe how this technology is applied in our operations; and discuss both The New York Times Annotated Corpus and the recently announced Linked Open Data initiative: data.nytimes.com.

Evan Sandhaus has been the semantic technologist for the research and development operations department of The New York Times Company since 2006. In this role, Mr. Sandhaus has developed a semantic technology for identifying key concepts in large text datasets; engineered a patent-pending system for purging template text from Web content; and collaborated with The Linguistic Data Consortium to release and promote The New York Times Annotated Corpus, a collection of 1.8 million richly annotated New York Times articles published from 1987 to 2007. Additionally, Mr. Sandhaus has led the development of a Web-scale crawler, a Google Earth news layer and multiple search engine optimization toolkits. Before joining the Times Company, Mr. Sandhaus worked at The University of Pennsylvania from 2005 to 2006 and at Lockheed Martin from 2002 to 2005. Mr. Sandhaus holds a bachelor's degree from Williams College and a master's degree from Villanova University, both in Computer Science.