South Jersey Digital

South Jersey Center for Digital Humanities @Stockton College: The Blog

Live Blogging DH09 — Interlude

Posted by John Theibault on June 25th, 2009

CenterNet is holding its general meeting, with a special guest appearance by Jon Orwant of Google Books.

They anticipated a big crowd for this. The tables that have been in all the rooms have been removed and replaced by theater-style seating, and most of the seats are filling up.

Announcement: CenterNet will be holding an international summit of digital humanities centers immediately before next year’s DH10 in London. CenterNet will also be meeting with the Consortium of Humanities Centers.

Orwant’s title: “What to search for in Google’s 7 million books?” Two questions: first, why did they do it? Second, where are they going with it? Company motto: “organize the world’s knowledge and make it useful.”

He first discusses whether the current strategy constitutes fair use and the status of the settlement. There will be opt-in and opt-out possibilities. For opt-ins, Google may be able to arrange the sale of digital copies, which raises interesting questions of pricing. It could also offer a subscription service for those books that opt in.

Overall there are about 125 million “works” and 165 million “manifestations” in the world. About 5% are in print. About 75% are out of print but under copyright. The rest (roughly 20%) are public domain.

Creation of a “research corpus” is part of the settlement. What that corpus will consist of is still to be defined, as is what “making the research corpus available” means. He was asked about the quality of data from scanned pages (OCR). He says that it is as good as anyone’s, which is not all that great; OCR will never be 100% accurate.

The challenge is finding the right balance between letting digital humanists use their tools to do more advanced searches through Google’s API and making the information useful for non-tech-savvy humanists. Orwant gives the example that if the source is too open, a non-tech-savvy user could ask a question that would take the search engine decades to answer, so Google has to shield against that, but the shielding makes it hard to integrate user-generated tools. He mentioned a “contest” to come up with “new uses” for Google Books.

Will Google Books metadata be made available? No, because some of it is acquired through proprietary software. What about corrupt metadata? His response is to ask whether this is an urgent issue, because it may be overcome by other developments, so hand recoding may not be cost-effective.
