Nov 05 2007

The limits of digitization

Published by tom at 10:13 pm under Books

Interesting article in the New Yorker on the promise, and limitations, of efforts to scan, digitize and index all the print books in existence.

Google has been at it for several years, and the results of its work can be seen at Google Book Search. Google has two sources of books for its project: the partner program and the Library Project. In the partner program, Google collaborates with publishers (currently over 10,000 worldwide) to provide users with ways to search for books currently covered by copyright. In the Library Project, Google is scanning and digitizing as many books as it can in collaboration with great libraries around the world, including the libraries of the University of Michigan and the New York Public Library. This effort is not without controversy (Daniel Brandt is no relation to me).

A rival project to Google’s is the Open Content Alliance, a non-profit venture which is also digitizing whole libraries for web access. This project, wary of the for-profit nature of the Google project, aims to place material on the web without the restrictions imposed by Google.

These projects will bring unprecedented access to an unimaginable number of books, and this is unambiguously good. But there are limitations. From the article:

And yet we will still need our libraries and archives. John Seely Brown and Paul Duguid have written of the so-called “social life of information”—the form in which you encounter a text can have a huge impact on how you use it. Original documents reward us for taking the trouble to find them by telling us things that no image can. Duguid describes watching a fellow-historian systematically sniff two-hundred-and-fifty-year-old letters in an archive. By detecting the smell of vinegar—which had been sprinkled, in the eighteenth century, on letters from towns struck by cholera, in the hope of disinfecting them—he could trace the history of disease outbreaks. Historians of the book—a new and growing tribe—read books as scouts read trails. Bindings, usually custom-made in the early centuries of printing, can tell you who owned them and what level of society they belonged to. Marginal annotations, which abounded in the centuries when readers usually went through books with pen in hand, identify the often surprising messages that individuals have found as they read. Many original writers and thinkers—Martin Luther, John Adams, Samuel Taylor Coleridge—have filled their books with notes that are indispensable to understanding their thought.

Furthermore, each of these projects has its own database, its own interface, and its own limits on the number of books it will digitize. Copyright adds another layer of restriction. Will we ever have a seamless, universal library containing the whole sum of human knowledge? No, of course not. But what we do and will have is wonderful.

(Updated 6 November - fixed some typos)
