University of Pittsburgh

July 20, 2006

More to be digitized with new ULS scanner

From a windowless room on the third floor of an office building in Point Breeze, a handful of Pitt employees are poised to make their digital mark on human history.

Armed with stacks of antique books, a scanner and library administrators’ willingness to share, Pitt’s Digital Research Library is a partner in a project launched in 2005 by Yahoo! and the Internet Archive to build a searchable digital collection of the world’s books and multimedia content.

Pitt’s contribution to the Open Content Alliance (www.opencontentalliance.org) will start with the Darlington Collection, known as one of the University’s richest sources of information on western Pennsylvania and Ohio Valley history. The books themselves, housed in the Darlington Library in the Cathedral of Learning, won’t be going anywhere, but the University Library System’s acquisition of a new scanner means the Darlington’s historic content will have a broader audience than ever before.

“We have pledged to contribute digital books on Americana,” said ULS director Rush G. Miller, noting that the content of some 600 works already is available on line as part of the Historic Pittsburgh collection (a collaboration among Pitt, the Historical Society of Western Pennsylvania and the Carnegie Museum of Art).

The Open Content Alliance plans to launch a digital collection of Americana on line later this year, Miller said. “This is exactly what they’re looking for,” Miller said of the Darlington’s books and maps. “It’s a marvelous collection of early colonial history. This is a treasure that’s a collection of the University that isn’t well known,” he said, adding that digitizing will allow historians and researchers to discover the collection’s thousands of books.

“When we put something like this on line, the use is 100 times more,” he said, adding that his goal is to have the project completed within two years.

The Digital Research Library already has placed a number of materials on line including former Pennsylvania Gov. Dick Thornburgh’s papers, a collection of 19th-century schoolbooks and more than 40 collections of images, including detailed photos of Chartres Cathedral.

To facilitate the Open Content Alliance project and increase the University’s capacity to digitize more of its holdings, ULS has purchased a $100,000-plus high-speed scanner. In addition to eliminating constraints on the size and type of works that can be scanned, the new machine will increase scanning speed. The manufacturer’s estimates peg the scanner’s speed at up to 500 pages per hour. “If we could get 200 pages an hour, we’d be happy,” Miller said.

“What we want to do is build a fairly robust internal capability to be able to tackle major projects,” he said.

Digital Research Library coordinator Ed Galloway, three librarians and two scanner technicians now are working out the technical details of how to make that happen.

With much excitement but little fanfare, the new scanner arrived in late June and was installed at the Digital Research Library, located in the University’s Thomas Boulevard facility in Point Breeze.

The size of a large desk, the scanner can hold oversized or fragile books, which previously either had to be scanned by outside vendors or could not be scanned at all.

To an outsider, the DigiBook SupraScan, manufactured by the French firm i2S, looks like a simple combination of a camera, light source and book holder. To the librarians, the machine represents the ability to broaden the range of the digitizing that can be done in-house.

“At some point you realize we’re limited on what we can do because we have to rely on vendors,” Galloway said. For example, the Digital Research Library couldn’t scan maps, large books or one-of-a-kind items.

A cartload of about 100 books from the Darlington Library, known simply as “batch 1,” has been brought to the Digital Research Library for their moment in the spotlight. They come in a range of sizes, widths, ages and levels of fragility, but all have one thing in common: They predate 1923 — a magic year for those wanting a quick assurance that there’s no risk of copyright violations. Pre-1923 books are all in the public domain, Galloway explained.

Scanner technician Ted Tarka demonstrated the ease with which the machine works. He deftly placed a book in the cradle and closed the glass that holds it in place. Moving to a nearby computer monitor, he adjusted for margins and quality, pushed a button and the pages were scanned and saved in digital form.

With the new scanner, the Digital Research Library no longer will have to rely on outside firms. “We desired to control our own destiny and gain the ability to scan large book collections,” Galloway said, explaining that until now, books could not be scanned unless they were taken apart to be placed on flatbed scanners. In the past, to digitize books, “We found duplicate copies and disbound them,” he said. Needless to say, given the value of the Darlington works, “It’s no longer in our best interest to disbind the books,” Galloway said.

The staff already are taking a second look at materials they had to pass up in earlier projects. “We’re going back to find things we couldn’t do before,” Galloway said.

But it’s not as simple as diving in and beginning to copy each page. Decisions need to be made, esthetic considerations determined.

“We’re trying to replicate the feel of a book that users are used to having,” said Digital Research Library librarian Michael Bolam. That means deciding whether to scan in color, grayscale or simply black-and-white; correcting for the curvature of the books; and ensuring that the on-line quality is as close as possible to the book itself, all the way down to the markings and discolorations one might expect to see in antique books.

Technical aspects also need to be considered: Higher quality scans take longer, as do color scans. At the highest resolution, the Darlington digital collection could total some 48 terabytes (48,000 gigabytes) or, at a lower resolution, as little as 300 gigabytes.
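
A rough sketch of why the totals swing so widely: the raw size of a scanned page grows with the square of the resolution and in direct proportion to the color depth. The page dimensions, resolutions and bit depths below are illustrative assumptions, not the Digital Research Library's actual settings, and the figures ignore file compression.

```python
def uncompressed_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw (uncompressed) size of one scanned page, in bytes."""
    # Pixel count is (width * dpi) x (height * dpi).
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Hypothetical 6 x 9 inch page (an assumption, not from the article).
color_600 = uncompressed_page_bytes(6, 9, 600, 24)   # 24-bit color at 600 dpi
bitonal_300 = uncompressed_page_bytes(6, 9, 300, 1)  # 1-bit black-and-white at 300 dpi

print(color_600)                  # 58320000 bytes, about 58 MB per page
print(bitonal_300)                # 607500 bytes, about 0.6 MB per page
print(color_600 // bitonal_300)   # 96
```

Under these assumed settings the color scan is 96 times larger per page, the same order of magnitude as the gap between the 48-terabyte and 300-gigabyte estimates.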

And there’s post-production to consider.

Having the works in digital form is useless unless the information is accessible. “There’s the whole other process of making it searchable,” Galloway said. “That’s all done here.”

Acknowledging the Digital Research Library’s current state of rapid growth, “This will give us the capacity to do more,” Galloway said.

—Kimberly K. Barlow

