More than 12 million historic copyright-free images will soon be available to anybody with an internet connection, thanks to big data evangelist Kalev Leetaru.
Leetaru, a scholar at Georgetown University, has already uploaded nearly 3 million of the images to Flickr. They were sourced from 600 million library book pages scanned by the Internet Archive, and they are fully searchable thanks to tags that have been added automatically.
Until now, the books have been treated only as text and their images have largely been ignored. That is a shame, says Leetaru, because so many of the originals, dating back to 1500, have been lost or severely damaged.
"For all these years all the libraries have been digitising their books, but they have been putting them up as PDFs or text searchable works," Leetaru told the BBC. "They have been focusing on the books as a collection of words. This inverts that."
As a Yahoo! fellow at Georgetown, Leetaru wrote his own software to customize how books would be digitized for his project. The Internet Archive had used a program that discarded images, but Leetaru reengineered the software to go back and salvage what had been discarded from the original scans, leaving him with images that were then converted to JPEG format.
Leetaru plans to make his code available to others so that any library can replicate what he has accomplished. "That's actually my hope, that libraries around the world run this same process of their digitized books to constantly expand this universe of images," he said.