Everything dies: people, machines, civilizations. Perhaps we can find some solace in knowing that all the meaningful things we’ve learned along the way will survive. But even knowledge has a life span. Documents fade. Art goes missing. Entire libraries and collections can face quick and unexpected destruction.
Surely, we’re at a stage technologically where we might devise ways to make knowledge available and accessible forever. After all, the density of data storage is already incomprehensibly high. In the ever-growing museum of the internet, one can move smoothly from images from the James Webb Space Telescope through diagrams explaining Pythagoras’s philosophy on the music of the spheres to a YouTube tutorial on blues guitar soloing. What more could you want?
Quite a bit, according to the experts. For one thing, what we think is permanent isn’t. Digital storage systems can become unreadable in as little as three to five years. Librarians and archivists race to copy things over to newer formats. But entropy is always there, waiting in the wings. “Our professions and our people often try to extend the normal life span as far as possible through a variety of techniques, but it’s still holding back the tide,” says Joseph Janes, an associate professor at the University of Washington Information School.
To complicate matters, archivists are now grappling with an unprecedented deluge of information. In the past, materials were scarce and storage space limited. “Now we have the opposite problem,” Janes says. “Everything is being recorded all the time.”
In principle, that could right a historic wrong. For centuries, countless people didn’t have the right culture, gender, or socioeconomic class for their knowledge or work to be discovered, valued, or preserved. But the massive scale of the digital world now presents a unique challenge. According to an estimate last year from the market research firm IDC, the amount of data that companies, governments, and individuals create in the next few years will be twice the total of all the digital data generated previously since the start of the computing age.
Entire schools within some universities are laboring to find better approaches to saving the data under their umbrella. The Data and Service Center for Humanities at the University of Basel, for example, has been developing a software platform called Knora to not just archive the many types of data from humanities work but ensure that people in the future can read and use them. And yet the process is fraught.
“You make educated guesses and hope for the best, but there are data sets that are lost because nobody knew they’d be useful,” says Andrea Ogier, assistant dean and director of data services at the University Libraries of Virginia Tech.
There is never enough staff or money to do all the necessary work, and formats keep changing and multiplying. “How do we best allocate resources to preserve things? Because budgets are only so large,” Janes says. “In some cases, that means stuff gets saved or stored but just sits there, uncatalogued and unprocessed, and thus next to impossible to find or access.” In other cases, archivists ultimately turn away new collections.
The formats used to store data are themselves impermanent. NASA socked away 170 or so tapes of data on lunar dust, collected during the Apollo era. When researchers set out to use the tapes in the mid-2000s, they couldn’t find anyone with the 1960s-era IBM 729 Mark 5 machine needed to read them. With help, the team ultimately tracked down one in rough shape at the warehouse of the Australian Computer Museum. Volunteers helped refurbish the machine.
Software also has a shelf life. Ogier recalls trying to examine an old Quattro Pro spreadsheet file only to find there was no readily available software that could read it.
There have been attempts to future-proof programs. One project that got a lot of fanfare in 2015 is the Open Library of Images for Virtualized Execution (Olive) archive, which runs old software like Chaste 3.1, a 2013 biology and physiology research program, and the 1990 Mac version of the computer game The Oregon Trail on a set of virtual machines. The project is still active, says Mahadev Satyanarayanan, a professor of computer science at Carnegie Mellon University. But there have been challenges in expanding Olive’s offerings, he says: even unused software has to be licensed from the companies that own it, and there is often no easy way to enter new data into the archive’s research applications.
Other efforts to help advance the longevity of knowledge have also had mixed results. The Internet Archive, home of the Wayback Machine, has a large collection of digitized materials, including software, music, and videos; as of the summer of 2022 it was fighting a copyright infringement lawsuit brought by multiple publishers.
On the more hopeful side, the Text Encoding Initiative has maintained international standards for encoding machine-readable texts since the 1990s. A decade ago, the US Office of Science and Technology Policy stipulated that applications for federally supported research have to provide a data management plan so the data can be used by researchers or the public in the future. “We’re getting to the point where almost every grant-funded research project has to put its data somewhere,” Ogier says. But there are no overarching requirements about who must store the data or how long it must be saved.
Unavoidably, ideas, knowledge, and human creations will continue to be lost. “We can’t save everything. We can’t provide access to everything. We can’t retrieve everything,” Ogier says. “But that’s no reason to not do what we can.”
Erik Sherman is a freelance journalist based in Ashfield, Mass.