Everything dies, including information
Quite a lot, according to experts. One thing is certain: what we believe is permanent isn't. Digital storage systems can become unreadable within three to five years, and archivists and librarians are in a constant race to convert data to newer formats. Entropy is always lurking in the wings. Joseph Janes, associate professor at the University of Washington Information School, says that although the profession tries to extend the useful life of materials as much as possible using a variety of methods, the effort is ultimately a holding action.
To make matters worse, archivists now face an unprecedented flood of information. In the past, materials were scarce and storage space was limited. "Now we have the opposite problem," Janes says. "Everything is being recorded."
In principle, this could correct a historical wrong. For centuries, many people lacked the cultural, gender, or socioeconomic standing for their knowledge and work to be valued or preserved. But the digital world's massive scale presents a challenge of its own. According to IDC, individuals, companies, and governments will likely create twice as much digital data over the next few years as all the digital data created before the advent of the internet.
Entire universities are working to find better ways to save data. For example, the Data and Service Center for the Humanities at the University of Basel has been developing a software platform called Knora that will not only archive the many types of data produced by humanities research, but also ensure that future generations can access them. Yet the process is complicated.
“We can’t save everything … but that’s no reason to not do what we can.”
“You make educated guesses and hope for the best, but there are data sets that are lost because nobody knew they’d be useful,” says Andrea Ogier, assistant dean and director of data services at the University Libraries of Virginia Tech.
There are never enough people or money to do all the work, and formats are constantly changing and multiplying. With limited budgets, Janes says, archivists must decide how to allocate resources so that things are preserved. In some cases, that means material is saved or stored but simply sits there, uncatalogued or unprocessed, making it nearly impossible to find or access. In other cases, archivists may turn down new collections altogether.

The formats used to store data can themselves become unreadable. NASA socked away 170 or so tapes of data on lunar dust, collected during the Apollo era. When researchers set out to use the tapes in the mid-2000s, they couldn't find anyone with the 1960s-era IBM 729 Mark 5 machine needed to read them. The team eventually located one, in rough condition, at the Australian Computer Museum warehouse, and volunteers helped refurbish it.
Software also has a shelf life. Ogier recalls trying to open an old Quattro Pro spreadsheet file and finding no software that could read it.
There have been attempts to future-proof programs. One project that drew considerable fanfare in 2015 is the Open Library of Images for Virtualized Execution (Olive) archive, which runs old software, such as Chaste 3.1, a 2013 biology and physiology research program, and the 1990 Mac version of the computer game The Oregon Trail, on a set of virtual machines. Mahadev Satyanarayanan, a professor of computer science at Carnegie Mellon University, says the project is still active. However, Olive's offerings face challenges: even unused software must be licensed from the companies that make it, and there is often no way to add new data to the archive's research applications.
Other efforts to extend the longevity of knowledge have run into obstacles of their own. The Internet Archive, home of the Wayback Machine, holds a large collection of digitized materials, including software, music, and videos; as of the summer of 2022, it was fighting a copyright infringement lawsuit brought by multiple publishers.
On the more hopeful side, the Text Encoding Initiative has maintained international standards for encoding machine-readable texts since the 1990s. A decade ago, the U.S. Office of Science and Technology Policy stipulated that applications for federally supported research must include a data management plan so the data can be used by researchers or the public in the future. Ogier notes that as a result, almost every grant-funded research project must store its data, but there are no requirements for how long the data must be kept or where it should be stored.
Ideas, knowledge, and human creations will inevitably continue to be lost. "We can't save everything. We can't provide access to everything. We can't retrieve everything," Ogier says. "But that's no reason to not do what we can."
Erik Sherman is a freelance journalist based in Ashfield, Mass.