You may have heard warnings to be careful what you post on the Internet, because the Internet is forever. But is it really? The BBC recently looked into the question of why there’s so little left of the early internet and found that even today there is too much data being put online for anyone to archive and preserve all of it. While some people may have good reason to worry about whether something might live forever online, other people and organizations are worried about what is being lost. The BBC article highlights several examples of large websites losing or deleting older content. In some cases, the only way to see even a portion of the lost content is through sites that archive the web. One of the earliest attempts to preserve web pages came from Brewster Kahle, who founded the Internet Archive. The Internet Archive’s Wayback Machine provides a view of some of that older content. Other organizations, such as the Library of Congress’ web archiving program, archive selected websites.
Deciding what to preserve and actually preserving it is an age-old problem. The recent discovery of what is essentially a 16th-century annotated bibliography of over 15,000 books obtained by a son of Christopher Columbus highlights how ephemeral information can be. About 75% of the books in the bibliography have been lost to history. The newly discovered annotations should be invaluable because they can give historians a better glimpse into what types of books were being read 500 years ago. In the future, the early history of the internet may consist only of what a handful of people and organizations chose to preserve. If you are lucky, that won’t include the embarrassing photos that your parents posted of you on Facebook.