What is the Internet Archive?

Launched in 1996, the Internet Archive is a non-profit organization with a stated mission of “universal access to all knowledge.” To that end, the organization provides free public access to digitized materials, including web pages, books, audio recordings (live concerts among them), videos, images, and software programs. As of April 2021, its collection includes:

475 billion web pages
28 million books and texts
14 million audio recordings (including 220,000 live concerts)
6 million videos (including 2 million Television News programs)
3.5 million images
580,000 software programs

Everything collected by the Internet Archive takes up more than 70 petabytes of server space, a figure that includes two copies of everything. The organization is funded through donations, grants, and fees from book digitization services. For privacy, the Internet Archive doesn’t keep track of its readers’ IP addresses and uses the HTTPS (secure) protocol throughout.

The Internet Archive puts its search function front and center on the main page. You’ll also see a changing list of the top collections. To find something, type your search term in the search box, then click Go. You can also narrow your search to a particular type of content or section.
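If you would rather run a search from a script than from the search box, the same idea can be sketched in a few lines of Python. This is only an illustration: it assumes the site's advanced search address (archive.org/advancedsearch.php) and its q, fl[], rows, and output parameters, along with section names such as "audio" or "texts" for the mediatype field. The function name build_search_url is made up for this example, and the code only builds the address; it doesn't contact the site.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: archive.org's advanced search address accepts these parameters.
SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_search_url(term: str, section: Optional[str] = None, rows: int = 5) -> str:
    """Build a search address for `term`, optionally limited to one section.

    `section` is assumed to be a mediatype value such as "texts", "audio",
    "movies", "software", or "image".
    """
    # Narrow the search to a section by adding a mediatype condition.
    query = term if section is None else f"({term}) AND mediatype:{section}"
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # fields to return for each result
        ("fl[]", "title"),
        ("rows", str(rows)),     # number of results to ask for
        ("output", "json"),
    ]
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("grateful dead", section="audio"))
```

Pasting the printed address into a browser should return a short list of matching items, provided the assumptions above hold.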

Sections

As the numbers above show, there’s a lot of content available at archive.org. As the organization explains on its home page, it pays special attention to books because not everyone has access to public or academic libraries. The site also provides a growing selection of videos, including television content and a TV news archive. Here’s a look at each section.

Web

No doubt, the Wayback Machine is the most popular section on the Internet Archive website. It offers a digital archive of the public side of the web. At the time of this writing, it has made digital copies of over 562 billion web pages. The Wayback Machine went online in 2001, and its archive reaches back to 1996, so you can look up nearly any public website that has existed since then. From there, you can browse the crawled pages from that site over time. For example, a search for “GroovyPost.com” brings up 2,328 crawls going back to 2007.

The Wayback Machine doesn’t include everything posted on a website on a given day, since some content is restricted or stored in databases that crawlers can’t reach. Because of this, some websites are better crawled than others, depending on how developers built a site at the time. You’ll also notice that the newer the archive, the more content is available for any given site. Regardless, going “back in time” is a treat and shows you just how much has changed in the past few decades as the web and the technology that maintains it have matured.
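If you want to jump straight to a point in time without clicking through the calendar, a snapshot address can be put together by hand. The Python sketch below is a minimal illustration, assuming the address pattern web.archive.org/web/<timestamp>/<site> and assuming the Wayback Machine sends you to the closest capture it holds. The function name wayback_snapshot_url and the example date are invented for this example.

```python
from datetime import datetime


def wayback_snapshot_url(site: str, when: datetime) -> str:
    """Build a Wayback Machine address for `site` near the moment `when`.

    Assumption: snapshots are addressed as
    https://web.archive.org/web/<YYYYMMDDhhmmss>/<original address>.
    """
    # Format the date as the 14-digit timestamp used in snapshot addresses.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{timestamp}/{site}"


if __name__ == "__main__":
    # Look for a capture of groovypost.com from around June 2007.
    print(wayback_snapshot_url("groovypost.com", datetime(2007, 6, 1)))
```

The code only builds the address. Whether a capture exists near that date depends on how often the site was crawled.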

Books and Text

The text archive collection offers a massive amount of content that continues to grow each week, including 2.3 million modern eBooks that anyone can borrow with a free archive.org account. You can search by metadata or text contents, and filter by media type, year, topics and subjects, and more. The main book section page also lists collections by views, title, date published, and creator.

For over 15 years, the Internet Archive has collaborated and built digital collections with over 1,100 library institutions, such as the Boston Public Library, the Library of Congress, and more. These partnerships have made it possible to digitize various media types, including microfilm and microfiche, journals, and serial publications. English accounts for the large majority of the books and text posted online. There’s also content in Dutch, German, French, Arabic, Italian, and more.

Of its books and text collection, the Internet Archive explains, “Because we are a library, we pay special attention to books. Not everyone has access to a public or academic library with a good collection, so to provide universal access, we need to provide digital versions of books. We began a program to digitize books in 2005, and today we scan 3,500 books per day in 18 locations around the world. Books published prior to 1926 are available for download, and hundreds of thousands of modern books can be borrowed through our Open Library site. Some of our digitized books are only available to people with print disabilities.”

Video

The video archive is organized in much the same way as the book collection. There are also special collections organized around an event, person, or organization. One special section includes TV news clips fact-checked by FactCheck.org, PolitiFact, The Washington Post’s Fact Checker, or other organizations. It also features a downloadable table with fact checks organized by topic, date, and source. One influential video collection is a news archive dedicated to September 11, 2001, and the events that followed. It includes archived news programs from the U.S. and abroad.

Perhaps most fascinating is the site’s ever-growing TV News archive. Here you can find visualizations and press information, recent fact-checks and quotes, trends, and additional special collections. Are you looking for a specific news program? It’s almost certainly located here, along with closed captioning, text, and a summary of the program’s topics. Better still, the search function makes it possible to find specific segments within a news program.

The video collection isn’t just focused on news, however. You’ll also find animation and cartoons, sports videos, movies, spiritual programs, vlogs, and much more.

Audio

Impressive in its own right is the ever-expanding audio collection. You’ll find a live music archive along with podcasts, iconic radio programs, 78 RPM records, and a lot more. Special collections include one focused on the Grateful Dead, the LibriVox free audiobook collection, and others.

Software

We’ve come a long way since personal computing started taking off in the 1970s and 1980s. Along the way, we’ve seen technologies come and go. That’s where the Internet Archive software collection comes in. It features the largest vintage and historical software library in the world and includes millions of programs, CD-ROM images, documentation, and multimedia. The material presented here includes shareware, freeware, video news releases about software titles, speed runs of gameplay, previews, and promos. The special collections read like a trip down memory lane, featuring MS-DOS, emulation, CD-ROM software, and more. Perhaps my favorite software selection is the Internet Arcade. It contains a web-based library of coin-operated arcade video games from the 1970s into the 1990s. Thanks to emulation, you can play them right in a web browser.

Images

Finally, there’s the image collection. Here, you’ll find everything from maps to astronomical imagery to photographs. Highlights include a broad collection of logos and cover art, plus content from the Metropolitan Museum of Art, NASA, and others.

Additional Sections

Larger Internet Archive sections are called Projects, and these live on separate websites. The best known of these is the Wayback Machine, which features a web archive going back to 1996. In total, there are more than 200 million websites archived in 40 languages.

Besides the Wayback Machine, the Internet Archive offers OpenLibrary.org. The site offers a free, digital lending library of over 2 million eBooks, which you can read online or offline. As part of its mission, the site is also dedicated to building a webpage for every book ever produced. To date, over 2 million books already have a page on OpenLibrary.org.

The site is available on any browser, including Microsoft Edge, Google Chrome, and more. There is currently no Internet Archive app, although you can access the service on iOS and Android devices.
