Internet Archive Data Breach

The Internet Archive has suffered a major data breach impacting millions of users. The non-profit online library is best known for its Wayback Machine, which preserves snapshots of websites and digital content, including more than 916 billion web pages saved over time. It has also been the subject of lawsuits over copyright.
The stolen data contains 31 million unique email addresses, along with usernames and bcrypt-hashed passwords from the organisation’s user authentication database. The data was stolen in late September 2024, although news of the breach first surfaced on October 9, when an alert appeared on the Internet Archive’s website.
Security expert Troy Hunt, who runs the widely used free breach-checking service Have I Been Pwned (HIBP), has confirmed the stolen data’s authenticity. He plans to add the compromised data to HIBP so that users can check whether their information was exposed in this breach.
In addition, the Internet Archive has been dealing with ongoing distributed denial-of-service (DDoS) attacks, which have been claimed by the hacktivist group BlackMeta. These attacks have intermittently disrupted services on the archive’s websites and tools, although they are not believed to be linked to the data breach.
In response, the Internet Archive has disabled the part of its platform that was compromised in the attack and scrubbed its systems to remove malicious traffic. The site is currently unavailable to users, although a read-only version of the Wayback Machine is accessible again.
The reasons behind this attack on a non-profit organisation dedicated to providing free access to information and preserving digital content are still unclear. The Internet Archive is widely respected for its mission, but it has faced criticism and lawsuits from content creators, particularly over copyright concerns. It is also possible that the attack was politically motivated or even state-sponsored. Given the broad access the Internet Archive (and its Wayback Machine) provides to a vast range of resources, the responsible group may have been protesting against certain content, perhaps for religious or ideological reasons, or attempting to block access to information in order to gain a tactical advantage or exert control.
The Internet Archive has been posting updates on X and appears quietly confident that it will return to full operations relatively soon.