Is Archive.org legal? Is it legal to archive without the author’s approval?

Given how data is copied and shared on the internet today, we cannot say that archive.org is illegal. However, it would be better if website owners were notified before their sites are archived.

We all know that what goes online stays online, copied by many others who do not follow any rules.

Legal or illegal, Archive.org helps publishers restore their website data, even years later.

Archive.org is:

  • A wonderful opportunity to recover website data even after a few years.
  • A golden opportunity for those who buy expired domains (they never run out of content, because the old content is available in the Wayback Machine).

A wonderful opportunity to recover website data.

Mistakes happen during work, and data can be damaged or deleted completely. This makes the Wayback Machine a good way to recover website data, although what you can recover depends on the date when the website was archived.
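To check which archive date is actually available before relying on it, something like the following sketch could help. It assumes the Wayback Machine's public availability endpoint at `https://archive.org/wayback/available` and a JSON response with an `archived_snapshots.closest` entry. The function names (`availability_url`, `closest_snapshot`, `lookup`) are made up for this illustration.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
API = "https://archive.org/wayback/available"


def availability_url(site: str, timestamp: str = "") -> str:
    """Build the query URL; timestamp is optional, in YYYYMMDD form."""
    params = {"url": site}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{API}?{urlencode(params)}"


def closest_snapshot(payload: dict) -> Optional[dict]:
    """Return the closest snapshot entry, or None if nothing is archived."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest
    return None


def lookup(site: str, timestamp: str = "") -> Optional[dict]:
    """Ask the API over the network (hypothetical helper, needs internet)."""
    with urlopen(availability_url(site, timestamp), timeout=10) as response:
        return closest_snapshot(json.load(response))
```

Calling `lookup("example.com", "20200101")` should return the snapshot closest to that date, if one exists, including its own `timestamp` and `url` fields.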

A golden opportunity for those who buy expired domains or, more precisely, for domain and content abusers.

I once had a domain that expired because I was unable to renew it. I had saved a backup of that website to use on a new domain.

But after some time, someone else reactivated that domain. I was surprised to see my content on a website I had no control over. And … oh god … duplicate content. Unsurprisingly, Google concluded that the duplicate content came from my new site, even though the content I had created was original. An abuser who found my old content on the Wayback Machine managed to rank much higher than me in search engines with content I had created, while my new site is ranked at the end of the results, or nowhere.

 

Could the Wayback Machine (archive.org) be better? Can we avoid abuse by others?

Of course. The abuse is committed not by the Wayback Machine but by third parties who exploit domains and content, and this is where a solution is needed.

Archive.org is run by a non-profit organization (hence the .org). It simply archives published data and may not offer many options.

Let’s understand one thing. The Wayback Machine is used not only by website owners to retrieve data, but also by other services. Cloudflare, for example, uses it for its “Always Online” option.

So it would be better if the domain’s DNS manager warned us about archiving and asked whether we want to allow it or not. This matters because beginners often have no idea about other methods such as robots.txt.

 

You can stop archiving if this is a problem for you.

In this case, the best option is prevention. The first thing to do is add a rule to robots.txt, a file placed in the root directory of the website.

Enter this code:

User-agent: ia_archiver
Disallow: /

The Wayback Machine has traditionally honored this rule and will not archive data from that domain.
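Before relying on the rule, it may be worth checking that it blocks the crawler you intend and nothing else. Here is a minimal sketch using Python's standard `urllib.robotparser`. The `is_allowed` helper and the `example.com` URL are made up for this illustration, and the check only shows how a standards-following parser reads the file, not how any particular crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# The two lines from robots.txt shown above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some-page"
    print("ia_archiver allowed:", is_allowed(ROBOTS_TXT, "ia_archiver", page))
    print("Googlebot allowed:", is_allowed(ROBOTS_TXT, "Googlebot", page))
```

With this rule, the archiver's user agent is refused while other crawlers such as search engines are still allowed, so the site stays visible in search results.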

But if your content is already archived, add the rule above to robots.txt right away and contact archive.org to request deletion of the archived content. This process takes longer because archive.org needs to verify that you are the owner of the website.