The Wayback Machine and the DMCA

by Lee Gesmer on July 14, 2005

Quick now, what’s a good legal strategy when you’re involved in a bitterly contested trade secret, copyright and trademark case? Sue the lawyers on the other side and accuse them of hacking, of course. At worst, you’ll distract them and knock them off their game; at best, you’ll force their disqualification, pushing them out of the case and making your opponent bear the expense and inconvenience (not to be underestimated) of hiring new counsel and getting them up to speed on the case.

And it doesn’t matter that your suit may be borderline or even frivolous. Every experienced lawyer knows that in the American legal system the risk of being sanctioned for bringing a frivolous suit is only slightly higher than the chance of finding a hundred-dollar bill on a Times Square sidewalk during lunch hour.

So, what happened here? First, there is an underlying trademark and trade secret suit between the similarly named “Healthcare Advocates” and “Health Advocate” that is of no particular interest to anyone except the parties. One of the issues is whether Healthcare published its alleged trade secrets on the Internet in the late 1990s. Health Advocate, the defendant, is represented by the Harding Earley law firm, the lawyers on the receiving end of the lawsuit discussed here.

Seeking to investigate what Healthcare had published on its website in the late ’90s, Harding Earley used the Internet Archive’s “Wayback Machine” to research Healthcare’s old web pages. The Wayback Machine (described in detail on Wikipedia) is a remarkable Internet resource. Recognizing that old web pages disappear and that the early days of the Internet would be lost if they weren’t preserved, the Internet Archive began archiving the web for posterity in 1996, and thus far it has archived a petabyte of data (the equivalent of 500 billion pages of standard printed text). (It’s also worth mentioning that the Internet Archive includes live music: the “live music archive” section of the site contains 2,886 live performances by the Grateful Dead.)

If you own a website and for some reason you don’t want it archived by the Wayback Machine, you can opt out. The Wayback Machine honors the voluntary Standard for Robot Exclusion (SRE), which lets website administrators identify files or directories that should not be “crawled” and indexed. Exclusion is accomplished by placing a file named robots.txt in the root directory of the web server. According to the Internet Archive, this not only excludes a site from future crawling, but will also “exclude any historical pages from the Wayback Machine.”
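To make the mechanism concrete, here is a minimal sketch using Python’s standard-library urllib.robotparser. The robots.txt contents and the example.com URLs are hypothetical, and the sketch assumes the archive’s crawler identifies itself with the user-agent token ia_archiver, the token the Internet Archive’s opt-out instructions named at the time; the actual token may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The first group assumes "ia_archiver" is the
# archive crawler's user-agent token and bars it from the whole site;
# the "*" group bars every other crawler from /private/ only.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text supplied as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Ask the parsed rules whether this user agent may fetch this URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
print(may_crawl(parser, "ia_archiver", "http://www.example.com/index.html"))      # False
print(may_crawl(parser, "SomeOtherBot", "http://www.example.com/index.html"))     # True
print(may_crawl(parser, "SomeOtherBot", "http://www.example.com/private/x.html")) # False
```

Note that the check runs entirely in the crawler’s own code: a well-behaved crawler calls can_fetch before requesting a page, but nothing on the server side stops a client that never asks. That is what makes the protocol voluntary.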

To return to our story: the case against Health Advocate was filed in 2003, and Healthcare had been operating a website since 1998. In early July 2003, Healthcare added robots.txt instructions to its site. The next day (coincidence?), Harding Earley used the Wayback Machine to access Healthcare’s archived website material. Harding Earley hit the archived Healthcare pages (through the Wayback Machine) very aggressively, and the robots.txt instructions neither completely barred the firm from accessing this material nor warned the lawyers that it was “off limits.” The Harding Earley lawyers admit, however, that they had to “hit” the Healthcare URL repeatedly in order to gain access to it.

Fast forward two years, leaving out unimportant details. In early July 2005, based on the conduct described above, Healthcare filed suit against Harding Earley in the United States District Court for the Eastern District of Pennsylvania, Healthcare Advocates, Inc. v. Harding, Earley, Follmer & Frailey.

Most of Healthcare’s claims against the lawyers are hardly worth discussing (Trespass? Intrusion Upon Seclusion? Yeah, right…).

The only claim that might have merit is that Harding Earley violated the Digital Millennium Copyright Act (DMCA). Section 1201(a) of the DMCA states: “No person shall circumvent a technological measure that effectively controls access to a [copyright] work protected under this title.” Healthcare claims that robots.txt is a technological measure that controls access to the archived copies of its website, and that the Harding Earley law firm circumvented that measure.

There are several problems with this argument. First, it’s not clear that Harding Earley took any affirmative steps to “circumvent” robots.txt, or that the law firm was even aware that Healthcare had attempted to restrict access to the material. Unlike defendants in typical DMCA cases, Harding Earley did not use decryption technology or any similar tool. Does hitting a site repeatedly, even with an automated script, constitute circumvention under the DMCA?

Second, the DMCA defines a technological measure as one that “effectively protects a right of a copyright owner . . . if the measure, in the ordinary course of its operation, prevents, restricts, or otherwise limits the exercise of a right of the copyright owner under this title.” It is not clear that the Standard for Robot Exclusion (implemented through robots.txt), a voluntary protocol, meets this definition. Moreover, the fact that the law firm got access simply by requesting the URL repeatedly may suggest that the measure was not “effective.”

Third, since the archived website was accessed in connection with discovery in pending litigation, there may be no claim for copyright infringement. Absent the potential for copyright infringement, at least one case decided under the DMCA (Chamberlain v. Skylink) suggests that mere access to protected content may be insufficient to trigger the DMCA. As the Court noted in that case, the DMCA provides an additional avenue of protection for copyrightable content, but does not create a new property right.

Fourth, the lawyers can point out that the website content in question was fully and publicly accessible for five years. They may argue that the content was effectively licensed to the public, or placed in the public domain, when it was originally posted, undermining the claim that their access to it was in some way unauthorized.

Fifth, the plaintiff’s DMCA claim is clearly not what the law contemplated. The DMCA has traditionally (to the extent that term can be used for a statute of relatively recent vintage) been used to protect encrypted content, not external locks used to limit access to that content. Aggressive use of the DMCA in unanticipated ways is nothing new, but so far it has failed. Plaintiffs have unsuccessfully attempted to use the DMCA to prevent the sale of aftermarket printer cartridges and of universal transmitters compatible with the software codes embodied in garage door openers. We predict a similar outcome in this case.

Lastly, it’s worth noting that if the plaintiff’s theory succeeds, archivists like the Internet Archive would be discouraged from relying on voluntary measures like robots.txt, since any failure of such a measure could give rise to a DMCA claim. Without that opt-out mechanism, the Archive would face a host of copyright issues that could endanger its existence.

Thanks to Joseph Laferrera, who has written a number of insightful articles on the DMCA (Court Limits Reach of DMCA – The Chamberlain Case and Court Preserves Aftermarket Competition Under the DMCA – The Lexmark Case), for his assistance in preparing this piece.

For additional commentary on this case by Rebecca Bolin, see LawMeme.
