Washington Journal of Law, Technology & Arts

Abstract

Web crawlers are widely used software programs designed to automatically search the online universe to find and collect information. The data that crawlers provide help make sense of the vast and often chaotic nature of the Web. Crawlers find the websites and content that power search engines and online marketplaces. As people and organizations put an ever-increasing amount of information online, tech companies and researchers deploy more advanced algorithms that feed on that data. Even governments and law enforcement now use crawlers to carry out their missions. Despite the ubiquity of crawlers, their use is regulated only ambiguously, largely by online social norms under which webpage headers signal whether automated “robots” are welcome to crawl a site. As courts take on the issues raised by web crawlers, user privacy hangs in the balance. In August 2017, the Northern District of California granted a preliminary injunction in such a case, deciding that LinkedIn’s website must be open to such crawlers. In March 2018, the District Court for the District of Columbia held that a group of academic researchers and a news organization had standing to bring an as-applied challenge to the Computer Fraud and Abuse Act. The court allowed them to proceed with a case in which they allege that the law, by making a violation of a website’s Terms of Service a crime, effectively prohibits web crawling and infringes on their First Amendment rights. In addition, the news media are inundated with stories like that of Cambridge Analytica, in which web crawlers were used to scrape data from millions of Facebook accounts for political purposes. This article discusses the history of web crawlers in the courts as well as the uses of such programs by a wide array of actors. It addresses the ethical and legal issues surrounding the crawling and scraping of data posted online for uses not intended by the original poster or by the website on which the information is hosted. The article further suggests that stronger rules are necessary to protect users’ initial expectations about how their data would be used, as well as their privacy.

First Page

275

Included in

Computer Law Commons
