The fruits of web scraping – using code to take data and information from websites – are all around us.
People build scrapers that can find every Applebee's on the planet, collect congressional legislation and votes, or track fancy watches for sale on fan websites. Businesses use scrapers to manage their online retail inventory and monitor competitors' prices. Lots of well-known sites use scrapers to do things like track airline ticket prices and job listings. Google is essentially a giant, crawling web scraper.
Scrapers are also the tools of watchdogs and journalists, which is why The Markup filed an amicus brief this week in a case before the US Supreme Court that threatens to make scraping illegal.
The case itself, Van Buren v. United States, is not about scraping but rather a legal question regarding the prosecution of a Georgia police officer, Nathan Van Buren, who was bribed to look up confidential information in a law enforcement database. Van Buren was prosecuted under the Computer Fraud and Abuse Act (CFAA), which prohibits unauthorized access to computer networks, such as computer hacking, where a person breaks into a system to steal information (or, as dramatized in the classic 1980s film "WarGames," to potentially start World War III).
In Van Buren's case, since he was given permission to use the database for work, the question is whether the court will broadly define his troubling activities as "exceeding authorized access" to extract data, which is what would make them a crime under the CFAA. And it is that definition that could affect journalists.
Or, as Justice Neil Gorsuch put it during Monday's oral argument, it could lead toward "perhaps making a federal criminal of us all."
Investigative journalists and other watchdogs often use scrapers to illuminate issues big and small, from tracking the influence of lobbyists in Peru by harvesting digital visitor logs for government buildings to monitoring and archiving political advertising on Facebook. In both of those examples, the scraped pages and data are publicly available on the internet – no hacking required – but the sites involved could easily change the fine print in their terms of service to label the aggregation of that information "unauthorized." And the US Supreme Court, depending on how it rules, could decide that violating those terms of service is a crime under the CFAA.
In our brief, The Markup warned against "a statute that allows powerful forces such as the government or wealthy corporate actors to unilaterally criminalize newsgathering activities by blocking these efforts through the terms of service for their websites."
What kind of work is at risk? Here's a roundup of some recent journalism made possible by web scraping:
- The COVID Tracking Project, from The Atlantic, collects and aggregates data from around the country on a daily basis, serving as a means of monitoring where testing is taking place, where the pandemic is growing, and the racial disparities in who is contracting and dying from the virus.
- This project, from Reveal, scraped extremist Facebook groups and compared their membership rolls to those of law enforcement groups on Facebook – and found a lot of overlap.
- A recent investigation by The Markup into Google's search results found that the company consistently favors its own products, leaving some websites from which the web giant itself scrapes information struggling for visitors and, therefore, advertising revenue. The US Department of Justice cited the issue in an antitrust lawsuit against the company.
- In Copy, Paste, Legislate, USA Today found a pattern of cookie-cutter laws, pushed by special interest groups, circulating in legislatures around the country.
This article was originally published on The Markup and was republished under the Creative Commons Attribution-NonCommercial-NoDerivatives license.