In our last in-person class before going entirely online due to COVID-19, the LDDC welcomed Matthew Stubenberg to our seminar on March 11, 2020. Matthew demonstrated his CLUE system and opened our eyes to ways data scraping can help improve criminal justice.
Now the Associate Director of Legal Technology at Harvard Law School’s A2J Lab, Matthew previously worked at Maryland Volunteer Lawyers Service (MVLS) where he developed the CLUE database. CLUE is a service that scrapes Maryland Case Search and creates a fully searchable SQL relational database. In short, it is a very powerful tool.
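To make the idea concrete, here is a minimal sketch of the general scrape-then-store pattern. It is not CLUE's actual code. The HTML snippet, the field labels, the case number, and the `cases` table are all hypothetical stand-ins, and the example parses a hard-coded string rather than contacting any court website.

```python
import sqlite3
from html.parser import HTMLParser

# Hypothetical sample of a public case-record page (made-up data);
# real court pages are structured differently.
SAMPLE_HTML = """
<table>
  <tr><td class="label">Case Number</td><td class="value">D-001-CV-20-000123</td></tr>
  <tr><td class="label">Case Type</td><td class="value">Civil</td></tr>
  <tr><td class="label">Filing Date</td><td class="value">2020-01-15</td></tr>
</table>
"""


class CaseParser(HTMLParser):
    """Collect label/value cell pairs from the sample markup."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._cell = None   # class attribute of the <td> currently open
        self._label = None  # most recent label text seen

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._cell = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "td":
            self._cell = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._cell == "label":
            self._label = text
        elif self._cell == "value" and self._label:
            self.fields[self._label] = text
            self._label = None


def parse_case(html):
    """Return a dict mapping field labels to values for one case page."""
    parser = CaseParser()
    parser.feed(html)
    return parser.fields


def store_case(conn, fields):
    """Insert one parsed case into a (hypothetical) relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cases ("
        "case_number TEXT PRIMARY KEY, case_type TEXT, filing_date TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO cases VALUES (?, ?, ?)",
        (fields["Case Number"], fields["Case Type"], fields["Filing Date"]),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the demo
store_case(conn, parse_case(SAMPLE_HTML))

# Once the records are in SQL, they can be filtered in bulk.
rows = conn.execute(
    "SELECT case_number, filing_date FROM cases WHERE case_type = ?",
    ("Civil",),
).fetchall()
print(rows)
```

The point of the sketch is the last query: once individual records sit in a relational table, questions that would require opening thousands of web pages by hand become a single SQL statement.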
In the years since Matthew launched the database, researchers and advocates have used CLUE data to promote criminal justice. Examples of data-driven projects that Matthew discussed include the creation of targeted mailings to promote expungement, the discovery of habitually inadequate process servers, and the measurement of the effect of recent bail reform.
None of these criminal justice advances would be possible without scraping. Although scrapers are common, there are legal limits on their use. After Matthew’s visit, we decided to look at the law surrounding the use of scrapers.
A recent development in this area comes from the Ninth Circuit Court of Appeals in hiQ Labs v. LinkedIn. The general concern with scrapers is that they collect data directly from the source in an alarmingly efficient manner. In hiQ, the court nonetheless upheld hiQ’s ability to scrape publicly available data from LinkedIn to fuel hiQ’s workforce analytics products.
Although the precedential value of the Ninth Circuit’s analysis is limited because the court was reviewing a preliminary injunction rather than deciding the merits, it nonetheless provides useful guidance to those using scrapers. For now, the use of scrapers appears to be lawful as long as the scraped data is public.
While some apprehension regarding the use of scrapers makes sense, I think the Ninth Circuit struck the right balance in hiQ. Scrapers like the one in hiQ merely take data that is publicly available to you and me through our web browser. They just take that data at lightning speed.
Speedily accessing and indexing scraped data can benefit the public in profound ways. At the LDDC, Matthew Stubenberg and others have shown us how data collected by CLUE, when properly wrangled and analyzed, provides access to justice that otherwise would not have been available to Marylanders.