LinkedIn Ordered to Allow Scraping of Public Profile Data

A United States federal judge has ruled that Microsoft's LinkedIn cannot block third-party web scrapers from collecting data from publicly available profiles. The ruling, published on August 14, 2017, follows a lawsuit filed by startup hiQ Labs against LinkedIn after LinkedIn issued a cease and desist letter to prevent the startup from scraping data.

HiQ Labs scrapes information publicly available on users' LinkedIn profiles to help companies determine whether employees are likely to leave their jobs. This type of scraping violates LinkedIn's prohibition against scraping software, and on May 23, 2017, LinkedIn sent hiQ Labs a letter demanding that the company cease its scraping activities and threatening legal action under the Computer Fraud and Abuse Act (CFAA). HiQ Labs sued LinkedIn, accusing the company of anticompetitive behavior and of violating hiQ's free speech right to access publicly available information. The startup's attorney stated that hiQ Labs would likely go under without access to its primary data source. In his ruling, Judge Edward Chen specifically called out LinkedIn's "broad interpretation" of the CFAA, which, "if adopted, could profoundly impact open access to the Internet, a result that Congress could not have intended when it enacted the CFAA over three decades ago." LinkedIn will reportedly appeal the ruling.

The federal order has serious implications for data ownership and privacy, including the amount of control social media companies have over information their users make public. HiQ Labs' argument that LinkedIn's limitation of access to public data violates the First Amendment builds on a recent Supreme Court ruling that equates social media sites with "the modern public square." As the debate in a related Hacker News thread highlights, it remains to be seen whether social media users themselves view posting data publicly on these sites as equivalent to posting that information in the public square.

One unexpected dimension of data privacy in this court case is that LinkedIn argued that it wanted to protect not necessarily the data itself, but access to changes to the data. LinkedIn allows users to make their profiles public while at the same time opting out of sharing certain changes to their profile. However, hiQ Labs is able to detect those changes through its mass scraping and use its findings to alert employers to potential employee attrition. While many users may understand the high-level implications of publishing their profiles publicly, they may not consider what insights that data can yield - and how it can be used - when unknown companies are continuously watching for updates.

David Berlind, editor in chief of ProgrammableWeb, has recently written about the ruling's implications for the API economy. He argues that the value of LinkedIn data is not just the data itself, but the data model behind it, and that allowing bots to make use of this data organization without limits undermines the entire value of a product like LinkedIn. Furthermore, he argues, the ruling forces companies to allow scrapers to circumvent their published APIs, preventing a company from "scaling and understanding the connection between [its] data and the value it's driving."

While LinkedIn does publish APIs, the widespread evidence of LinkedIn scraping across the programming world suggests that many developers have not found them suitable for their needs. Open source scraping libraries are available on GitHub, developers converse about the topic on Stack Exchange and Quora, and commercial data scraping companies provide tutorials on collecting LinkedIn data. In 2016, the Microsoft-owned company initiated a lawsuit against 100 unnamed bot users for scraping data, although that case involved bots that sought access to non-public profile data through fake user accounts. Notably, LinkedIn is comfortable with scraping by whitelisted service providers such as search engines.
