Complaint
https://storage.courtlistener.com/recap/gov.uscourts.cand.45...
Yeah, only Microsoft is allowed to indiscriminately scrape the web!
I somehow want both parties to lose.
LinkedIn is the only website on the internet I want scraped so I can view it without it sending a notification to every person whose profile I look at
You can turn on Private Browsing, even on a free account. It also prevents YOU from seeing who viewed you, though, unless you buy premium.
Can the company just claim it’s for AI training and it’s fair use?
It has started to backfire.
Claude also had to pay almost 1.5b for illegally training / scraping.
https://www.cnn.com/2025/09/05/business/anthropic-ai-settlem...
IIUC that was for illegally downloading ebooks and other media -- it had nothing to do with training per se. Scraping publicly accessible data is generally legal, although Microsoft/LinkedIn clearly think they have enough of a leg to stand on to at least litigate this.
anthropic was _not_ sued for including data scraped from public websites. they were sued for including data extracted from pirated books.
Why are they going after the small fish?
If they really want to put a dent into this, go after the biggest players scraping LinkedIn: PeopleDataLabs and Apollo.io (and no, taking down their company page does not count)
Victory against small fish => establish legal precedent
legal precedent => Surer victory in the future for similar lawsuits
Seems there is a scraping precedent already, set by hiQ Labs v. LinkedIn
https://www.fbm.com/publications/what-recent-rulings-in-hiq-...
Reminds me of the Apple vs Pear lawsuit
https://www.entrepreneur.com/business-news/apple-sues-small-...
The dispute was settled because Pear agreed to slightly alter its logo, instead of continuing full litigation (maybe because of resources / dollars it would consume)
Only if the case goes to trial.
If they settle, or the case gets dismissed, no precedent is set.
Even the legal filing and motions can help shape a case since they get rulings and such back. If a judge rejects a motion, maybe they need to approach it a different way when they go after big fish.
The only way this is not beneficial is if the software company settles or the case gets dismissed right away.
If that’s going to happen with a small fish then it was certainly going to happen against a big fish. Cheaper, faster, and easier to attack a smaller business first. There is literally no reason to go after a big dog unless they did something particularly egregious and/or distinct that you can anchor your argument with. Unless your goal is just to waste their time and that of their lawyers I guess, though I think we would all assume the goal is to win ultimately.
Against bigger fish.
And there's always a bigger fish.
Because they either have side deals with the big names, or they want to set precedent for going after them.
Not trying to be a conspiracy theorist here, but my bet is that they have a deal with the big players: we allow you to scrape us (or we give you a pipe you can consume from), and you pay us in monetary or non-monetary terms, like how many business exchanges work.
I doubt they have side deals. They took action on some of them by removing their company page, but that is like a slap on the wrist.
If you want to make a big deal about this, tell us you at least sent a letter to the big players too. Otherwise, don't put on such a huge show.
They have a trademark ridealong whose chances improve against a less-recognized company.
Go after small fish that no one cares about first to normalize the activity, then move up to bigger and bigger targets until you become inevitable.
Or, go after the small fish who can’t afford to have a biglaw team on retainer, bulldoze them to get a legal precedent set, and then use the example to extract concessions from the bigger players.
A smaller company without a big legal team is probably more likely to settle than a big company. Settlements don't establish precedent.
So you get money on the way up until you find a company willing to battle in court and lose.
A bunch of GTM and Sales APIs recently stopped offering their LinkedIn APIs. Seems like the lawsuits are working to scare them off.
Prediction: this will be a very much pay to play market
Examples?
This happened before in hiQ Labs v. LinkedIn.[0]
I've heard a lot of people cite this case as proof that scraping is legal, but it seems like the decision kept going back and forth in appeals, and I never understood what precedent it set, if any, around the legality of scraping.
[0] https://en.wikipedia.org/wiki/HiQ_Labs_v._LinkedIn
The company that put an Email proxy on people's phones to scrape all email going in and out has a complaint about scraping?
I haven't heard of it and I couldn't find the story by these keywords. Can you tell me more? I'm genuinely interested.
Whoa, really? That is diabolical. Can you provide more info?
Curious if Dex (YC 19) (getdex.com) is at risk: their LinkedIn integration requires a Chrome extension to scrape data rather than using LinkedIn APIs.
If I had the Infinity Gems but I could only use them once, I would strongly consider snapping LinkedIn out of existence.
please, go bigger and do all social media types
Oh dear, my office has been scraping LinkedIn forever. We use it to make visual networks of contacts in our industry, and relate that to whom we have working for the company. oops.
Well, maybe I can get that company to back up my LinkedIn posts, because downloading anything from my profile to make a backup is utterly broken.
There is an API option, but the endpoints from the documentation just return 404. There is the Data Privacy "download my data" export, but I really wanted data like my posts and photos, not a crappy CSV with basic properties. In the end there is "View the rich media", but I have to click through items one by one, and the images don't come with the post text; to get that I have to go through my posts one by one and copy-paste. It sucks, despite the "your data belongs to you" text on the labels.
Back up your linkedin posts? What valuable information was ever contained in a linkedin post?
These are my posts; I have a personal attachment to what I wrote.
Most of what I wrote I have in my notes anyway, but still, if they say it is my data and I can always download it, I really want to be able to download it. I don't like that someone just puts up lies on their website like "data is yours, you can always download it".
I'm old enough to remember when pretty much every single social media company had really nice APIs so third party clients could be built.
Oh man, a lot of the web really feels very enshittified these days.
Only a linkedin executive could consider user submitted personal information to be "their" data
They are responsible for it. If people are gaining access to that data in ways other than what the users were led to believe, it is LI's problem
Can't you gain access simply by making a free account?
I don’t get why LinkedIn should be gatekeeping this data that it doesn’t create. It’s bad for society.
They also make it difficult to destroy. Try deleting your post or comment history, and you can only do it slowly one by one, with only a few sketchy tools for making it faster that go against their terms of service.
Compared to HN, which doesn't allow for any comments to be deleted?
Have ChatGPT code up a script for you, that you can paste into developer tools. It's how I deleted all my content from there.
Other social media do it too. At best you can only delete your entire account.
I think most users don't want their data to be used by anyone and everyone. I sure don't. If one user needs access to their own data, they can always export it and take it where they please.
For most people the dangers of openness (see Cambridge Analytica), the lack of upside and the lack of security in small players mean that walled gardens are the best solution for the majority of people.
This lawsuit is exactly why people trust walled gardens to keep their data walled off. Because I trusted LinkedIn, not ProAPI and whatever malicious actors they sell to.
> This lawsuit is exactly why people trust walled gardens to keep their data walled off. Because I trusted LinkedIn, not [...]
Obviously LinkedIn is also in the business of selling the data about you, and also access to you.
LinkedIn just doesn't like this other company leeching off that data LinkedIn got about you, and then competing with LinkedIn in making money off that data (including access).
Selling data inside their walled garden in a way I am OK with in exchange for a free service.
Not a 3rd party selling my information to a scam farm in a foreign land that has no laws that will use all of that information to extract money from my parents.
But LinkedIn is doing so in accordance with the legal agreement you have with them, which I am able to exit at any time and instruct them to remove my data. I can't do this for every company that illegally (in many jurisdictions) hoards information about me.
You're currently on one of the very few sites with no delete/edit button for your own content (after a short initial period.) It's the only site I can think of that hoards my data like that. Which is why I only post anonymous throwaway content here.
I think trusting data you post publicly to only remain exactly where you publish it is naive at best. I think it's much more sensible to think that as soon as you put something public, it will exist somewhere forever, and it's foolish to believe otherwise.
I sure do! If LinkedIn can't market my resume to open roles then letting recruiters roll their own scrapers against it is the next best thing. I understand that LI owns my data, I just wish they were effective in using it!
(edit: "my" data, as in the data I post there.)
I guess that was my point, YOU are free to export your data and post it on the internet, but don't make everyone (me) do the same.
I don't even trust LinkedIn, but it's not like I can sue them for offering antisocial terms, let alone force them to a negotiation table. It's just a shitty situation all around. At the very least they should pay me to use the site if they're making money off of it.
If everyone has access to your data it becomes even more worthless and you will definitely not get paid for it. At least now I can keep it somewhere and they can use it to fund engineers to keep the service up, lawyers to make sure your data stays safe, etc.
You are free to leave and delete your data, whereas if everyone has access to it, it is out there in perpetuity.
You definitely can't sue a data broker to pay you/stop using your data.
So are they gonna go after PitchBook and Crunchbase too or nah?
Basically, LinkedIn is just pissed off they weren't getting a cut of the profits this small company made on LinkedIn's (already public?) data.
The winners here are the law firms on both the plaintiff and defendant sides. Drag this through the court system for as long as possible. PR. PR. PR. Then settle out of court for an "undisclosed amount."
This is the mafia equivalent of "sending a message" in corporate land. Yawn.
They're owned by Microsoft and poorly managed. Hundreds of people get locked out daily and can no longer access or change their OWN data. I say, let the scrapers take them down. We need to stop the walled gardens of data these companies DON'T own - it's the user's data.
I hope LinkedIn loses.