The scraped data of 2.6 million Duolingo users was leaked on a hacking forum, allowing threat actors to conduct targeted phishing attacks using the exposed information.
Duolingo is one of the largest language learning sites in the world, with over 74 million monthly users worldwide.
In January 2023, someone was selling the scraped data of 2.6 million Duolingo users on the now-shutdown Breached hacking forum for $1,500.
This data includes a mixture of public login names and real names, plus non-public information, including email addresses and internal information related to the Duolingo service.
While the real name and login name are publicly available as part of a user’s Duolingo profile, the email addresses are more concerning because they allow this public data to be used in attacks.

Source: Falcon Feeds
When the data was for sale, Duolingo confirmed to TheRecord that it was scraped from public profile information and that they were investigating whether further precautions should be taken.
However, Duolingo did not address the fact that email addresses were also listed in the data, which is not public information.
As first spotted by VX-Underground, the scraped 2.6 million user dataset was released yesterday on a new version of the Breached hacking forum for 8 site credits, worth only $2.13.
“Today I have uploaded the Duolingo Scrape for you to download, thanks for reading and enjoy!,” reads a post on the hacking forum.

Source: BleepingComputer
This data was scraped using an exposed application programming interface (API) that has been shared openly since at least March 2023, with researchers tweeting and publicly documenting how to use the API.
The API allows anyone to submit a username and retrieve JSON output containing the user’s public profile information. However, it is also possible to feed an email address into the API and confirm whether it is associated with a valid Duolingo account.
BleepingComputer has confirmed that this API is still openly available to anyone on the web, even after its abuse was reported to Duolingo in January.
This API allowed the scraper to feed millions of email addresses, likely exposed in previous data breaches, into the API and confirm whether they belonged to Duolingo accounts. These email addresses were then used to create the dataset containing public and non-public information.
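To illustrate why an email lookup turns otherwise public profile data into something more sensitive, the following Python sketch simulates the pattern against an in-memory stand-in. Nothing here reflects Duolingo's actual API or data: `ACCOUNTS`, `lookup_by_email`, and `link_emails` are hypothetical names, and every record is invented.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a service's user table; all records are made up.
ACCOUNTS: Dict[str, Dict[str, str]] = {
    "alice@example.com": {"username": "alice_l", "name": "Alice L."},
    "bob@example.com": {"username": "bob88", "name": "Bob"},
}


def lookup_by_email(email: str) -> Optional[Dict[str, str]]:
    """Mock of a profile endpoint that accepts an email address.

    Returns the public profile when the email belongs to an account and
    None otherwise. That difference is what confirms an account exists.
    """
    profile = ACCOUNTS.get(email.lower())
    return dict(profile) if profile else None


def link_emails(candidates: List[str]) -> List[Dict[str, str]]:
    """Pair each candidate email with the public profile it resolves to."""
    linked = []
    for email in candidates:
        profile = lookup_by_email(email)
        if profile is not None:
            # The email was never part of the public profile;
            # the lookup is what attaches it to the username and name.
            linked.append({"email": email, **profile})
    return linked


if __name__ == "__main__":
    candidates = ["alice@example.com", "carol@example.com", "bob@example.com"]
    for record in link_emails(candidates):
        print(record)
```

In this toy setup, the address with no matching account is silently dropped, while each match yields a record that combines a non-public email with public profile fields. A lookup that responded identically whether or not an account exists would not allow this kind of linking.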
Another threat actor shared their own API scrape, pointing out that threat actors wishing to use the data in phishing attacks should pay attention to specific fields that indicate a Duolingo user has more permissions than a regular user and is thus a more valuable target.
BleepingComputer has contacted Duolingo with questions about why the API is still publicly available but did not receive a reply by the time of publication.
Scraped data often dismissed
Companies tend to dismiss scraped data as a non-issue because most of the data is already public, even if it is not necessarily easy to compile.
However, when public data is mixed with private data, such as phone numbers and email addresses, the exposed information becomes riskier and its exposure may violate data protection laws.
For example, in 2021, Facebook suffered a massive leak after an “Add Friend” API bug was abused to link phone numbers to Facebook accounts for 533 million users. The Irish Data Protection Commission (DPC) later fined Facebook €265 million ($275.5 million) for this leak of scraped data.
More recently, a Twitter API bug was used to scrape the public data and email addresses of millions of users, leading to an investigation by the DPC.