We write in advance of the 99th pre-session of the Committee on the Rights of the Child and its review of Brazil. This is an update to our March 2024 submission and focuses on our recent research on the scraping and misuse of Brazilian children's personal photos to build AI tools without their knowledge or consent.
Brazilian Children's Personal Photos Misused to Power AI Tools (articles 12, 16, and 34)
In June 2024, Human Rights Watch reported that it had uncovered the scraping and use of personal photos of Brazilian children to create powerful AI tools without the knowledge or consent of the children or their families.[1] These photos are scraped off the web into a large data set that companies then use to train their AI tools. In turn, others use these tools to create malicious deepfakes that put even more children at risk of exploitation and harm.
Analysis by Human Rights Watch found that LAION-5B, a data set used to train popular AI tools and built by scraping most of the internet, contains links to identifiable photos of Brazilian children. Some children's names are listed in the accompanying caption or the URL where the image is stored. In many cases, their identities are easily traceable, including information on when and where the child was at the time their photo was taken.
One such photo features a 2-year-old girl, her lips parted in wonder as she touches the tiny fingers of her newborn sister. The caption and information embedded in the photo reveal not only both children's names but also the name and precise location of the hospital in Santa Catarina where the baby was born nine years ago on a winter afternoon.
Human Rights Watch found 170 photos of children from at least 10 states: Alagoas, Bahia, Ceará, Mato Grosso do Sul, Minas Gerais, Paraná, Rio de Janeiro, Rio Grande do Sul, Santa Catarina, and São Paulo. This is likely to be a significant undercount of the total amount of children's personal data that exists in LAION-5B, as Human Rights Watch reviewed less than 0.0001 percent of the 5.85 billion images and captions contained in the data set.
The photos Human Rights Watch reviewed span the entirety of childhood. They capture intimate moments of babies being born into the gloved hands of doctors, young children blowing out candles on their birthday cake or dancing in their underwear at home, students giving a presentation at school, and teenagers posing for photos at their high school's carnival.
Many of these photos were originally seen by few people and previously had a measure of privacy. They do not appear to be findable through an online search. Some of these photos were posted by children, their parents, or their family on personal blogs and photo- and video-sharing sites. Some were uploaded years or even a decade before LAION-5B was created.
Once their data is swept up and fed into AI systems, these children face further threats to their privacy due to flaws in the technology. AI models, including those trained on LAION-5B, are notorious for leaking private information; they can reproduce identical copies of the material they were trained on, including medical records and photos of real people.[2] Guardrails set by some companies to prevent the leakage of sensitive data have been repeatedly broken.[3]
These privacy risks pave the way for further harm. Training on photos of real children enables AI models to create convincing clones of any child, based on a handful of photos or even a single image.[4] Malicious actors have used LAION-trained AI tools to generate explicit imagery of children using innocuous photos, as well as explicit imagery of child survivors whose images of sexual abuse were scraped into LAION-5B.[5]
Likewise, the presence of Brazilian children in LAION-5B contributes to the ability of AI models trained on this data set to produce realistic imagery of Brazilian children. This substantially amplifies the existing risk children face that someone will steal their likeness from photos or videos of themselves posted online and use AI to manipulate them into saying or doing things that they never said or did.
At least 85 girls from Alagoas, Minas Gerais, Pernambuco, Rio de Janeiro, Rio Grande do Sul, and São Paulo have reported harassment by their classmates, who used AI tools to create sexually explicit deepfakes of the girls based on photos taken from their social media profiles and then circulated the faked images online.
Fabricated media have always existed, but required time, resources, and expertise to create, and were largely unrealistic. Current AI tools create lifelike outputs in seconds, are often free, and are easy to use, risking the proliferation of nonconsensual deepfakes that could recirculate online forever and inflict lasting harm.
LAION, the German nonprofit organization that manages LAION-5B, confirmed on June 1 that the data set contained the children's personal photos found by Human Rights Watch, and pledged to remove them, saying it would send Human Rights Watch confirmation of the removal once it was completed. As of August 16, it had not provided confirmation that it had removed the children's data from its data set. LAION also disputed that AI models trained on LAION-5B could reproduce personal data verbatim. It said: "We urge the HRW to reach out to the individuals or their guardians to encourage removing the content from public domains, which will help prevent its recirculation."
Lawmakers in Brazil have proposed banning the nonconsensual use of AI to generate sexually explicit images of people, including children.[6] These efforts are urgent and important, but they only tackle one symptom of the deeper problem that children's personal data remain largely unprotected from misuse. As written, Brazil's data protection law, the Lei Geral de Proteção de Dados Pessoais (General Personal Data Protection Law), does not provide sufficient protections for children.
The government should bolster the data protection law by adopting additional, comprehensive safeguards for children's data privacy.
On July 2, the National Data Protection Authority issued a preliminary ban on Meta's use of its Brazil-based users' personal data to train its AI systems.[7] The government's unprecedented decision followed Human Rights Watch's research as described above, and included two arguments that reflected Human Rights Watch's recommendations. The first is the importance of protecting children's data privacy, given the risk of harm and exploitation that results from their data being scraped and used by AI systems. The second centers on purpose limitation: people's expectations of privacy when they share their personal data online should be respected.
Human Rights Watch recommends that the Committee:
- Congratulate Brazil on its preliminary ban on Meta and ask what steps it has taken to evaluate Meta's compliance with the decision.
- Ask Brazil whether it will facilitate remedy for children whose privacy was violated through the nonconsensual scraping and misuse of their personal photos.
- Ask Brazil whether it will take measures to prevent the future nonconsensual scraping and misuse of children's personal data.
Human Rights Watch encourages the Committee to call on the government of Brazil to:
- Adopt and enforce laws to protect children's rights online, including their data privacy.
- Incorporate data privacy protections for children in its forthcoming national policy to protect the rights of children and adolescents in the digital environment, the drafting process of which was originally scheduled for completion in July 2024 and appears to have been postponed to October.[8]
- Incorporate data privacy protections for children in its proposed AI regulations and national AI plan[9], such that children's rights are respected, protected, and promoted throughout the development and use of AI. The government should take special care to protect children's privacy with respect to AI, as the nature of the technology's development and use does not permit children and their guardians to meaningfully consent to how children's data privacy is handled. These protections should:
  - Prohibit the scraping of children's personal data into AI systems, given the privacy risks involved and the potential for new forms of misuse as the technology evolves.
  - Prohibit the nonconsensual digital replication or manipulation of children's likenesses.
  - Provide those who experience harm through the development and use of AI with mechanisms to seek meaningful justice and remedy.
[1] "Brazil: Children's Personal Photos Misused to Power AI Tools," Human Rights Watch news release, June 10, 2024, https://www.hrw.org/news/2024/06/10/brazil-childrens-personal-photos-misused-power-ai-tools.
[2] Carlini et al., "Extracting Training Data from Diffusion Models," January 30, 2023, https://doi.org/10.48550/arXiv.2301.13188 (accessed July 9, 2024); Benj Edwards, "Artist finds private medical record photos in popular AI training data set," Ars Technica, September 21, 2022, https://arstechnica.com/information-technology/2022/09/artist-finds-private-medical-record-photos-in-popular-ai-training-data-set/ (accessed July 9, 2024).
[3] Carlini et al., "Extracting Training Data from Diffusion Models"; Nasr et al., "Extracting Training Data from ChatGPT," November 28, 2023, https://not-just-memorization.github.io/extracting-training-data-from-chatgpt.html (accessed July 9, 2024); Mehul Srivastava and Cristina Criddle, "Nvidia's AI software tricked into leaking data," Financial Times, June 9, 2023, https://www.ft.com/content/5aceb7a6-9d5a-4f1f-af3d-1ef0129b0934 (accessed July 9, 2024); Matt Burgess, "OpenAI's Custom Chatbots Are Leaking Their Secrets," WIRED, November 29, 2023, https://www.wired.com/story/openai-custom-chatbots-gpts-prompt-injection-attacks/ (accessed July 9, 2024).
[4] Benj Edwards, "AI image generation tech can now create life-wrecking deepfakes with ease," Ars Technica, December 9, 2022, https://arstechnica.com/information-technology/2022/12/thanks-to-ai-its-probably-time-to-take-your-photos-off-the-internet/ (accessed July 9, 2024); Ibid., "Microsoft's VASA-1 can deepfake a person with one photo and one audio track," Ars Technica, April 19, 2024, https://arstechnica.com/information-technology/2024/04/microsofts-vasa-1-can-deepfake-a-person-with-one-photo-and-one-audio-track/ (accessed July 9, 2024).
[5] Emanuel Maiberg, "a16z Funded AI Platform Generated Images That 'Could Be Categorized as Child Pornography,' Leaked Documents Show," 404 Media, December 5, 2023, https://www.404media.co/a16z-funded-ai-platform-generated-images-that-could-be-categorized-as-child-pornography-leaked-documents-show/ (accessed July 9, 2024); David Thiel, "Identifying and Eliminating CSAM in Generative ML Training Data and Models," Stanford Internet Observatory, December 23, 2023, https://stacks.stanford.edu/file/druid:kh752sm9123/ml_training_data_csam_report-2023-12-23.pdf (accessed July 9, 2024).
[6] Projeto de Lei no PL 5342/2023, available at https://www.camara.leg.br/proposicoesWeb/fichadetramitacao?idProposicao=2401172 (accessed August 19, 2024), and Projeto de Lei no PL 5394/2023, available at https://www.camara.leg.br/proposicoesWeb/fichadetramitacao?idProposicao=2402162 (accessed August 19, 2024).
[7] Hye Jung Han, "Brazil Prevents Meta from Using People to Power Its AI," commentary, Human Rights Watch dispatch, July 3, 2024, https://www.hrw.org/news/2024/07/03/brazil-prevents-meta-using-people-power-its-ai.
[8] Guilherme Seto et al., "Silvio Almeida and Moraes discuss child protection on the internet," ("Silvio Almeida e Moraes discutem proteção de crianças na internet"), Folha de São Paulo, June 20, 2024, https://www1.folha.uol.com.br/colunas/painel/2024/06/silvio-almeida-e-moraes-discutem-protecao-de-criancas-na-internet.shtml (accessed July 26, 2024); Conselho Nacional dos Direitos da Criança e do Adolescente, "Resolution No. 245, of April 5, 2024, Provides for the rights of children and adolescents in the digital environment," ("Resolução nº 245, de 5 de abril de 2024, Dispõe sobre os direitos das crianças e adolescentes em ambiente digital"), April 16, 2024, https://www.gov.br/participamaisbrasil/blob/baixar/48630 (accessed August 7, 2024).
[9] See Ministry of Science, Technology, and Innovation, "Plano brasileiro de IA terá supercomputador e investimento de R$ 23 bilhões em quatro anos," July 30, 2024, updated August 12, 2024, https://www.gov.br/mcti/pt-br/acompanhe-o-mcti/noticias/2024/07/plano-brasileiro-de-ia-tera-supercomputador-e-investimento-de-r-23-bilhoes-em-quatro-anos (accessed August 19, 2024). The proposed plan does not refer to the protection of human and children's rights in the government's planned development and use of AI. Human Rights Watch notes that Brazil served as a key sponsor for the United Nations General Assembly Resolution A/78/L.49, which "emphasizes that human rights and fundamental freedoms must be respected, protected and promoted throughout the life cycle of artificial intelligence systems…." See UN General Assembly, Resolution A/78/L.49 (2024), available at https://www.undocs.org/Home/Mobile?FinalSymbol=A%2F78%2FL.49&Language=E&DeviceType=Desktop&LangRequested=False (accessed July 9, 2024), para 5.