Generative AI Systems Miss Vast Bodies of Human Knowledge, Study Finds

Generative AI models trained on internet data lack exposure to vast domains of human knowledge that remain undigitized or underrepresented online. English dominates Common Crawl with 44% of content. Hindi accounts for 0.2% of the data despite being spoken by 7.5% of the global population. Tamil represents 0.04% despite 86 million speakers worldwide. Approximately 97% of the world’s languages are classified as “low-resource” in computing.
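To make the scale of the gap concrete, here is a minimal sketch that compares a language's share of the data with its share of the global population, using only the Hindi figures quoted above. The function name `representation_ratio` is a hypothetical helper introduced for illustration, not something from the study; a ratio of 1.0 would mean a language appears in the data in proportion to its speakers.

```python
def representation_ratio(data_share_pct: float, speaker_share_pct: float) -> float:
    """Share of the data divided by share of global speakers (hypothetical helper)."""
    if speaker_share_pct <= 0:
        raise ValueError("speaker share must be positive")
    return data_share_pct / speaker_share_pct


# Figures quoted in the article for Hindi:
# 0.2% of the data, 7.5% of the global population.
hindi_ratio = representation_ratio(0.2, 7.5)

print(f"Hindi representation ratio: {hindi_ratio:.3f}")
print(f"Underrepresented by a factor of about {1 / hindi_ratio:.1f}")
```

By this rough measure, Hindi appears in the data at under 3% of the level its speaker share would suggest. The same calculation is not shown for Tamil because the article gives a speaker count rather than a population share.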

A 2020 study found 88% of languages…