Computer science researcher听Antonis Anastasopoulos听uses听his love for computer science, language, and linguistics听to听improve equality in language technologies.听
When people ask Siri, Alexa, or Google听Assistant听a question, they expect the programs听to听understand them, but that is not always the case, he says.听
A person鈥檚听language, accent, dialect, and even gender can have an impact, preventing the system from interpreting them correctly, says Anastasopoulos, an assistant professor in the 听and an expert听in natural language processing, which is听how computers attempt to process and understand human languages.
听鈥淭he systems don鈥檛 work equally well for everyone,鈥 says听Anastasopoulos,听who听speaks Greek (his native language), English, German, Swedish, Italian, and some Spanish.
He听is听one of several co-principal investigators who received a new National Science Foundation-Amazon grant for their research, 鈥淨uantifying and Mitigating Disparities in Language Technologies.鈥
In the fall,听Anastasopoulos听also won a鈥痜or a project on how accent听and听dialect听impact听language technologies.
For the NSF grant, he and experts from Carnegie Mellon 麻豆国产 and the 麻豆国产 of Washington are听studying听areas听where there is bias听in language technologies and听measuring the discrepancies. Then听they听will attempt to mitigate the听inequalities.
鈥淲e want to measure the extent to which the diversity of language affects the utility that speakers get from language technologies,鈥澨鼳nastasopoulos听says.听鈥淲e will focus on automatic translation and speech recognition since they are perhaps the most commonly used language technologies throughout the world.鈥
His听research will apply to all languages.听It鈥檚听important to look deeply into languages for differences because languages are flexible and diverse, he says. 鈥淭here are many regional variations that are different from the standard.鈥
He also recently received a $350,000听grant from the National Endowment for the Humanities (NEH) to build optical character recognition tools to convert scanned images of text to a machine-readable format听for endangered languages.
鈥淲e are听working on training machine-learning models to听process听images and texts in听the听books听and documents听of indigenous languages听from听central and South America听so that听these听works听can be听made听accessible to everyone,鈥 he says. 鈥淲e are building technologies to study those languages computationally.鈥
Anastasopoulos is听also听part of听a prestigious听group of machine-translation researchers, including experts听from Facebook, Google, Amazon, and Microsoft,听who are听creating听automatic tools that translate COVID-19-related content听for听communities听where people听don鈥檛听speak the languages most often used by large health organizations, including听the World Health Organization.
听鈥淲e are working closely with Translators without Borders. So听far, we have produced terminologies for听more than听200 languages and a large dataset for鈥35 languages, some of them extremely听under-served by听the听current solution.鈥
听