Sumanth Doddapaneni
ML Researcher • Sarvam AI • AI4Bharat • IIT Madras

Hello | నమస్కారం | नमस्ते
I'm Sumanth Doddapaneni, an ML Researcher at Sarvam AI and a fourth-year Ph.D. student (currently on leave) in the Department of Computer Science at IIT Madras. I am advised by Mitesh M. Khapra and Anoop Kunchukuttan. My research focuses on multilingual language modeling, machine translation, and automatic speech recognition, with a strong emphasis on low-resource Indian languages. I’m fortunate to be a Google PhD Fellow (2023) and have received Outstanding Paper Awards at EMNLP 2024 and ACL 2024 for my work on auto evaluation and multilingual dataset creation.
Previously, I interned at Google Research in Bangalore, where I worked with Nitish Gupta and Partha Talukdar on improving multilingual generation. I also spent a summer at Google Research in Mountain View, working with Krishna Sayana on language model personalization, and collaborated with Rahul Aralikatte at Mila - Quebec AI Institute on multilingual summarization.
Some of my best contributions include: CIA, FBI, IndicTrans: v3, v2, and v1, IndicBERT and IndicWav2Vec
If you wanna chat about research/academia/whatever, feel free to reach out to sumanth@sarvam.ai
news
May, 2025 | CIA: Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs paper has been accepted at ACL 2025! |
---|---|
Apr, 2025 | Joined Sarvam AI as an ML Researcher. Building the best models for India and beyond! 🚀 |
Mar, 2025 | Happy to release the beta version of IndicTrans3. Try it out and let us know your feedback! 🚀 |
Mar, 2025 | Attending Advanced Language Processing School (ALPS) 2025 at the Centre CNRS Paul Langevin, Aussois, France! 🇫🇷 |
Jan, 2025 | Talk at Microsoft on "Rethinking Evaluator LLMs With a Cross-Lingual Twist". Thanks Gagan Madan for hosting me! |
Jan, 2025 | Talk at Google DeepMind, Bangalore on "Rethinking Evaluator LLMs With a Cross-Lingual Twist". Thanks Harman Singh for hosting me! |
Jan, 2025 | I'll be at the Google DeepMind Research Symposium on Jan 27 & 28, 2025. Let's catch up if you are there! |
Nov, 2024 | 🏆 Delighted to share that our work FBI has received the Outstanding Paper Award at EMNLP 2024! |
Nov, 2024 | Talk at Google Research, Mountain View on "Rethinking Evaluator LLMs With a Cross-Lingual Twist". Thanks Krishna Sayana for hosting me! |
Nov, 2024 | Talk at Language Technologies Institute (LTI) @ Carnegie Mellon University (CMU) on "Rethinking Evaluator LLMs With a Cross-Lingual Twist". Thanks Simran Khanuja for hosting me! |
Nov, 2024 | Attending EMNLP 2024, Miami, USA 🇺🇸 to present FBI work! Let's catch up if you are there! |
Oct, 2024 | Talk at IT University of Copenhagen on "Rethinking Evaluator LLMs With a Cross-Lingual Twist". Thanks Ratish Puduppully for hosting me! |
Oct, 2024 | Our pre-print CIA: Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs is out on arxiv! |
Sep, 2024 | FBI paper has been accepted at EMNLP 2024! |
Sep, 2024 | After almost 3 years in review, our survey paper on multilingual language models (pre-GPT era) has been accepted to ACM Computing Surveys! |
Aug, 2024 | 🏆 Delighted to share that our work IndicLLMSuite has received the Outstanding Paper Award at ACL 2024! |
Aug, 2024 | Attending ACL 2024, Bangkok, Thailand 🇹🇭! Let's catch up if you are there! |
June, 2024 | Our pre-print FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists is out on arxiv! |
May, 2024 | IndicLLMSuite has been accepted at ACL 2024! |
Feb, 2024 | I'll be at the Google Research Week, between Feb 1-3, 2024. Let's catch up if you are there! |
Dec, 2023 | Attending EMNLP 2023, Singapore 🇸🇬! Let's catch up if you are there! |
Nov, 2023 | My research is now funded by Google PhD Fellowship. Thank You, Google! |
Nov, 2023 | Will start as a Research Intern at Google Research, India 🇮🇳! Will be in Bangalore till March'24. Let's catch up if you are here! |
Oct, 2023 | Will be traveling along the East Coast during the last 2 weeks of October: Boston (Oct 22-24), New York (Oct 24-28), Pittsburgh (Oct 28-31), and then back in the Bay Area till Nov 4. Come say Hi, and let's walk around the streets, eat good food and maybe talk NLP! |
July, 2023 | Will be attending ACL 2023 in Toronto, Canada 🇨🇦! I will be presenting IndicXTREME, Naamapadam, and Vārta. Let's catch up if you are here! |
June, 2023 | Started as a Student Researcher at Google Research, Mountain View 🇺🇸! Will be in the Bay Area till November. Let's catch up if you are here! |
May, 2023 | Released IndicTrans2, the first model to support all 22 Scheduled Indian languages. More details in the Paper. Kudos to the entire AI4Bharat Team for pulling off this herculean effort. |
May, 2023 | Released a pre-print "A Comprehensive Analysis of Adapter Efficiency". Paper available here. Kudos to Nandini Mundra for driving this work! |
May, 2023 | Three papers accepted at ACL 2023. Pre-prints - IndicXTREME, Vārta, Naamapadam |
Apr, 2023 | Our paper A Survey of Adversarial Defences and Robustness in NLP is accepted at ACM Computing Surveys. |
Feb, 2023 | Talk at Google Research India on Building Natural Language Understanding (NLU) capabilities for Indic languages. Thanks Nitish Gupta and Partha Talukdar for hosting me! |
Feb, 2023 | Our paper Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages is accepted at ICASSP 2023. |
Jan, 2023 | I'll be attending Google Research Week 2023! Let's catch up if you are there! |
Dec, 2022 | Released Naamapadam. Paper is available here. |
Dec, 2022 | Released IndicXTREME and IndicBERT v2. Paper is available here. |
Dec, 2022 | Attending EMNLP 2022, Abu Dhabi 🇦🇪! Let's catch up if you are there! |
Nov, 2022 | I'll be attending ALPS 2023! Let's catch up if you are there! |
Sept, 2022 | Released a pre-print of our paper Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages. Work led by Kaushal Bhogale. |
May, 2022 | Presenting Samanantar at ACL 2022, Dublin 🇮🇪 (Thank You, Prof. Mitesh Khapra). In-person talk (25/05, session 7) & Poster (25/05, session 6). Come say Hi, and let's talk NMT |
Feb, 2022 | Presenting IndicWav2Vec at AAAI 2022. |
Feb, 2022 | I'll be attending the Google Research Week 2022! Feel free to get in touch if you are attending the same. |
selected papers
- ACL'25: Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
- EMNLP'24: Finding Blindspots in LLM Evaluations with Interpretable Checklists
- ACL'24: IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages
- TMLR: IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages
- ACL'23: Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages
- TACL: Samanantar: The Largest Publicly Available Parallel Corpora Collection For 11 Indic Languages
- AAAI'22: Towards Building ASR Systems For The Next Billion Users
flags
Conferences and internships took me here 🇮🇪 🇦🇪 🇺🇸 🇨🇦 🇸🇬 🇹🇭 🇫🇷 (so far)!