Sumanth Doddapaneni
PhD Student • IIT Madras • AI4Bharat • Google Research

Hello | నమస్కారం | नमस्ते
I am a second-year PhD student at IIT Madras & AI4Bharat, where I'm advised by Mitesh M. Khapra, Anoop Kunchukuttan and Pratyush Kumar.
My research interests center on Multilingual Learning for building Language Models, Machine Translation and Speech Recognition Models. One of the primary goals of my research is to develop models and data for under-resourced languages and make NLP technologies accessible to a much wider audience. I have released IndicBERT, IndicTrans and IndicWav2Vec as part of this initiative. I also collaborated with Rahul Aralikatte at Mila - Quebec AI Institute, working on multilingual summarization.
Feel free to check out my resume and drop me an email if you want to chat with me!
news
July, 2023 | Will be attending ACL 2023 in Toronto, Canada 🇨🇦! I will be presenting IndicXTREME, Naamapadam, and Vārta. Let's catch up if you are here! |
---|---|
June, 2023 | Started as a Student Researcher at Google Research, Mountain View 🇺🇸! Will be in the Bay Area till November. Let's catch up if you are here! |
May, 2023 | Released IndicTrans2, the first model to support all 22 Scheduled Indian languages. More details in the Paper. Kudos to the entire AI4Bharat team for pulling off this herculean effort. |
May, 2023 | Released a pre-print "A Comprehensive Analysis of Adapter Efficiency". Paper available here. Kudos to Nandini Mundra for driving this work! |
May, 2023 | Three papers accepted at ACL 2023. Pre-prints - IndicXTREME, Vārta, Naamapadam |
Apr, 2023 | Our paper A Survey of Adversarial Defences and Robustness in NLP is accepted at ACM Computing Surveys. |
Feb, 2023 | Talk at Google Research India on Building Natural Language Understanding (NLU) capabilities for Indic languages. Thanks Nitish Gupta and Partha Talukdar for hosting me! |
Feb, 2023 | Our paper Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages is accepted at ICASSP 2023. |
Jan, 2023 | I'll be attending Google Research Week 2023! Let's catch up if you are there! |
Dec, 2022 | Released Naamapadam. Paper is available here. |
Dec, 2022 | Released IndicXTREME and IndicBERT v2. Paper is available here. |
Dec, 2022 | Attending EMNLP 2022, Abu Dhabi 🇦🇪! Let's catch up if you are there! |
Nov, 2022 | I'll be attending ALPS 2023! Let's catch up if you are there! |
Sept, 2022 | Released a pre-print of our paper Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages. Work led by Kaushal Bhogale. |
May, 2022 | Presenting Samanantar at ACL 2022, Dublin 🇮🇪 (Thank you, Prof. Mitesh Khapra). In-person talk (25/05, session 7) & poster (25/05, session 6). Come say hi, and let's talk NMT! |
Feb, 2022 | Presenting IndicWav2Vec at AAAI 2022. |
Feb, 2022 | I'll be attending the Google Research Week 2022! Feel free to get in touch if you are attending the same. |
selected papers
- ArXiv 2023: IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages
- ACL 2023: Vārta: A Large-Scale Headline-Generation Dataset for Indic Languages
- ACL 2023: Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages
- ACL 2023: Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages
- TACL: Samanantar: The Largest Publicly Available Parallel Corpora Collection For 11 Indic Languages
- AAAI: Towards Building ASR Systems For The Next Billion Users