Sumanth Doddapaneni

PhD Student • IIT Madras • AI4Bharat
619 NAC2, IIT Madras

Hello | నమస్కారం | नमस्ते

I am a third-year PhD student at IIT Madras & AI4Bharat, where I'm advised by Mitesh M. Khapra and Anoop Kunchukuttan. My PhD research is supported by the Google PhD Fellowship 2023.

My research interests are aligned towards Multilingual Learning for building Language Models, Machine Translation and Speech Recognition Models. One of the primary goals of my research is to develop models and data for under-resourced languages and make NLP technologies accessible to a much wider audience. I have released IndicBERT, IndicTrans and IndicWav2Vec as part of this initiative.

Previously, I interned at Google Research in Bangalore, where I worked on improving multilingual generation with Nitish Gupta and Partha Talukdar. I also spent a summer at Google Research in Mountain View, focusing on language model personalization with Krishna Sayana. I collaborated with Rahul Aralikatte at Mila - Quebec AI Institute on multilingual summarization.

If you wanna chat about research/academia/whatever, feel free to reach out to sumanthd@cse.iitm.ac.in

news

Nov, 2024 🏆 Delighted to share that our work FBI has received the Outstanding Paper Award at EMNLP 2024!
Nov, 2024 Talk at Google Research, Mountain View on "Rethinking Evaluator LLMs With a Cross-Lingual Twist". Thanks Krishna Sayana for hosting me!
Nov, 2024 Talk at Language Technologies Institute (LTI) @ Carnegie Mellon University (CMU) on "Rethinking Evaluator LLMs With a Cross-Lingual Twist". Thanks Simran Khanuja for hosting me!
Nov, 2024 Attending EMNLP 2024, Miami, USA 🇺🇸 to present FBI work! Let's catch up if you are there!
Oct, 2024 Talk at IT University of Copenhagen on "Rethinking Evaluator LLMs With a Cross-Lingual Twist". Thanks Ratish Puduppully for hosting me!
Oct, 2024 Our pre-print CIA: Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs is out on arXiv!
Sep, 2024 FBI paper has been accepted at EMNLP 2024!
Sep, 2024 After almost 3 years in review, our survey paper on multilingual language models (pre-GPT era) has been accepted to ACM Computing Surveys!
Aug, 2024 🏆 Delighted to share that our work IndicLLMSuite has received the Outstanding Paper Award at ACL 2024!
Aug, 2024 Attending ACL 2024, Bangkok, Thailand 🇹🇭! Let's catch up if you are there!
June, 2024 Our pre-print FBI: Finding Blindspots in LLM Evaluations with Interpretable Checklists is out on arXiv!
May, 2024 IndicLLMSuite has been accepted at ACL 2024!
Feb, 2024 I'll be at the Google Research Week, between Feb 1-3, 2024. Let's catch up if you are there!
Dec, 2023 Attending EMNLP 2023, Singapore 🇸🇬! Let's catch up if you are there!
Nov, 2023 My research is now funded by Google PhD Fellowship. Thank You, Google!
Nov, 2023 Will start as a Research Intern at Google Research, India 🇮🇳! Will be in Bangalore till March'24. Let's catch up if you are here!
Oct, 2023 Will be traveling along the East Coast during the last 2 weeks of October: Boston (Oct 22-24), New York (Oct 24-28), Pittsburgh (Oct 28-31), and back in the Bay Area till Nov 4. Come say Hi, and let's walk around the streets, eat good food and maybe talk NLP!
July, 2023 Will be attending ACL 2023 in Toronto, Canada 🇨🇦! I will be presenting IndicXTREME, Naamapadam, and Vārta. Let's catch up if you are here!
June, 2023 Started as a Student Researcher at Google Research, Mountain View 🇺🇸! Will be in the Bay Area till November. Let's catch up if you are here!
May, 2023 Released IndicTrans2, the first model to support all 22 Scheduled Indian languages. More details in the Paper. Kudos to the entire AI4Bharat Team for pulling off this herculean effort.
May, 2023 Released a pre-print "A Comprehensive Analysis of Adapter Efficiency". Paper available here. Kudos to Nandini Mundra for driving this work!
May, 2023 Three papers accepted at ACL 2023. Pre-prints - IndicXTREME, Vārta, Naamapadam
Apr, 2023 Our paper A Survey of Adversarial Defences and Robustness in NLP is accepted at ACM Computing Surveys.
Feb, 2023 Talk at Google Research India on Building Natural Language Understanding (NLU) capabilities for Indic languages. Thanks Nitish Gupta and Partha Talukdar for hosting me!
Feb, 2023 Our paper Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages is accepted at ICASSP 2023.
Jan, 2023 I'll be attending Google Research Week 2023! Let's catch up if you are there!
Dec, 2022 Released Naamapadam. Paper is available here.
Dec, 2022 Released IndicXTREME and IndicBERT v2. Paper is available here.
Dec, 2022 Attending EMNLP 2022, Abu Dhabi 🇦🇪! Let's catch up if you are there!
Nov, 2022 I'll be attending ALPS 2023! Let's catch up if you are there!
Sept, 2022 Released a pre-print of our paper Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages. Work led by Kaushal Bhogale.
May, 2022 Presenting Samanantar at ACL 2022, Dublin 🇮🇪 (Thank You, Prof. Mitesh Khapra). In-person talk (25/05, session 7) & Poster (25/05, session 6). Come say Hi, and let's talk NMT
Feb, 2022 Presenting IndicWav2Vec at AAAI 2022.
Feb, 2022 I'll be attending the Google Research Week 2022! Feel free to get in touch if you are attending the same.

selected papers

  1. ArXiv
    Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
    Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Dilip Venkatesh, Raj Dabre, Anoop Kunchukuttan, Mitesh M. Khapra
  2. EMNLP'24
    Outstanding Paper Award
    Finding Blindspots in LLM Evaluations with Interpretable Checklists
    Sumanth Doddapaneni, Mohammed Safi Ur Rahman Khan, Sshubam Verma, Mitesh M. Khapra
  3. ACL'24
    Outstanding Paper Award
    IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages
    Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Suriyaprasaad G, Varun Balan G, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, Mitesh M. Khapra
  4. TMLR
    IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages
    AI4Bharat, Jay Gala, Pranjal A. Chitale, Raghavan AK, Sumanth Doddapaneni, Varun Gumma, Aswanth Kumar, Janki Nawale, Anupama Sujatha, Ratish Puduppully, Vivek Raghavan, Pratyush Kumar, Mitesh M. Khapra, Raj Dabre, Anoop Kunchukuttan
  5. ACL'23
    Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages
    Sumanth Doddapaneni, Rahul Aralikatte, Gowtham Ramesh, Shreya Goyal, Mitesh M. Khapra, Anoop Kunchukuttan, Pratyush Kumar
  6. TACL
    Samanantar: The Largest Publicly Available Parallel Corpora Collection For 11 Indic Languages
    Gowtham Ramesh*, Sumanth Doddapaneni*, Aravinth Bheemaraj, Mayank Jobanputra, Raghavan AK, Ajitesh Sharma, Sujit Sahoo, Harshita Diddee, Divyanshu Kakwani, Navneet Kumar, Aswin Pradeep, Srihari Nagaraj, Kumar Deepak, Vivek Raghavan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Shantadevi Khapra
  7. AAAI'22
    Towards Building ASR Systems For The Next Billion Users
    Tahir Javed, Sumanth Doddapaneni, Abhigyan Raman, Kaushal Santosh Bhogale, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

flags

Conferences and internships took me here 🇮🇪 🇦🇪 🇺🇸 🇨🇦 🇸🇬 🇹🇭 (so far)!