Sumanth Doddapaneni

PhD Student • IIT Madras • AI4Bharat • Mila - Quebec AI Institute
Looking for Internships in Summer 2023

Hello | నమస్కారం | नमस्ते

I am a second year PhD Student at IIT Madras & AI4Bharat, where I'm advised by Mitesh M. Khapra, Pratyush Kumar and Anoop Kunchukuttan. I'm also a visiting researcher at Mila - Quebec AI Institute working with Rahul Aralikatte.

My research interests center on multilingual learning for building Language Models, Machine Translation and Speech Recognition Models. One of the primary goals of my research is to develop data and models for under-resourced languages and make NLP techniques accessible to a much wider audience. IndicBERT, IndicTrans and IndicWav2Vec were released as part of this initiative.

Feel free to check out my resume and drop me an email if you want to chat with me!

news

Feb, 2023 Talk at Google Research India on Building Natural Language Understanding (NLU) capabilities for Indic languages. Thanks Nitish Gupta and Partha Talukdar for hosting me!
Feb, 2023 Our paper Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages has been accepted at ICASSP 2023.
Jan, 2023 I'll be attending Google Research Week 2023! Let's catch up if you are there!
Dec, 2022 Released Naamapadam. Paper is available here.
Dec, 2022 Released IndicXTREME and IndicBERT v2. Paper is available here.
Dec, 2022 Attending EMNLP 2022, Abu Dhabi 🇦🇪. Let's catch up if you are there!
Nov, 2022 I'll be attending ALPS 2023! Let's catch up if you are there!
Sept, 2022 Released a pre-print of our paper Effectiveness of Mining Audio and Text Pairs from Public Data for Improving ASR Systems for Low-Resource Languages. Work led by Kaushal Bhogale.
May, 2022 Presenting Samanantar at ACL 2022, Dublin 🇮🇪 (Thank You, Prof. Mitesh Khapra). In-person talk (25/05, session 7) & Poster (25/05, session 6). Come say Hi, and let's talk NMT.
Feb, 2022 Presenting IndicWav2Vec at AAAI 2022.
Feb, 2022 I'll be attending Google Research Week 2022! Feel free to get in touch if you are attending too.

selected papers

  1. arXiv
    Naamapadam: A Large-Scale Named Entity Annotated Data for Indic Languages
    Arnav Mhaske, Harshit Kedia, Sumanth Doddapaneni, Mitesh M. Khapra, Pratyush Kumar, Rudra Murthy V, Anoop Kunchukuttan
  2. arXiv
    IndicXTREME: A Multi-Task Benchmark For Evaluating Indic Languages
    Sumanth Doddapaneni, Rahul Aralikatte, Gowtham Ramesh, Shreya Goyal, Mitesh M. Khapra, Anoop Kunchukuttan, Pratyush Kumar
  3. TACL
    Samanantar: The Largest Publicly Available Parallel Corpora Collection For 11 Indic Languages
    Gowtham Ramesh*, Sumanth Doddapaneni*, Aravinth Bheemaraj, Mayank Jobanputra, Raghavan AK, Ajitesh Sharma, Sujit Sahoo, Harshita Diddee, Divyanshu Kakwani, Navneet Kumar, Aswin Pradeep, Srihari Nagaraj, Kumar Deepak, Vivek Raghavan, Anoop Kunchukuttan, Pratyush Kumar, Mitesh Shantadevi Khapra
  4. AAAI
    Towards Building ASR Systems For The Next Billion Users
    Tahir Javed, Sumanth Doddapaneni, Abhigyan Raman, Kaushal Santosh Bhogale, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra