Interspeech 2021 Conference Sessions, Speakers and Papers

The UX for the Interspeech 2021 virtual conference platform could be more straightforward.

Here’s a quick list of sessions, speakers and papers planned for 9/30/21. If I’ve missed any, please email me at rob@ this domain.

Microsoft

via: MSFTResearch site, see also @MSFTResearch

Tuesday, August 31, 2021

13:30 15:30

Improving Weakly Supervised Sound Event Detection with Self-Supervised Auxiliary Tasks  Soham Deshmukh, Bhiksha Raj, Rita Singh

13:30 15:30

Explaining Deep Learning Models for Speech Enhancement  Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr

13:30 15:30

Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS  Yan Deng, Rui Zhao, Zhong Meng, Xie Chen, Bing Liu, Jinyu Li, Yifan Gong, Lei He

19:00 21:00

Data Augmentation for Spoken Language Understanding via Pretrained Language Models  Baolin Peng, Chenguang Zhu, Michael Zeng, Jianfeng Gao

19:00 21:00

Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need  Yan Huang, Guoli Ye, Jinyu Li, Yifan Gong

19:00 21:00

One-Shot Voice Conversion with Speaker-Agnostic StarGAN  Sefik Emre Eskimez, Dimitrios Dimitriadis, Kenichi Kumatani, Robert Gmyr

Wednesday, September 1, 2021

11:00 13:00

Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems  Vikas Joshi, Amit Das, Eric Sun, Rupesh Mehta, Jinyu Li, Yifan Gong

11:00 13:00

Streaming Multi-Talker Speech Recognition with Joint Speaker Identification  Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong

16:00 18:00

A Light-Weight Contextual Spelling Correction Model for Customizing Transducer-Based Speech Recognition Systems  Xiaoqiang Wang, Yanqing Liu, Sheng Zhao, Jinyu Li

16:00 18:00

Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing  Babak Naderi, Ross Cutler

19:00 21:00

MUCS 2021: Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages  Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram, Basil Abraham

19:00 21:00

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition  Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong

19:00 21:00

Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement  Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka

19:00 21:00

Single-Channel Speech Enhancement Using Learnable Loss Mixup  Oscar Chang, Dung N. Tran, Kazuhito Koishida

19:00 21:00

INTERSPEECH 2021 Deep Noise Suppression Challenge  Chandan K A Reddy, Hari Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan

Thursday, September 2, 2021

11:00 13:00

Analyzing Short Term Dynamic Speech Features for Understanding Behavioral Traits of Children with Autism Spectrum Disorder  Young-Kyung Kim, Rimita Lahiri, Md Nasir, So Hyun Kim, Somer Bishop, Catherine Lord, Shrikanth Narayanan

11:00 13:00

Source Separation I

Related Publications

16:00 18:00

Multi- and cross-lingual ASR, other topics in ASR

Related Publications

16:00 18:00

Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker  Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe

16:00 18:00

Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration  Chuanxin Tang, Chong Luo, Zhiyuan Zhao, Dacheng Yin, Yucheng Zhao, Wenjun Zeng

16:00 18:00

AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario  Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen

Friday, September 3, 2021

11:00 13:00

Sequence-Level Confidence Classifier for ASR Utterance Accuracy and Application to Acoustic Models  Amber Afshan, Kshitiz Kumar, Jian Wu

16:00 18:00

End-to-End Speaker-Attributed ASR with Transformer  Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka

16:00 18:00

Speech Synthesis: Speaking Style and Emotion

Related Publications

16:00 18:00

INTERSPEECH 2021 Acoustic Echo Cancellation Challenge

Ross Cutler, Ando Saabas, Tanel Panarmaa, Markus Loide, Sten Sootla, Marju Purin, Hannes Gamper, Sebastian Braun, Robert Aichner, Sriram Srinivasan

Monday, September 6, 2021

Workshop 11:00 Workshop on Machine Learning in Speech and Language Processing 2021

Speaker: Chengyi Wang (Intern)
Organizing Committee: Yao Qian
Scientific Committee: Liang Lu

NVIDIA

via NVIDIA at INTERSPEECH 2021 site

Tuesday, August 31, 2021

07:00 – 09:00 p.m. CET

Scene-Agnostic Multi-Microphone Speech Dereverberation
Yochai Yemini, Ethan Fetaya, Haggai Maron, Sharon Gannot

Wednesday, September 1, 2021

11:00 a.m. – 01:00 p.m. CET

SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition
Patrick K. O’Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko

07:00 – 09:00 p.m. CET

Hi-Fi Multi-Speaker English TTS Dataset
Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg, Yang Zhang

Thursday, September 2nd, 2021

04:00 – 06:00 p.m. CET

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev, Boris Ginsburg

Friday, September 3rd, 2021

04:00 – 06:00 p.m. CET

Compressing 1D Time-Channel Separable Convolutions Using Sparse Random Ternary Matrices
Gonçalo Mordido, Matthijs Van Keirsbilck, Alexander Keller

04:00 – 06:00 p.m. CET

NeMo Inverse Text Normalization: From Development To Production
Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg

Apple

via Apple at Interspeech 2021

Conference Accepted Papers

A Discriminative Entity Aware Language Model for Virtual Assistants

Mandana Saebi, Ernie Pusateri, Aaksha Meghawat, Christophe Van Gysel

Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

Vikramjit Mitra, Zifang Huang, Colin Lea, Lauren Tooley, Panayiotis Georgiou, Sachin Kajarekar, Jeffrey Bigham

DEXTER: Deep Encoding of External Knowledge for Named Entity Recognition in Virtual Assistants

Deepak Muralidharan, Joel Ruben Antony Moniz, Weicheng Zhang, Stephen Pulman, Lin Li, Megan Barnes, Jingjing Pan, Jason Williams, Alex Acero

Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation

Vineet Garg, Wonil Chang, Siddharth Sigtia, Saurabh Adya, Pramod Simha, Pranay Dighe, Chandra Dhir

Talks and Workshops

Meet Apple will be an opportunity to learn more about our ML teams, working at Apple, and how to apply to full-time positions. This talk will be held virtually on Wednesday, September 1 at 9:30 am PDT.

Apple is hosting a panel on internships, where attendees can learn more about internship opportunities across our machine learning teams. It will be held virtually on September 2 at 9:30 am PDT.

All registered Interspeech attendees are invited to each event. Check back for more information on how to join.

Affinity Events

Sunday, August 29, 2021

Apple is a sponsor of the Workshop for Young Female Researchers in Speech Science & Technology, which will take place virtually on Sunday, August 29.

Thursday, September 2nd, 2021

Matthias Paulik will be participating in the 8th Students Meet Experts event as a panelist. This event will take place virtually on Thursday, September 2.

Amazon

via Amazon Science

Accepted Publications

A learned conditional prior for the VAE acoustic space of a TTS system

Penny Karanasou, Sri Karlapati, Alexis Moinet, Arnaud Joly, Ammar Abbas, Simon Slangen, Jaime Lorenzo-Trueba, Thomas Drugman

Acted vs. improvised: Domain adaptation for elicitation approaches in audio-visual emotion recognition

Haoqi Li, Yelin Kim, Cheng-hao Kuo, Shrikanth Narayanan

Adapting long context NLM for ASR rescoring in conversational agents

Ashish Shenoy, Sravan Bodapati, Monica Sunkara, Srikanth Ronanki, Katrin Kirchhoff

Adjunct-emeritus distillation for semi-supervised language model adaptation

Scott Novotney, Yile Gu, Ivan Bulyko

Amortized neural networks for low-latency speech recognition

Jonathan Macoskey, Grant P. Strimel, Jinru Su, Ariya Rastrow

Best of both worlds: Robust accented speech recognition with adversarial transfer learning

Nilaksh Das, Sravan Bodapati, Monica Sunkara, Sundararajan Srinivasan, Duen Horng Chau

Bootstrap an end-to-end ASR system by multilingual training, transfer learning, text-to-text mapping and synthetic audio

Manuel Giollo, Deniz Gunceler, Yulan Liu, Daniel Willett

CoDERT: Distilling encoder representations with co-learning for transducer-based speech recognition

Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris

Correcting automated and manual speech transcription errors using warped language models

Mahdi Namazifar, John Malik, Erran Li, Gokhan Tur, Dilek Hakkani-Tür

Detection of lexical stress errors in non-native (L2) English with data augmentation and attention

Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Jasha Droppo, Thomas Drugman, Bozena Kostek

End-to-end neural diarization: From Transformer to Conformer

Yi Chieh Liu, Eunjung Han, Chul Lee, Andreas Stolcke

End-to-end spoken language understanding for generalized voice assistants

Michael Saxon, Samridhi Choudhary, Joseph McKenna, Athanasios Mouchtaris

Evaluating the vulnerability of end-to-end automatic speech recognition models to membership inference attacks

Muhammad A. Shah, Joseph Szurley, Markus Mueller, Athanasios Mouchtaris

Event specific attention for polyphonic sound event detection

Harshavardhan Sundar, Ming Sun, Chao Wang

Factorization-aware training of transformers for natural language understanding on the edge

Hamidreza Saghir, Samridhi Choudhary, Sepehr Eghbali, Clement Chung

FANS: Fusing ASR and NLU for on-device SLU

Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann, Ariya Rastrow

Fusion of embeddings networks for robust combination of text dependent and independent speaker recognition

Ruirui Li, Chelsea J.-T. Ju, Zeya Chen, Hongda Mao, Oguz Elibol, Andreas Stolcke

Graph-based label propagation for semi-supervised speaker identification

Long Chen, Venkatesh Ravichandran, Andreas Stolcke

Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flow

Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, Jasha Droppo

Improving RNN-T ASR accuracy using context audio

Andreas Schwarz, Ilya Sklyar, Simon Wiesler

Improving the expressiveness of neural vocoding with non-affine normalizing flows

Adam Gabrys, Yunlong Jiao, Daniel Korzekwa, Roberto Barra-Chicote

Intra-sentential speaking rate control in neural text-to-speech for automatic dubbing

Mayank Sharma, Yogesh Virkar, Marcello Federico, Roberto Barra-Chicote, Robert Enyedi

Learning a neural diff for speech models

Jonathan Macoskey, Grant P. Strimel, Ariya Rastrow

Leveraging ASR N-best in deep entity retrieval

Haoyu Wang, John Chen, Majid Laali, Jeff King, Kevin Durda, William M. Campbell, Yang Liu

Listen with intent: Improving speech recognition with audio-to-intent front-end

Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo

Multi-channel Opus compression for far-field automatic speech recognition with a fixed bitrate budget

Lukas Drude, Jahn Heymann, Andreas Schwarz, Jean-Marc Valin

Multi-channel transformer transducer for speech recognition

Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo

Paraphrase label alignment for voice application retrieval in spoken language understanding

Zheng Gao, Radhika Arava, Qian Hu, Xibin Gao, Thahir Mohamed, Wei Xiao, Mohamed AbdelHady

Personalized PercepNet: Real-time, low-complexity target voice separation and enhancement

Ritwik Giri, Shrikant Venkataramani, Jean-Marc Valin, Umut Isik, Arvindh Krishnaswamy

Phonetically induced subwords for end-to-end speech recognition

Vasileios Papadourakis, Markus Mueller, Jing Liu, Athanasios Mouchtaris, Maurizio Omologo

Predicting temporal performance drop of deployed production spoken language understanding models

Quynh Ngoc Thi Do, Judith Gaspers, Daniil Sorokin, Patrick Lehnen

Scaling effect of self-supervised speech models

Jie Pu, Yuguang Yang, Ruirui Li, Oguz Elibol, Jasha Droppo

Scaling laws for acoustic models

Jasha Droppo, Oguz Elibol

SmallER: Scaling neural entity resolution for edge devices

Ross McGowan, Jinru Su, Vince DiCocco, Thejaswi Muniyappa, Grant P. Strimel

Speaker-conversation factorial designs for diarization error analysis

Scott Seyfarth, Sundararajan Srinivasan, Katrin Kirchhoff

SynthASR: Unlocking synthetic data for speech recognition

Amin Fazel, Wei Yang, Yulan Liu, Roberto Barra-Chicote, Yixiong Meng, Roland Maas, Jasha Droppo

The impact of intent distribution mismatch on semi-supervised spoken language understanding

Judith Gaspers, Quynh Ngoc Thi Do, Daniil Sorokin, Patrick Lehnen

Wav2vec-C: A self-supervised model for speech representation learning

Samik Sadhu, Di Hu, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

Weakly-supervised word-level pronunciation error detection in non-native English speech

Daniel Korzekwa, Jaime Lorenzo-Trueba, Thomas Drugman, Shira Calamaro, Bozena Kostek

Visit the Amazon Science page for more info on eight workshops by Amazon.

  • ByteDance ?
  • Facebook ?
  • Google ?
  • Phonexia ?
  • 3M ?
  • Baidu ?
  • Human Language Technology (Johns Hopkins) ?
  • IBM ?
