The UX for the Interspeech 2021 virtual conference platform could be more straightforward.
Here’s a quick list of sessions, speakers, and papers planned for the week of 8/30/21. If you know of any I’ve missed, please email me at rob@ this domain.
Microsoft
via the Microsoft Research site; see also @MSFTResearch
Tuesday, August 31, 2021
13:30 – 15:30
Improving Weakly Supervised Sound Event Detection with Self-Supervised Auxiliary Tasks
Soham Deshmukh, Bhiksha Raj, Rita Singh
13:30 – 15:30
Explaining Deep Learning Models for Speech Enhancement
Sunit Sivasankaran, Emmanuel Vincent, Dominique Fohr
13:30 – 15:30
Improving RNN-T for Domain Scaling Using Semi-Supervised Training with Neural TTS
Yan Deng, Rui Zhao, Zhong Meng, Xie Chen, Bing Liu, Jinyu Li, Yifan Gong, Lei He
19:00 – 21:00
Data Augmentation for Spoken Language Understanding via Pretrained Language Models
Baolin Peng, Chenguang Zhu, Michael Zeng, Jianfeng Gao
19:00 – 21:00
Rapid Speaker Adaptation for Conformer Transducer: Attention and Bias Are All You Need
Yan Huang, Guoli Ye, Jinyu Li, Yifan Gong
19:00 – 21:00
One-Shot Voice Conversion with Speaker-Agnostic StarGAN
Sefik Emre Eskimez, Dimitrios Dimitriadis, Kenichi Kumatani, Robert Gmyr
Wednesday, September 1, 2021
11:00 – 13:00
Multiple Softmax Architecture for Streaming Multilingual End-to-End ASR Systems
Vikas Joshi, Amit Das, Eric Sun, Rupesh Mehta, Jinyu Li, Yifan Gong
11:00 – 13:00
Streaming Multi-Talker Speech Recognition with Joint Speaker Identification
Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong
16:00 – 18:00
A Light-Weight Contextual Spelling Correction Model for Customizing Transducer-Based Speech Recognition Systems
Xiaoqiang Wang, Yanqing Liu, Sheng Zhao, Jinyu Li
16:00 – 18:00
Subjective Evaluation of Noise Suppression Algorithms in Crowdsourcing
Babak Naderi, Ross Cutler
19:00 – 21:00
MUCS 2021: Multilingual and Code-Switching ASR Challenges for Low Resource Indian Languages
Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram, Basil Abraham
19:00 – 21:00
Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition
Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong
19:00 – 21:00
Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement
Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka
19:00 – 21:00
Single-Channel Speech Enhancement Using Learnable Loss Mixup
Oscar Chang, Dung N. Tran, Kazuhito Koishida
19:00 – 21:00
INTERSPEECH 2021 Deep Noise Suppression Challenge
Chandan K A Reddy, Hari Dubey, Kazuhito Koishida, Arun Nair, Vishak Gopal, Ross Cutler, Sebastian Braun, Hannes Gamper, Robert Aichner, Sriram Srinivasan
Thursday, September 2, 2021
11:00 – 13:00
Analyzing Short Term Dynamic Speech Features for Understanding Behavioral Traits of Children with Autism Spectrum Disorder
Young-Kyung Kim, Rimita Lahiri, Md Nasir, So Hyun Kim, Somer Bishop, Catherine Lord, Shrikanth Narayanan
11:00 – 13:00
Source Separation I
Related Publications
- Ultra Fast Speech Separation Model with Teacher Student Learning (Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu)
- Investigation of Practical Aspects of Single Channel Speech Separation for ASR (Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li)
- Continuous Speech Separation Using Speaker Inventory for Long Recording (Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen)
16:00 – 18:00
Multi- and cross-lingual ASR, other topics in ASR
Related Publications
- Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching (Wenxin Hou, Jindong Wang, Xu Tan, Tao Qin, Takahiro Shinozaki)
- Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone (Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka)
- On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer (Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong)
- Improving Multilingual Transformer Transducer Models by Reducing Language Confusions (Eric Sun, Jinyu Li, Zhong Meng, Yu Wu, Jian Xue, Shujie Liu, Yifan Gong)
16:00 – 18:00
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker
Maokui He, Desh Raj, Zili Huang, Jun Du, Zhuo Chen, Shinji Watanabe
16:00 – 18:00
Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration
Chuanxin Tang, Chong Luo, Zhiyuan Zhao, Dacheng Yin, Yucheng Zhao, Wenjun Zeng
16:00 – 18:00
AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario
Yihui Fu, Luyao Cheng, Shubo Lv, Yukai Jv, Yuxiang Kong, Zhuo Chen, Yanxin Hu, Lei Xie, Jian Wu, Hui Bu, Xin Xu, Jun Du, Jingdong Chen
Friday, September 3, 2021
11:00 – 13:00
Sequence-Level Confidence Classifier for ASR Utterance Accuracy and Application to Acoustic Models
Amber Afshan, Kshitiz Kumar, Jian Wu
16:00 – 18:00
End-to-End Speaker-Attributed ASR with Transformer
Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka
16:00 – 18:00
Speech Synthesis: Speaking Style and Emotion
Related Publications
- Adaptive Text to Speech for Spontaneous Style (Yuzi Yan, Xu Tan, Bohan Li, Guangyan Zhang, Tao Qin, Sheng Zhao, Yuan Shen, Wei-Qiang Zhang, Tie-Yan Liu)
- Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis (Shifeng Pan, Lei He)
- Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS (Xiaochun An, Frank Soong, Lei Xie)
16:00 – 18:00
INTERSPEECH 2021 Acoustic Echo Cancellation Challenge
Ross Cutler, Ando Saabas, Tanel Panarmaa, Markus Loide, Sten Sootla, Marju Purin, Hannes Gamper, Sebastian Braun, Robert Aichner, Sriram Srinivasan
Monday, September 6, 2021
11:00
Workshop on Machine Learning in Speech and Language Processing 2021
Speaker: Chengyi Wang (Intern)
Organizing Committee: Yao Qian
Scientific Committee: Liang Lu
NVIDIA
via the NVIDIA at INTERSPEECH 2021 site
Tuesday, August 31, 2021
07:00 – 09:00 p.m. CET
Scene-Agnostic Multi-Microphone Speech Dereverberation
Yochai Yemini, Ethan Fetaya, Haggai Maron, Sharon Gannot
Wednesday, September 1, 2021
11:00 a.m. – 01:00 p.m. CET
SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition
Patrick K. O’Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko
07:00 – 09:00 p.m. CET
Hi-Fi Multi-Speaker English TTS Dataset
Evelina Bakhturina, Vitaly Lavrukhin, Boris Ginsburg, Yang Zhang
Thursday, September 2, 2021
04:00 – 06:00 p.m. CET
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev, Boris Ginsburg
Friday, September 3, 2021
04:00 – 06:00 p.m. CET
Compressing 1D Time-Channel Separable Convolutions Using Sparse Random Ternary Matrices
Gonçalo Mordido, Matthijs Van Keirsbilck, Alexander Keller
04:00 – 06:00 p.m. CET
NeMo Inverse Text Normalization: From Development To Production
Yang Zhang, Evelina Bakhturina, Kyle Gorman, Boris Ginsburg
Apple
Conference Accepted Papers
A Discriminative Entity Aware Language Model for Assistants
Mandana Saebi, Ernie Pusateri, Aaksha Meghawat, Christophe Van Gysel
Analysis and Tuning of a Voice Assistant System for Dysfluent Speech
Vikramjit Mitra, Zifang Huang, Colin Lea, Lauren Tooley, Panayiotis Georgiou, Sachin Kajarekar, Jeffrey Bigham
DEXTER: Deep Encoding of External Knowledge for Named Entity Recognition in Assistants
Deepak Muralidharan, Joel Ruben Antony Moniz, Weicheng Zhang, Stephen Pulman, Lin Li, Megan Barnes, Jingjing Pan, Jason Williams, Alex Acero
Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation
Vineet Garg, Wonil Chang, Siddharth Sigtia, Saurabh Adya, Pramod Simha, Pranay Dighe, Chandra Dhir
Talks and Workshops
Meet Apple will be an opportunity to learn more about our ML teams, working at Apple, and how to apply to full-time positions. This talk will be held virtually on Wednesday, September 1 at 9:30 am PDT.
Apple is hosting a panel on internships, where attendees can learn more about internship opportunities across our machine learning teams. It will be held virtually on September 2 at 9:30 am PDT.
All registered Interspeech attendees are invited to each event. Check back for more information on how to join.
Affinity Events
Sunday, August 29, 2021
Apple is a sponsor of the Workshop for Young Female Researchers in Speech Science & Technology, which will take place virtually on Sunday, August 29.
Thursday, September 2, 2021
Matthias Paulik will be participating in the 8th Students Meet Experts event as a panelist. This event will take place virtually on Thursday, September 2.
Amazon
via Amazon Science
Accepted Publications
A learned conditional prior for the VAE acoustic space of a TTS system
Penny Karanasou, Sri Karlapati, Alexis Moinet, Arnaud Joly, Ammar Abbas, Simon Slangen, Jaime Lorenzo-Trueba, Thomas Drugman
Acted vs. improvised: Domain adaptation for elicitation approaches in audio-visual emotion recognition
Haoqi Li, Yelin Kim, Cheng-hao Kuo, Shrikanth Narayanan
Adapting long context NLM for ASR rescoring in conversational agents
Ashish Shenoy, Sravan Bodapati, Monica Sunkara, Srikanth Ronanki, Katrin Kirchhoff
Adjunct-emeritus distillation for semi-supervised language model adaptation
Scott Novotney, Yile Gu, Ivan Bulyko
Amortized neural networks for low-latency speech recognition
Jonathan Macoskey, Grant P. Strimel, Jinru Su, Ariya Rastrow
Best of both worlds: Robust accented speech recognition with adversarial transfer learning
Nilaksh Das, Sravan Bodapati, Monica Sunkara, Sundararajan Srinivasan, Duen Horng Chau
Manuel Giollo, Deniz Gunceler, Yulan Liu, Daniel Willett
CoDERT: Distilling encoder representations with co-learning for transducer-based speech recognition
Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris
Correcting automated and manual speech transcription errors using warped language models
Mahdi Namazifar, John Malik, Erran Li, Gokhan Tur, Dilek Hakkani-Tür
Detection of lexical stress errors in non-native (L2) English with data augmentation and attention
Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Jasha Droppo, Thomas Drugman, Bozena Kostek
End-to-end neural diarization: From Transformer to Conformer
Yi Chieh Liu, Eunjung Han, Chul Lee, Andreas Stolcke
End-to-end spoken language understanding for generalized voice assistants
Michael Saxon, Samridhi Choudhary, Joseph McKenna, Athanasios Mouchtaris
Muhammad A. Shah, Joseph Szurley, Markus Mueller, Athanasios Mouchtaris
Event specific attention for polyphonic sound event detection
Harshavardhan Sundar, Ming Sun, Chao Wang
Factorization-aware training of transformers for natural language understanding on the edge
Hamidreza Saghir, Samridhi Choudhary, Sepehr Eghbali, Clement Chung
FANS: Fusing ASR and NLU for on-device SLU
Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann, Ariya Rastrow
Ruirui Li, Chelsea J.-T. Ju, Zeya Chen, Hongda Mao, Oguz Elibol, Andreas Stolcke
Graph-based label propagation for semi-supervised speaker identification
Long Chen, Venkatesh Ravichandran, Andreas Stolcke
Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flow
Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, Jasha Droppo
Improving RNN-T ASR accuracy using context audio
Andreas Schwarz, Ilya Sklyar, Simon Wiesler
Improving the expressiveness of neural vocoding with non-affine normalizing flows
Adam Gabrys, Yunlong Jiao, Daniel Korzekwa, Roberto Barra-Chicote
Intra-sentential speaking rate control in neural text-to-speech for automatic dubbing
Mayank Sharma, Yogesh Virkar, Marcello Federico, Roberto Barra-Chicote, Robert Enyedi
Learning a neural diff for speech models
Jonathan Macoskey, Grant P. Strimel, Ariya Rastrow
Leveraging ASR N-best in deep entity retrieval
Haoyu Wang, John Chen, Majid Laali, Jeff King, Kevin Durda, William M. Campbell, Yang Liu
Listen with intent: Improving speech recognition with audio-to-intent front-end
Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo
Lukas Drude, Jahn Heymann, Andreas Schwarz, Jean-Marc Valin
Multi-channel transformer transducer for speech recognition
Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo
Paraphrase label alignment for voice application retrieval in spoken language understanding
Zheng Gao, Radhika Arava, Qian Hu, Xibin Gao, Thahir Mohamed, Wei Xiao, Mohamed AbdelHady
Personalized PercepNet: Real-time, low-complexity target voice separation and enhancement
Ritwik Giri, Shrikant Venkataramani, Jean-Marc Valin, Umut Isik, Arvindh Krishnaswamy
Phonetically induced subwords for end-to-end speech recognition
Vasileios Papadourakis, Markus Mueller, Jing Liu, Athanasios Mouchtaris, Maurizio Omologo
Predicting temporal performance drop of deployed production spoken language understanding models
Quynh Ngoc Thi Do, Judith Gaspers, Daniil Sorokin, Patrick Lehnen
Scaling effect of self-supervised speech models
Jie Pu, Yuguang Yang, Ruirui Li, Oguz Elibol, Jasha Droppo
Scaling laws for acoustic models
SmallER: Scaling neural entity resolution for edge devices
Ross McGowan, Jinru Su, Vince DiCocco, Thejaswi Muniyappa, Grant P. Strimel
Speaker-conversation factorial designs for diarization error analysis
Scott Seyfarth, Sundararajan Srinivasan, Katrin Kirchhoff
SynthASR: Unlocking synthetic data for speech recognition
Amin Fazel, Wei Yang, Yulan Liu, Roberto Barra-Chicote, Yixiong Meng, Roland Maas, Jasha Droppo
The impact of intent distribution mismatch on semi-supervised spoken language understanding
Judith Gaspers, Quynh Ngoc Thi Do, Daniil Sorokin, Patrick Lehnen
Wav2vec-C: A self-supervised model for speech representation learning
Samik Sadhu, Di Hu, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas
Weakly-supervised word-level pronunciation error detection in non-native English speech
Daniel Korzekwa, Jaime Lorenzo-Trueba, Thomas Drugman, Shira Calamaro, Bozena Kostek
Visit the Amazon Science page for more info on eight workshops by Amazon.
- ByteDance ?
- Facebook ?
- Google ?
- Phonexia ?
- 3M ?
- Baidu ?
- Human Language Technology (Johns Hopkins) ?
- IBM ?