Curriculum Vitae

Education

Ph.D. in Speech Processing (2014-2019)
Northwestern Polytechnical University, Xi’an, China
Thesis: Advanced Techniques in Speech Signal Processing and Recognition

B.S. in Computer Science (2010-2014)
University Name, City, Country
Graduated with Honors

Professional Experience

Tencent (2020-Present)

Senior AI Algorithm Engineer

Lead development of multimodal AI systems integrating text, vision, and audio processing
Design and implement large-scale language models for various applications
Optimize speech processing algorithms for real-time applications
Collaborate with cross-functional teams to deploy AI solutions in production environments

National University of Singapore (2017-2018)

Joint-Trained Ph.D. Program

Conducted research on advanced speech processing techniques
Collaborated with international research teams on multimodal learning projects
Published research findings in top-tier conferences

Institute for Infocomm Research (2016-2017)

Research Intern

Developed algorithms for audio signal processing and analysis
Contributed to research projects in speech recognition and synthesis
Gained experience in academic research methodologies

Research Interests

Large Language Models (LLMs): Development, optimization, and applications
Vision-Language Models (VLMs): Multimodal understanding and generation
Speech Processing: Recognition, synthesis, and enhancement
Multimodal Learning: Integration of visual, textual, and audio modalities
Clustering & Retrieval: Advanced algorithms for information organization

Technical Skills

Programming Languages

Python: Advanced (PyTorch, TensorFlow, NumPy, Pandas)
C++: Intermediate (System programming, performance optimization)
Bash: Intermediate (Scripting, automation, system administration)

AI/ML Frameworks

Deep Learning: PyTorch, TensorFlow, Keras
NLP: Transformers, Hugging Face, spaCy
Computer Vision: OpenCV, PIL, scikit-image
Audio Processing: librosa, pydub, soundfile

Tools & Platforms

Version Control: Git, GitHub, GitLab
Containerization: Docker, Kubernetes
Cloud Platforms: AWS, Azure, Tencent Cloud
MLOps: MLflow, Kubeflow, Airflow

Publications

A Method of Audio-Visual Person Verification by Mining Connections between Time Series

Peiwen Sun, Shanshan Zhang, Zishan Liu, Yougen Yuan, Taotao Zhang, Honggang Zhang, Pengfei Hu, "A Method of Audio-Visual Person Verification by Mining Connections between Time Series." Proc. INTERSPEECH, 2023.

VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge

Yougen Yuan, Zhiqiang Lv, Shen Huang, Pengfei Hu, "VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge." arXiv preprint arXiv:2110.15316, 2021.

Fast query-by-example speech search using attention-based deep binary embeddings

Yougen Yuan, Lei Xie, Cheung-Chi Leung, Hongjie Chen, Bin Ma, "Fast query-by-example speech search using attention-based deep binary embeddings." IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020.

Verifying deep keyword spotting detection with acoustic word embeddings

Yougen Yuan, Zhiqiang Lv, Shen Huang, Lei Xie, "Verifying deep keyword spotting detection with acoustic word embeddings." 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2019.

Query-by-example speech search using recurrent neural acoustic word embeddings with temporal context

Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, "Query-by-example speech search using recurrent neural acoustic word embeddings with temporal context." IEEE Access, 2019.

Deep audio-visual system for closed-set word-level speech recognition

Yougen Yuan, Wei Tang, Minhao Fan, Yue Cao, Peng Zhang, Lei Xie, "Deep audio-visual system for closed-set word-level speech recognition." 2019 International Conference on Multimodal Interaction, 2019.

Many-to-many voice conversion based on bottleneck features with variational autoencoder for non-parallel training data

Yanping Li, Kong Aik Lee, Yougen Yuan, Haizhou Li, Zhen Yang, "Many-to-many voice conversion based on bottleneck features with variational autoencoder for non-parallel training data." Proc. APSIPA ASC, 2018.

Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search

Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li, "Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search." Proc. INTERSPEECH, 2018.

Sound Image Externalization for Headphone based Real-time 3D Audio

Yougen Yuan, Lei Xie, Zhonghua Fu, Ming Xu, Qi Cong, "Sound Image Externalization for Headphone based Real-time 3D Audio." Frontiers of Computer Science, 2017.

Pairwise Learning using Multi-lingual Bottleneck Features for Low-resource Query-by-example Spoken Term Detection

Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li, "Pairwise Learning using Multi-lingual Bottleneck Features for Low-resource Query-by-example Spoken Term Detection." Proc. ICASSP, 2017.

Extracting Bottleneck Features and Word-like Pairs from Untranscribed Speech for Feature Representation

Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li, "Extracting Bottleneck Features and Word-like Pairs from Untranscribed Speech for Feature Representation." Proc. ASRU, 2017.

Learning Neural Network Representation using Cross-lingual Bottleneck Features with Word-pair Information

Yougen Yuan, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li, "Learning Neural Network Representation using Cross-lingual Bottleneck Features with Word-pair Information." Proc. INTERSPEECH, 2016.

Deep neural network derived bottleneck features for accurate audio classification

Bihong Zhang, Lei Xie, Yougen Yuan, Huaiping Ming, Dongyan Huang, Mingli Song, "Deep neural network derived bottleneck features for accurate audio classification." Proc. ICMEW, 2016.

Externalization Improvement in a Real-time Binaural Sound Image Rendering System

Yougen Yuan, Zhonghua Fu, Ming Xu, Lei Xie, Qi Cong, "Externalization Improvement in a Real-time Binaural Sound Image Rendering System." Proc. ICOT, 2015.

A Hybrid Virtual Bass System with Improved Phase Vocoder and High Efficiency

Shaofei Zhang, Lei Xie, Zhong-Hua Fu, Yougen Yuan, "A Hybrid Virtual Bass System with Improved Phase Vocoder and High Efficiency." Proc. ISCSLP, 2014.

Professional Memberships

Member, Association for Computing Machinery (ACM)
Member, Institute of Electrical and Electronics Engineers (IEEE)
Member, International Speech Communication Association (ISCA)

Languages

Chinese: Native
English: Professional Proficiency

Last updated: March 2026