Curriculum Vitae
Education
Ph.D. in Speech Processing (2014-2019)
Northwestern Polytechnical University, Xi’an, China
Thesis: Advanced Techniques in Speech Signal Processing and Recognition
B.S. in Computer Science (2010-2014)
University Name, City, Country
Graduated with Honors
Professional Experience
Tencent (2020-Present)
Senior AI Algorithm Engineer
- Lead development of multimodal AI systems integrating text, vision, and audio processing
- Design and implement large-scale language models for various applications
- Optimize speech processing algorithms for real-time applications
- Collaborate with cross-functional teams to deploy AI solutions in production environments
National University of Singapore (2017-2018)
Joint-Trained Ph.D. Program
- Conducted research on advanced speech processing techniques
- Collaborated with international research teams on multimodal learning projects
- Published research findings in top-tier conferences
Institute for Infocomm Research (2016-2017)
Research Intern
- Developed algorithms for audio signal processing and analysis
- Contributed to research projects in speech recognition and synthesis
- Gained experience in academic research methodologies
Research Interests
- Large Language Models (LLMs): Development, optimization, and applications
- Vision-Language Models (VLMs): Multimodal understanding and generation
- Speech Processing: Recognition, synthesis, and enhancement
- Multimodal Learning: Integration of visual, textual, and audio modalities
- Clustering & Retrieval: Advanced algorithms for information organization
Technical Skills
Programming Languages
- Python: Advanced (PyTorch, TensorFlow, NumPy, Pandas)
- C++: Intermediate (System programming, performance optimization)
- Bash: Intermediate (Scripting, automation, system administration)
AI/ML Frameworks
- Deep Learning: PyTorch, TensorFlow, Keras
- NLP: Hugging Face Transformers, spaCy
- Computer Vision: OpenCV, PIL, scikit-image
- Audio Processing: librosa, pydub, soundfile
Tools & Infrastructure
- Version Control: Git, GitHub, GitLab
- Containerization: Docker, Kubernetes
- Cloud Platforms: AWS, Azure, Tencent Cloud
- MLOps: MLflow, Kubeflow, Airflow
Publications
Peiwen Sun, Shanshan Zhang, Zishan Liu, Yougen Yuan, Taotao Zhang, Honggang Zhang, Pengfei Hu, "A Method of Audio-Visual Person Verification by Mining Connections between Time Series." Proc. INTERSPEECH, 2023.
Yougen Yuan, Zhiqiang Lv, Shen Huang, Pengfei Hu, "VRM-Phase I VKW system description of long-short video customizable keyword wakeup challenge." arXiv preprint arXiv:2110.15316, 2021.
Yougen Yuan, Lei Xie, Cheung-Chi Leung, Hongjie Chen, Bin Ma, "Fast query-by-example speech search using attention-based deep binary embeddings." IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020.
Yougen Yuan, Zhiqiang Lv, Shen Huang, Lei Xie, "Verifying deep keyword spotting detection with acoustic word embeddings." Proc. ASRU, 2019.
Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, "Query-by-example speech search using recurrent neural acoustic word embeddings with temporal context." IEEE Access, 2019.
Yougen Yuan, Wei Tang, Minhao Fan, Yue Cao, Peng Zhang, Lei Xie, "Deep audio-visual system for closed-set word-level speech recognition." Proc. ICMI, 2019.
Yanping Li, Kong Aik Lee, Yougen Yuan, Haizhou Li, Zhen Yang, "Many-to-many voice conversion based on bottleneck features with variational autoencoder for non-parallel training data." Proc. APSIPA ASC, 2018.
Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li, "Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search." Proc. INTERSPEECH, 2018.
Yougen Yuan, Lei Xie, Zhonghua Fu, Ming Xu, Qi Cong, "Sound Image Externalization for Headphone based Real time 3D Audio." Frontiers of Computer Science, 2017.
Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li, "Pairwise Learning using Multi-lingual Bottleneck Features for Low-resource Query-by-example Spoken Term Detection." Proc. ICASSP, 2017.
Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li, "Extracting Bottleneck Features and Word-like Pairs from Untranscribed Speech for Feature Representation." Proc. ASRU, 2017.
Yougen Yuan, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li, "Learning Neural Network Representation using Cross-lingual Bottleneck Features with Word-pair Information." Proc. INTERSPEECH, 2016.
Bihong Zhang, Lei Xie, Yougen Yuan, Huaiping Ming, Dongyan Huang, Mingli Song, "Deep neural network derived bottleneck features for accurate audio classification." Proc. ICMEW, 2016.
Yougen Yuan, Zhonghua Fu, Ming Xu, Lei Xie, Qi Cong, "Externalization Improvement in a Real-time Binaural Sound Image Rendering System." Proc. ICOT, 2015.
Shaofei Zhang, Lei Xie, Zhong-Hua Fu, Yougen Yuan, "A Hybrid Virtual Bass System with Improved Phase Vocoder and High Efficiency." Proc. ISCSLP, 2014.
Professional Memberships
- Member, Association for Computing Machinery (ACM)
- Member, Institute of Electrical and Electronics Engineers (IEEE)
- Member, International Speech Communication Association (ISCA)
Languages
- Chinese: Native
- English: Professional Proficiency
Last updated: March 2026