I am currently a Ph.D. candidate in the Department of Computer Science and Engineering (CSE) at the Hong Kong University of Science and Technology (HKUST), supervised by Prof. James Kwok. I hold a Master's degree in Computer Science from CUHK (2019) and a Bachelor's degree in Software Engineering from Tongji University (2018). I am seeking job opportunities and welcome you to contact me via email.
My research focuses on multi-modal foundation models across several sub-directions. The full publication list is available on Google Scholar (* denotes equal contribution).
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2025.
Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025 (Oral).
Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning
IEEE Transactions on Image Processing (TIP), 2025.
Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts
International Conference on Learning Representations (ICLR), 2023 (Spotlight, top 25%).
Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing
AAAI Conference on Artificial Intelligence (AAAI), 2022.
AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning
arXiv preprint, 2024.
Association for Computational Linguistics (ACL), 2025.
Implicit Concept Removal of Diffusion Models
European Conference on Computer Vision (ECCV), 2024.
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
European Conference on Computer Vision (ECCV), 2024.
CUHK Entrance Scholarship & Distinguished Academic Performance Scholarship
Outstanding Graduate of Shanghai
Scholarship for Outstanding Student of Tongji University