Zhili Liu's Homepage

Email / CV / GitHub / Google Scholar

About Me

I am currently a Ph.D. candidate in the Department of Computer Science and Engineering (CSE) at the Hong Kong University of Science and Technology (HKUST), supervised by Prof. James Kwok. I hold a Master's degree in Computer Science from CUHK (2019) and a Bachelor's degree in Software Engineering from Tongji University (2018). I am currently seeking job opportunities, so please feel free to contact me via email.

My research focuses on multi-modal foundation models, across the following sub-directions:

Foundation models: EMOVA (Omni-Model), Val_PPL
Mixture of experts (MoE): MoCLE, MoCE, SDR
MLLM Reasoning: RACRO, Atomthink
Model Safety: MoTE, Geom-Erasing, ECSO

News

[2026.01] RACRO is accepted by ICLR 2026. See you in Rio de Janeiro, Brazil!
[2026.01] AtomThink is accepted by TPAMI.
[2025.10] MoCLE is accepted by IEEE TIP 2025!
[2025.08] Val_PPL is accepted by EMNLP 2025 (Oral)! See you in Suzhou!
[2025.05] MoTE is accepted by ACL 2025!
[2025.03] We open-source EMOVA, a frontier end-to-end Omni-modal LLM with SoTA vision-language and speech abilities, which has been accepted by CVPR 2025!
[2025.02] EMOVA is accepted by CVPR 2025!
[2024.09] We announce EMOVA, a novel end-to-end omni-modal model (i.e., without ASR or TTS) with SoTA vision-language and speech abilities, further supporting emotional dialogue. Stay tuned for more details after ECCV!
[2024.07] The Implicit Concept Dataset is released on HuggingFace. Feel free to try it!
[2024.07] Two papers, Geom-Erasing and ECSO, are accepted by ECCV 2024! See you in Milano, Italy!
[2023.12] Our MoCLE is reported by Liangziwei!
[2023.05] MoCE is reported by Noah's Ark Lab!
[2023.01] MoCE is accepted by ICLR 2023 as a Spotlight (top 25%)! See you in Kigali, Rwanda!
[2022.05] SDR is accepted by AAAI 2022!

Selected Publications

Full publication list on Google Scholar. (* denotes equal contribution)

Multi-modal Foundation Models

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Kai Chen*, Yunhao Gou*, Runhui Huang*, Zhili Liu*, Daxin Tan*, and 26 other authors

Fully open-sourced Omni-modal LLMs with SoTA vision-language and speech abilities!

IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2025

[PDF] [Webpage] [Talk] [Talk (Chinese)] [Wechat Post] [Code]

Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning

Yunhao Gou, Hansi Yang, Zhili Liu, Kai Chen, Yihan Zeng, Lanqing Hong, Zhenguo Li, Qun Liu, James T Kwok, Yu Zhang.

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025 (Oral).

[PDF]

Multi-modal Foundation Models: Mixture of Cluster-conditional Experts

Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning

Yunhao Gou*, Zhili Liu*, Kai Chen*, Lanqing Hong, Hang Xu, Aoxue Li, Dit-Yan Yeung, James Kwok, Yu Zhang.

First MLLM with MoE for instruction customization and generalization!

IEEE Transactions on Image Processing (TIP), 2025.

[PDF] [Webpage] [Talk] [Wechat Post] [Code]

Task-customized Masked Autoencoder via Mixture of Cluster-conditional Experts

Zhili Liu*, Kai Chen*, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James Kwok.

International Conference on Learning Representations (ICLR), 2023 (Spotlight, top 25%).

[PDF] [Wechat Post]

Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing

Zhili Liu, Jianhua Han, Kai Chen, Lanqing Hong, Hang Xu, Chunjing Xu, Zhenguo Li.

AAAI Conference on Artificial Intelligence (AAAI), 2022.

[PDF]

Multi-modal Foundation Models: MLLM Reasoning

Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning

Yunhao Gou*, Kai Chen*, Zhili Liu*, Lanqing Hong, Xin Jin, Zhenguo Li, James T. Kwok, Yu Zhang

Scaling reasoning MLLMs by plugging in any advanced LLM reasoner at inference time!

International Conference on Learning Representations (ICLR), 2026

[PDF] [Demo] [Code]

AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning

Kun Xiang*, Zhili Liu*, Zihao Jiang*, Yunshuang Nie, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang

First o1-like slow-thinking framework for MLLMs!

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.

[PDF] [GitHub]

Multi-modal Foundation Models: Model Safety

Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

Zhili Liu*, Yunhao Gou*, Kai Chen*, Lanqing Hong, Jiahui Gao, Fei Mi, Yu Zhang, Zhenguo Li, Xin Jiang, Qun Liu, James T. Kwok

Self-alignment with MoE-empowered CoT multi-dimensional analysis!

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

[PDF]

Implicit Concept Removal of Diffusion Models

Zhili Liu*, Kai Chen*, Yifan Zhang, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James Kwok.

Geometry-controllable concept eraser for diffusion models!

European Conference on Computer Vision (ECCV), 2024.

[PDF] [Project page] [Talk] [Huggingface]

Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation

Yunhao Gou*, Kai Chen*, Zhili Liu*, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James Kwok, Yu Zhang.

Free data engine for MLLM alignment on its own LLM!

European Conference on Computer Vision (ECCV), 2024.

[PDF] [Project page]

Experiences

Huawei Noah's Ark Lab

Hong Kong, China

Intern @ AI Theory Lab, supervised by Prof. Zhenguo Li

June 2019 - Present

The Chinese University of Hong Kong

Hong Kong, China

Research Assistant, supervised by Prof. Bei Yu

September 2018 - May 2019

Microsoft Research Asia

Beijing, China

Intern @ Intelligent Multimedia Group, supervised by Dr. Chong Luo

January 2018 - July 2018

Selected Awards

CUHK Entrance Scholarship & Distinguished Academic Performance Scholarship

2019/2018

Outstanding Graduate of Shanghai

2018

Scholarship for Outstanding Student of Tongji University

2017/2016/2015

Zhili Liu (刘智立)

Ph.D. Candidate @ HKUST
