Ruimao Zhang's Homepage

Ruimao Zhang

Tenure-track Associate Professor ( Computer Vision, Multimedia, Embodied AI )

Spatial Artificial Intelligence Lab, SAIL

School of Electronics and Communication Engineering, Sun Yat-sen University (Shenzhen Campus)

Google Scholar, Twitter, Zhihu

News

The primary objective of our research team is to develop intelligent agents that can effectively collaborate with humans in dynamic environments. To realize this ambition, we focus on three core research directions. (1) Human-centered Visual Content Understanding and Reasoning: This area seeks to enable machines to actively perceive, analyze, and interpret human states, behaviors, and underlying motivations in dynamic scenarios. (2) Omni-modal Scene Perception and Navigation: This direction emphasizes harnessing diverse sensor modalities to comprehend and navigate complex scenes. (3) Machine Behavior Planning and Decision-making: This direction is centered on equipping intelligent agents with the ability to make real-time decisions based on their understanding of their surroundings.

News! We are actively recruiting research interns with experience in robotics hardware development. Priority will be given to candidates with experience in robotics or drone development. Applicants should be proficient in the Robot Operating System (ROS) and familiar with programming languages such as C/C++ or Python.

News! We have open positions for Postdoctoral Fellows, Ph.D. students, M.Phil. students, Research Assistants, and Visiting Students, and we welcome self-motivated candidates. If you are interested in 3D Scene Understanding, Human-centric Visual Perception and Generation, Robot Manipulation, Multi-modal Learning, Neuro-Symbolic Computing, Reinforcement Learning, or Embodied Cognition, please drop me an email at ruimao.zhang@ieee.org or zhangrm27@mail.sysu.edu.cn. For more details about recruitment and the undergraduate research programme, please see here.

2025-10-24: One paper is accepted to T-PAMI2025. Congrats to Shunlin!

2025-09-18: One paper is accepted to NeurIPS2025. Congrats to Ziye, Yiran and Jiahua!

2025-08-02: Two papers are accepted to CoRL2025. Congrats to Jiahua, Yiran and Lai Wei!

2025-06-25: RoboFactory is accepted to ICCV2025 and received the Outstanding Paper Award at the CVPR2025 MEIS Workshop.

2025-06-16: One paper is accepted to IROS2025. Congrats to Yiran!

2025-05-01: One paper is accepted to ICML2025. Congrats to Yiran!

2025-03-14: One paper is accepted to T-PAMI. Congrats to Jie Yang!

2025-02-27: Two papers are accepted to CVPR2025. Congrats to Shunlin!

2025-01-26: Three papers are accepted to ICRA2025. Congrats to Chaoqun and Yiran!

2025-01-22: One paper is accepted to ICLR2025 and selected as an Oral. Congrats to Ziye and Yiran!

2025-01-21: I am invited as a senior program committee (SPC) member of ECAI 2025.

2024-12-30: One paper is accepted to T-ITS. Congrats to Chaoqun and Yiran!

2024-10-29: I will join the School of Electronics and Communication Engineering at SYSU-Shenzhen as an associate professor.

2024-09-27: One paper is accepted to NeurIPS2024. Congrats to Jie Yang!

2024-07-01: Two papers are accepted to ECCV2024. Congrats to Jie Yang!

2024-05-04: I am elected as a Senior Member of IEEE.

2024-05-02: One paper is accepted to ICML2024. Congrats to Shunlin!

2024-02-27: Five papers are accepted to CVPR2024. Congrats to all!

2024-01-31: I will serve as an Associate Editor of ACM Trans. on Multimedia Computing, Communications and Applications.

2024-01-30: One paper is accepted to ICRA2024. Congrats to Chaoqun and Yiran!

2024-01-16: One paper is accepted to ICLR2024. Congrats to all!

2023-12-09: One paper is accepted to AAAI2024. Congrats to all!

2023-10-20: We present HumanTOMATO, a novel whole-body motion generation framework.

2023-10-15: We present UniPose to detect keypoints of any articulated object for fine-grained visual understanding.

2023-09-22: Two papers are accepted to NeurIPS2023. Congrats to all!

2023-09-13: The first large-scale, real-world 3D pose estimation dataset, FreeMan, is released!

2023-07-26: One paper is accepted to ACM MM2023. Congrats to Siyue, Bingliang and Fengyu!

2023-07-14: Two papers are accepted to ICCV2023. Congrats to Jie, Chaoqun and Yiran!

2023-05-25: One paper is early accepted to MICCAI2023. Congrats to all!

2023-03-15: One paper is accepted to Pattern Recognition. Congrats to Qi Liu!

2023-03-02: Two papers are accepted to MIDL2023 and one is selected for oral presentation. Congrats to Jie Yang and Ye Zhu!

2023-02-28: One paper is accepted to CVPR2023. Congrats to Jie Yang!

2023-02-27: One paper is accepted to T-NNLS. Congrats to Xiaozhe!

2023-01-21: One paper is accepted to ICLR2023. Congrats to Jie Yang!

2022-12-02: One paper is accepted to T-MM. Congrats to Ziyi!

2022-09-17: Two papers are accepted to NeurIPS2022. Congrats to all!

2022-07-05: Two papers are accepted to ECCV2022. Congrats to all!

2022-05-05: One paper is early accepted to MICCAI2022. Congrats to Weijie!

2022-05-01: We are partnering with MICCAI 2022 to co-host the MICCAI AMOS Segmentation Challenge 2022.

2021-11-07: One paper is accepted to T-IP. Congrats to Yuying!

2021-10-15: I was selected to receive a NeurIPS 2021 Outstanding Reviewer Award.

2021-07-23: Two papers are accepted to ICCV2021. Congrats to all!

2021-06-12: Two papers are accepted to MICCAI2021. Congrats to all!

2021-05-28: One paper is accepted to T-MM. Congrats to Zhaoyi!

2021-05-05: A long version of polar representation for object detection is accepted by T-PAMI. Congrats to all!

2021-04-29: One paper is accepted to IJCAI2021. Congrats to Weibing and Yanxu!

2021-03-01: One paper is accepted to CVPR2021. Congrats to Yuying!

2021-02-18: I moved to CUHKSZ as a Research Assistant Professor.

2020-12: One paper is accepted to AAAI2021.

2020-08: We won the First Prize in 2020 AIM Challenge on Learned Image Signal Processing Pipeline, Track 2.

2020-07: Two papers are accepted to ECCV2020 and MICCAI2020, respectively.

2020-02: Two papers are accepted to CVPR2020.

2019-08: A long version of SwitchNorm was presented in T-PAMI. Two papers are accepted to ICCV2019.

2019-05: I am organizing the second workshop in Fashion and Art.

BIOGRAPHY

“Weakness and ignorance are not barriers to survival, but arrogance is.”

--- The Three-Body Problem, Cixin Liu

“Without human nature, people will lose a lot; without bestiality, people will not survive.”

--- The Three-Body Problem, Cixin Liu

Education

The Chinese University of Hong Kong, Hong Kong, China. May. 2017 ~ Jul. 2019.
Postdoctoral Fellow in Multimedia Lab, worked with Prof. Xiaogang Wang ( co-founder of SenseTime ) and Prof. Ping Luo.

Sun Yat-sen University, Guangzhou, China. Dec. 2016.
Ph.D. in Computer Science and Technology, advised by Prof. Liang Lin ( IEEE/IAPR Fellow, Distinguished Young Scholar of NSFC ).

Sun Yat-sen University, Guangzhou, China. Jul. 2011.
B.E. in Software Engineering.

Experience

Sun Yat-sen University, Shenzhen, China. Oct. 2024 ~ Present.
Associate Professor, School of Electronics and Communication Engineering.

The Chinese University of Hong Kong, Shenzhen, China. Feb. 2021 ~ Sept. 2024.
Associate Researcher, School of Data Science.

SenseTime Research, Shenzhen, China. Jul. 2019 ~ Jan. 2021.
Senior Researcher, reported to Prof. Jinwei Gu at SenseBrain, USA.

The Hong Kong Polytechnic University, Hong Kong, China. Aug. 2013 ~ Feb. 2014.
Visiting Ph.D. Student, advised by Prof. Lei Zhang and Prof. Wangmeng Zuo.

Sun Yat-sen University, Guangzhou, China. Dec. 2010 ~ Jul. 2011.
Research Assistant, advised by Prof. Liang Lin.

Awards and Honours

Shenzhen Overseas High-Level Talent Class C, 2022

Outstanding Reviewer Award of NeurIPS, 2021

AIM Challenge on Learned Image Signal Processing Pipeline, Track 2, First Prize, 2020

Best Paper Nomination Award of SenseTime Group Ltd., 2020

Google YouTube-8M Video Understanding Challenge, Gold Medal (1.5%), 2017

National Scholarship for Postgraduate, 2015

The National College IOT Innovation Competition, Third Prize, 2012

Excellent Student Scholarship of Sun Yat-sen University, 2008 ~ 2010

Academic Activity

Academic Service:
      Associate Editor, ACM Transactions on Multimedia Computing, Communications and Applications (2024.01~present)
      Area Chair, International Conference on Learning Representations (ICLR), 2026
      Senior Program Committee (SPC) member, European Conference on Artificial Intelligence (ECAI), 2025
      Session Chair, International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Part III, 2022
      Poster Chair, China Spatial Intelligence Conference (ChinaSI), China, 2025
      Executive Area Chair, Vision And Learning SEminar (VALSE), China (2021.07~present)

Reviewer for Conferences:
      IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) --- 2019, 2020, 2021, 2022, 2023, 2024
      IEEE International Conference on Computer Vision (ICCV) --- 2019, 2021, 2023, 2025
      European Conference on Computer Vision (ECCV) --- 2022, 2024
      Neural Information Processing Systems (NeurIPS) --- 2020, 2021, 2022, 2023, 2024
      International Conference on Learning Representations (ICLR) --- 2021, 2022, 2024, 2025
      International Conference on Machine Learning (ICML) --- 2022, 2023
      The Conference on Robot Learning (CoRL) --- 2025
      The IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) --- 2026
      International Conference on Artificial Intelligence and Statistics (AISTATS) --- 2025
      AAAI Conference on Artificial Intelligence (AAAI) --- 2021
      International Conference on Multimedia and Expo (ICME) --- 2014, 2016

Reviewer for Journals:
      IEEE Trans. on Pattern Analysis and Machine Intelligence (T-PAMI)
      International Journal of Computer Vision (IJCV)
      Artificial Intelligence
      ACM Computing Surveys
      IEEE Trans. on Neural Networks and Learning Systems (T-NNLS)
      IEEE Trans. on Image Processing (T-IP)
      IEEE Trans. on Cybernetics (T-CYB)
      IEEE Trans. on Circuits and Systems for Video Technology (T-CSVT)
      IEEE Trans. on Multimedia (T-MM)
      IEEE Trans. on Dependable and Secure Computing (T-DSC)
      IEEE Trans. on Information Forensics and Security (T-IFS)
      ACM Transactions on Multimedia Computing, Communications and Applications (ACM TOMM)
      Expert Systems with Applications
      Pattern Recognition
      Neurocomputing
      Medical Image Analysis (MIA)
      Applied Soft Computing
      International Journal of Human-Computer Interaction

Workshop and Challenge Organizer:
      "Vision and Learning in Embodied Intelligence" workshop at VALSE, 2024, Chongqing, China
      "Autonomous Driving Based on Large-scale Models" workshop at VALSE, 2023, Wuxi, China
      "Abdominal Multi-Organ Segmentation Challenge" challenge at MICCAI, 2022, Singapore
      "Deep Learning for Medical Big Data Analysis" workshop at VALSE, 2022, Tianjin, China
      "Deep Model Architecture" workshop at VALSE, 2021, Hangzhou, China
      "Computer Vision for Fashion, Art, and Design" workshop at CVPR, 2020, Virtual
      "Computer Vision for Fashion, Art, and Design" workshop at ICCV 2019, Seoul, Korea

PUBLICATION

Preprint

(* indicates corresponding author)

Newly Accepted Articles

(* indicates corresponding author)

Ziye Wang, Li Kang, Yiran Qin, Jiahua Ma, Zhanglin Peng, Lei Bai, Ruimao Zhang*, "Reinventing Multi-Agent Collaboration through Gaussian-Image Synergy in Diffusion Policies", Proc. of Conference on Neural Information Processing Systems ( NeurIPS ), 2025 ( We present a novel Gaussian-image synergistic representation that facilitates scalable, perception-aware imitation learning in multi-agent embodied collaborative systems. ) 【PDF】

Jiahua Ma^, Yiran Qin^, Yixiong Li, Xuanqi Liao, Yulan Guo, Ruimao Zhang*, "CDP: Towards Robust Autoregressive Visuomotor Policy Learning via Causal Diffusion", Proc. of Conference on Robot Learning ( CoRL ), 2025 ( CDP enhances action prediction by conditioning on historical action sequences, thereby enabling more coherent and context-aware visuomotor policy learning. ) 【PDF】

Lai Wei^, Jiahua Ma^, Yibo Hu, Ruimao Zhang*, "Ensuring Force Safety in Vision-Guided Robotic Manipulation via Implicit Tactile Calibration", Proc. of Conference on Robot Learning ( CoRL ), 2025 ( We introduce a novel state diffusion termed SafeDiff to generate a prospective state sequence from the visual context observation while incorporating real-time tactile feedback to refine the sequence.) 【PDF】

Yiran Qin, Li Kang, Xiufeng Song, Zhenfei Yin*, Xiaohong Liu, Xihui Liu, Ruimao Zhang*, Lei Bai*, "RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints", Proc. of International Conference on Computer Vision ( ICCV ), 2025 ( We propose the concept of compositional constraints for embodied multi-agent systems, addressing the challenges arising from collaboration among embodied agents. ) 【PDF】

Enshen Zhou, Yiran Qin, Zhenfei Yin, Zhelun Shi, Yuzhou Huang, Ruimao Zhang*, Lu Sheng*, Jing Shao, "Chain-of-Imagination for Reliable Instruction Following in Decision Making", Proc. of IEEE/RSJ International Conference on Intelligent Robots and Systems ( IROS ), 2025 ( We employ a Chain-of-Imagination (CoI) mechanism to envision the step-by-step process of executing instructions and translating imaginations into more precise visual prompts tailored to the current state. ) 【PDF】

Yiran Qin, Zhelun Shi, Jiwen Yu, Xijun Wang, Enshen Zhou, Lijun Li, Zhenfei Yin, Xihui Liu, Lu Sheng, Jing Shao*, Lei Bai*, Wanli Ouyang, Ruimao Zhang*, "WorldSimBench: Towards Video Generation Models as World Simulators", Proc. of International Conference on Machine Learning ( ICML ), 2025 ( We take the initial step in evaluating Predictive Generative Models up to the S3 stage by introducing both Explicit Perceptual Evaluation and Implicit Manipulative Evaluation. ) 【PDF】

Jie Yang, Ailing Zeng, Tianhe Ren, Shilong Liu, Feng Li, Ruimao Zhang*, Lei Zhang, "ED-Pose++: Enhanced Explicit Box Detection for Conventional and Interactive Multi-Object Keypoint Detection", IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 2025 ( The extended version of our CVPR and ICCV papers. ED-Pose for Multi-Object Keypoint Detection. ) 【PDF】

Shunlin Lu, Jingbo Wang*, Zeyu Lu, Ling-Hao Chen, Wenxun Dai, Junting Dong, Zhiyang Dou, Bo Dai*, Ruimao Zhang*, "ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model", Proc. of IEEE International Conference on Computer Vision and Pattern Recognition ( CVPR ), 2025 ( We introduce a scalable motion generation framework and observe the scaling behavior of autoregressive motion generation models for the first time. ) 【PDF】【Code】

Yiran Qin, Ao Sun, Hong Yuze, Benyou Wang, Ruimao Zhang*, "NavigateDiff: Visual Predictors are Zero-Shot Navigation Assistants", Proc. of IEEE International Conference on Robotics and Automation ( ICRA ), 2025 ( A large vision-language model with a diffusion network, named NavigateDiff, continuously predicts the agent’s potential observations to assist robots in generating robust actions. ) 【PDF】

Chaoqun Wang, Jie Yang, Xiaobin Hong, Ruimao Zhang*, "Unlock the Power of Unlabeled Data in Language Driving Model", Proc. of IEEE International Conference on Robotics and Automation ( ICRA ), 2025 ( This work aims to overcome the barrier of large models' extreme dependence on costly large-scale high-quality annotated data in self-driving scenarios. ) 【PDF】

Chaoqun Wang, Xiaobin Hong, Ruimao Zhang*, "Semantic-Supervised Spatial-Temporal Fusion for LiDAR-based 3D Object Detection", Proc. of IEEE International Conference on Robotics and Automation ( ICRA ), 2025 ( A novel fusion module to address spatial misalignment caused by object motion over time. ) 【PDF】

Ziye Wang, Yiran Qin, Lin Zeng, Ruimao Zhang*, "High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation", Proc. of International Conference on Learning Representations ( ICLR ), 2025 (Oral) ( A newly proposed Spatio-Temporal Coherent Gaussian Representation (STC-GS) for dynamic scene prediction, offering a promising way to advance 4D world models! ) 【PDF】【Project Page】

Chaoqun Wang, Yiran Qin, Zijian Kang, Ningning Ma, Yukai Shi, Zhen Li, Ruimao Zhang*, "Boosting 3D Object Detection via Self-distilling Introspective Data", IEEE Transactions on Intelligent Transportation Systems ( T-ITS ), 2025 ( A novel self-distilling paradigm termed SID to boost the accuracy of 3D object detection in both LiDAR-based and LiDAR-Camera-based scenarios. ) 【PDF】

Recent Selected Publications ( See Full List )

(* indicates corresponding author)

Principles and Practice of Embodied Intelligence (Chinese Version)
Liang Lin, Ruimao Zhang, Hefeng Wu
ISBN 9787121502668, Publishing House of Electronics Industry (PHEI)
( Systematically outlines the mainstream technical approaches to embodied intelligence and provides a comprehensive knowledge framework. A guide to the key embodied AI technologies, including perception, navigation, manipulation, planning, and collaboration. )

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
Yiran Qin, Li Kang, Xiufeng Song, Zhenfei Yin*, Xiaohong Liu, Xihui Liu, Ruimao Zhang*, Lei Bai*
Proc. of International Conference on Computer Vision ( ICCV ), 2025 【PDF】
Outstanding Paper Award of CVPR2025 Multi-agent Embodied Intelligence Workshop
( We propose the concept of compositional constraints for embodied multi-agent systems, addressing the challenges arising from collaboration among embodied agents.)

WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin, Zhelun Shi, Jiwen Yu, Xijun Wang, Enshen Zhou, Lijun Li, Zhenfei Yin, Xihui Liu, Lu Sheng, Jing Shao*, Lei Bai*, Wanli Ouyang, Ruimao Zhang*
Proc. of International Conference on Machine Learning ( ICML ), 2025 【PDF】
( We take the initial step in evaluating Predictive Generative Models up to the S3 stage by introducing both Explicit Perceptual Evaluation and Implicit Manipulative Evaluation.)

ScaMo: Exploring the Scaling Law in Autoregressive Motion Generation Model
Shunlin Lu, Jingbo Wang*, Zeyu Lu, Ling-Hao Chen, Wenxun Dai, Junting Dong, Zhiyang Dou, Bo Dai*, Ruimao Zhang*
Proc. of IEEE International Conference on Computer Vision and Pattern Recognition ( CVPR ), 2025 【PDF】【Code】
( We introduce a scalable motion generation framework and observe the scaling behavior of autoregressive motion generation models for the first time. )

High-Dynamic Radar Sequence Prediction for Weather Nowcasting Using Spatiotemporal Coherent Gaussian Representation
Ziye Wang, Yiran Qin, Lin Zeng, Ruimao Zhang*
Proc. of International Conference on Learning Representations ( ICLR ), 2025 (Oral) 【Project】
( A newly proposed Spatio-Temporal Coherent Gaussian Representation (STC-GS) for dynamic scene prediction, offering a promising way to advance 4D world models! )

HumanTOMATO: Text-aligned Whole-body Motion Generation
Shunlin Lu, Ling-Hao Chen, Ailing Zeng, Jing Lin, Ruimao Zhang*, Lei Zhang, Heung-Yeung Shum*
Proc. of International Conference on Machine Learning ( ICML ), 2024【PDF】【Code】
( A novel text-aligned whole-body motion generation framework that can generate high-quality, diverse, and coherent whole body motions. )

Open-World Human-Object Interaction Detection via Multi-modal Prompts
Jie Yang, Bingliang Li, Ailing Zeng, Lei Zhang, Ruimao Zhang*
Proc. of IEEE International Conference on Computer Vision and Pattern Recognition ( CVPR ), 2024 【PDF】【Project】
( A novel prompt-based HOI detector designed to leverage both textual descriptions for open-set generalization and visual exemplars for handling high ambiguity in descriptions. )

MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
Yiran Qin, Enshen Zhou, Qichang Liu, Zhenfei Yin, Lu Sheng*, Ruimao Zhang*, Yu Qiao, Jing Shao
Proc. of IEEE International Conference on Computer Vision and Pattern Recognition ( CVPR ), 2024 【PDF】【Project】【Youtube】【Bilibili】
( MP5 is an open-ended multimodal embodied system that can conduct situation-aware planning and perform embodied action control via an active perception scheme. )

FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions
Jiong Wang, Fengyu Yang, Wenbo Gou, Bingliang Li, Danqi Yan, Ailing Zeng, Yijun Gao, Junle Wang, Ruimao Zhang*
Proc. of IEEE International Conference on Computer Vision and Pattern Recognition ( CVPR ), 2024 【PDF】【Project】

Toward Accurate Camera-based 3D Object Detection via Cascade Depth Estimation and Calibration
Chaoqun Wang, Yiran Qin, Zijian Kang, Ningning Ma, Ruimao Zhang*
Proc. of IEEE International Conference on Robotics and Automation ( ICRA ), 2024
【PDF】

Enhancing Human-AI Collaboration Through Logic-Guided Reasoning
Chengzhi Cao, Yinghao Fu, Sheng Xu, Ruimao Zhang, Shuang Li
Proc. of International Conference on Learning Representations ( ICLR ), 2024
【PDF】

Neural Interactive Keypoint Detection
Jie Yang, Ailing Zeng*, Feng Li, Shilong Liu, Ruimao Zhang*, Lei Zhang
Proc. of IEEE International Conference on Computer Vision ( ICCV ), 2023
【PDF】【Code】

SupFusion: Supervised LiDAR-Camera Fusion for 3D Object Detection
Yiran Qin, Chaoqun Wang, Zijian Kang, Ningning Ma, Zhen Li, Ruimao Zhang*
Proc. of IEEE International Conference on Computer Vision ( ICCV ), 2023
【PDF】【Code】

Semantic Human Parsing via Scalable Semantic Transfer over Multiple Label Domains
Jie Yang, Chaoqun Wang, Zhen Li, Junle Wang, Ruimao Zhang*
Proc. of IEEE International Conference on Computer Vision and Pattern Recognition ( CVPR ), 2023 【PDF】【Code】

Inherent Consistent Learning for Accurate Semi-supervised Medical Image Segmentation
Ye Zhu, Jie Yang, Siqi Liu, Ruimao Zhang*
Proc. of Conference on Medical Imaging with Deep Learning ( MIDL ), 2023 ( Oral )
【PDF】【Code】

Explicit Box Detection Unifies End-to-End Multi-Person Pose Estimation
Jie Yang, Ailing Zeng*, Shilong Liu, Feng Li, Ruimao Zhang*, Lei Zhang
Proc. of International Conference on Learning Representations ( ICLR ), 2023
【PDF】【Code】

AMOS: A Large-Scale Abdominal Multi-Organ Benchmark for Versatile Medical Image Segmentation
Yuanfeng Ji, Haotian Bai, Jie Yang, Chongjian Ge, Ye Zhu, Ruimao Zhang*, et al.
Proc. of Conference on Neural Information Processing Systems ( NeurIPS ), 2022 ( Oral )
【PDF】【AMOS Challenge】

Weakly Supervised Object Localization via Transformer with Implicit Spatial Calibration
Haotian Bai, Ruimao Zhang*, Jiong Wang, Xiang Wan
Proc. of European Conference on Computer Vision ( ECCV ), 2022
【PDF】【Code】【Youtube】

Switchable Normalization for Learning-to-Normalize Deep Representation
Ping Luo, Ruimao Zhang*, Jiamin Ren, Zhanglin Peng, Jingyu Li
IEEE Transactions on Pattern Analysis and Machine Intelligence ( T-PAMI ), 43(2):712-728, 2021
【PDF】【Code】

Exemplar Normalization for Learning Deep Representation
Ruimao Zhang, Zhanglin Peng, Lingyun Wu, Zhen Li, Ping Luo
Proc. of IEEE International Conference on Computer Vision and Pattern Recognition ( CVPR ), 2020 【PDF】【Supp】

Hierarchical Scene Parsing by Weakly Supervised Learning with Image Descriptions
Ruimao Zhang, Liang Lin, Guangrun Wang, Meng Wang, Wangmeng Zuo
IEEE Transactions on Pattern Analysis and Machine Intelligence ( T-PAMI ), 41(3):596 - 610, 2019
【PDF】

SCAN: Self-and-Collaborative Attention Network for Video Person Re-identification
Ruimao Zhang, Jingyu Li, Hongbin Sun, Yuying Ge, Ping Luo, Xiaogang Wang, Liang Lin
IEEE Transactions on Image Processing ( T-IP ), 28(10):4870-4882, 2019
【PDF】【Code】

MEMBER

Ph.D. Students

Yiran Qin

Ph.D., since 2021, CUHK-SZ

Scene Understanding, Embodied AI, Large Visual Language Model

M.S.: not applicable

B.E.: Shandong University (Top 10%)

Shunlin Lu

Ph.D., since 2023, CUHK-SZ

Human Centric Understanding and Generation, Multi-modal Learning

M.S.: University of Southern California

B.E.: Wuhan University of Technology

Jiahua Ma

Ph.D., since 2025, Sun Yat-sen University

Embodied AI, Vision-Language-Action Model, Policy Learning

M.S.: Shanghai Jiao Tong University

B.E.: Xidian University (Top 5%)

Mphil/Master Student

Meng Cai

Master, since 2025, SYSU

Human Motion Generation, Humanoid Robot Simulation

B.E.: Sun Yat-sen University

Research Assistants

Bingqi Liu

Human Motion Generation, Multi-modal Learning

M.S.: Imperial College London

B.E.: Beihang University

Yuyu Sun

Motion Generation, Humanoid Robot Simulation

M.S.: not applicable

B.E.: South China Normal Univ. (Top 1%)

Yujie Chen

Embodied AI, Vision-Language-Action Model

M.S.: ETH Zürich

B.E.: Beijing Institute of Technology

Xiaocong Zeng

3D Object/3D Scene Generation, Digital Twin

M.S.: Sun Yat-sen University

B.E.: Sun Yat-sen University

Xin Wen

Embodied AI, Policy Learning, Reinforcement learning

M.S.: not applicable

B.E.: Sun Yat-sen University

Wenzhan Li

Embodied AI, World Modeling, Policy Learning

M.S.: Tsinghua University

B.E.: Xi'an Jiaotong University

Alumni (Ph.D./Mphil/Master Students)

Jie Yang ( B.E., Harbin Engineering University), Ph.D. Student, Sept. 2021 ~ Jun. 2025, Human Centric Visual Perception
Selected for Tencent 2025 Project Up (Qingyun Program)
Current Position: Senior Researcher, WeChat Vision, Tencent, Beijing, China.

Chaoqun Wang ( Mphil, Nanjing Univ. of Science & Technology ), Ph.D. Student, Sept. 2021 ~ Jun. 2025, 3D Scene Understanding
Current Position: Senior Researcher, Zhejiang Provincial Seaport Investment, Hangzhou, China.

Bingliang Li ( B.S., Lanzhou University ), Master Student, Sept. 2022 ~ Jun. 2024, Human-centric Visual Analysis
Current Position: Machine Learning Engineer, Xiaomi AI Lab, Beijing, China.

Fengyu Yang ( B.S., Hunan University ), Master Student, Sept. 2022 ~ Jun. 2024, Human-centric Visual Analysis
Current Position: Computer Vision Engineer, Tencent, Shenzhen, China.

Jiong Wang ( B.S., The Chinese University of Hong Kong ), Mphil Student, Sept. 2021 ~ Jun. 2023, Human-centric Visual Analysis
Current Position: Ph.D. student, Fudan University, Shanghai, China.

Alumni (Research Assistant)

Ziye Wang ( Mphil, Harbin Institute of Technology, Shenzhen ), Research Assistant, Mar. 2024 ~ Jan. 2025, Embodied AI
Current Position: Ph.D. student, MMLab, The University of Hong Kong (HKU), Hong Kong, China.

Xuechen Xiong ( M.S., Huazhong Univ. of Science & Technology ), Research Assistant, Mar. 2024 ~ Jan. 2025, Motion Generation
Current Position: Ph.D. student, Harbin Institute of Technology (HIT-SZ), Shenzhen, China.

Lai Wei ( B.S., The Chinese University of Hong Kong, Shenzhen ), Research Assistant, Dec. 2023 ~ Jun. 2024, Embodied AI
Current Position: Master student, University of California San Diego (UCSD), CA, U.S.

Jiayu Chang ( B.S., Central South University ), Research Assistant, Jun. 2023 ~ Dec. 2023, Medical Image Analysis
Current Position: Master student, Stanford University, CA, U.S.

Hanqi Jiang ( B.S., Beijing Jiaotong University ), Research Assistant, Jun. 2023 ~ Dec. 2023, Medical Image Analysis
Current Position: Ph.D. student, University of Georgia, U.S.

Xixuan Hao ( M.S., The University of Hong Kong ), Research Assistant, Dec. 2022 ~ Jul. 2023, Medical Image Analysis
Current Position: Ph.D. student, The Hong Kong University of Science and Technology, Guangzhou (HKUST-GZ), China.

Siyue Yao ( M.S., King’s College London ), Research Assistant, Jul. 2022 ~ Apr. 2023, Human Centric Visual Generation
Current Position: Ph.D. student, Xi'an Jiaotong-Liverpool University (XJTLU), China.

Ye Zhu ( B.S., South China Agricultural University ), Research Assistant, Oct. 2021 ~ Jun. 2023, Medical Image Analysis, Transformer
Current Position: Ph.D. student, Hong Kong Baptist University (HKBU), Hong Kong, China.

Ziyi Tang ( M.S., University of Southampton ), Research Assistant, Jul. 2021 ~ Jul. 2022, Cross-modal Learning, Transformer
Current Position: Ph.D. student, Sun Yat-sen University (SYSU), China.

Haotian Bai ( B.E., Shanghai University ), Research Assistant, Jul. 2021 ~ Apr. 2022, Transformer Architecture
Current Position: Ph.D. student, The Hong Kong University of Science and Technology, Guangzhou (HKUST-GZ), China.

Hao Zhang ( M.S., University of Southern California ), Research Assistant, Jul. 2021 ~ Feb. 2022, Large-scale Pretraining
Current Position: Ph.D. student, University of Illinois Urbana-Champaign (UIUC), U.S.

Huijie Wang ( M.S., Technische Universität München ), Visiting Student, Jul. 2020 ~ Feb. 2021, Medical Image Analysis
Current Position: Researcher, Shanghai Artificial Intelligence Laboratory, China.

TEACHING

ECE7719: Fundamentals of Applied Mathematics. Fall 2025
Instructor, Sun Yat-sen University, Shenzhen.

ECE371: Neural Networks and Deep Learning, English. Spring 2025
Instructor, Sun Yat-sen University, Shenzhen.

DDA4220: Deep Learning and Applications. Spring 2023
Instructor, The Chinese University of Hong Kong, Shenzhen.

MDS5102: Python Programming. Fall 2021
Instructor, The Chinese University of Hong Kong, Shenzhen.

CSC1001: Introduction to Programming: Python. Fall 2021
Instructor, The Chinese University of Hong Kong, Shenzhen.

CSC1001: Introduction to Programming: Python. Spring 2021
Instructor, The Chinese University of Hong Kong, Shenzhen.

Computer Vision. Fall 2013
Taught by Prof. Liang Lin, @Sun Yat-sen University.
Teaching Assistant

Linear Algebra. Fall 2012
      Taught by Prof. Weishi Zheng, @Sun Yat-sen University.
      2+2 International Undergraduate Program, all in English.
      Teaching Assistant, Sun Yat-sen University.

Data Structure. Fall 2011
Taught by Prof. Liang Lin, @Sun Yat-sen University.
Teaching Assistant

Modern Computer Vision. Summer 2011
      Taught by Prof. Alan L. Yuille from UCLA, @Sun Yat-sen University.
      Summer Intensive Course, all in English.
      Teaching Assistant

CONTACT ME

Address: Shenzhen Campus of Sun Yat-sen University, Shenzhen, China

E-mail: ruimao.zhang@ieee.org or zhangrm27@mail.sysu.edu.cn

Phone: (0755)