Jilan Xu
jilanxu18 at fudan dot edu dot cn
I am a final-year PhD student at Fudan University, advised by Professor Yuejie Zhang. I also work closely with Professor Weidi Xie. My research focuses on multimodal machine learning, video understanding, and medical image analysis. I hope that someday medical AI agents will heal the world and make it a better place for the entire human race.
Google Scholar / Twitter / GitHub / Zhihu
News
I'm actively looking for postdoc/job positions in 2025. Please feel free to email me!
[01/2025] Three papers (XGen, EgoVideo, CGBench) accepted to ICLR 2025!!!
[01/2025] Honored to be invited to give a talk on joint egocentric-exocentric video understanding at TechBeat
[05/2024] Our CVPR papers Egoinstructor and EgoExoLearn were also accepted to the 1st LPVL Workshop @ CVPR 2024
[04/2024] We ranked 1st in Track 2 and 4th in Track 1 of the 4th COV19D Competition @ CVPR 2024
Computer Vision
Medical Image Analysis
XGen: Egocentric Video Prediction by Watching Exocentric Videos
Jilan Xu, Yifei Huang, Baoqi Pei, Junlin Hou, Qingqiu Li, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie
ICLR 2025  
A cross-view video prediction model that predicts future egocentric video frames by leveraging paired exocentric video and text instructions.
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
Baoqi Pei, Yifei Huang, Jilan Xu, Guo Chen, Yuping He, Lijin Yang, Yali Wang, Weidi Xie, Yu Qiao, Fei Wu, Limin Wang
ICLR 2025  
An egocentric video-language model that learns fine-grained egocentric video representations by modeling hand-object dynamics.
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Guo Chen*, Yicheng Liu*, Yifei Huang*, Yuping He, Baoqi Pei, Jilan Xu, Yali Wang, Tong Lu, Limin Wang
ICLR 2025  
arXiv / project page / code
A clue-grounded question answering benchmark for long video understanding.
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang*, Guo Chen*, Jilan Xu*, Mingfang Zhang*, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao
CVPR 2024  
arXiv / project page / code
A cross-view benchmark dataset that emulates the human demonstration following process, containing recorded egocentric videos guided by exocentric-view demonstration videos.
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie
CVPR 2024  
arXiv / project page / code
Given an egocentric video, Egoinstructor automatically retrieves relevant exocentric instructional videos for assisting egocentric video captioning.
Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision
Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Yi Wang, Yu Qiao, Weidi Xie
CVPR 2023  
arXiv / project page / code
Training open-vocabulary semantic segmentation models with image-text pairs only, which enables zero-shot transfer to various segmentation datasets.
CREAM: Weakly supervised object localization via class re-activation mapping
Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Rui-Wei Zhao, Tao Zhang, Xuequan Lu, Shang Gao
CVPR 2022  
arXiv
A weakly-supervised object localization model that generates better CAMs via soft-clustering algorithms.
Does video-text pretraining help open vocabulary online action detection?
Qingsong Zhao, Yi Wang, Jilan Xu, Yinan He, Zifan Song, Limin Wang, Yu Qiao, Cairong Zhao
NeurIPS 2024  
arXiv
A zero-shot online action detector that leverages vision-language models and enables open-world temporal understanding.
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang, Kunchang Li, Yizhuo Li, Yinan He, Bingkun Huang, Zhiyu Zhao, Hongjie Zhang, Jilan Xu, Yi Liu, Zun Wang, Sen Xing, Guo Chen, Junting Pan, Jiashuo Yu, Yali Wang, Limin Wang, Yu Qiao
Tech report 2022  
arXiv / code
A foundation model for video and video-text understanding, achieving state-of-the-art results on over 30 benchmark datasets.
Concept-Attention Whitening for Interpretable Skin Lesion Diagnosis
Junlin Hou, Jilan Xu, Hao Chen
MICCAI 2024  
arXiv
An XAI framework that aligns the axes of the latent space with concepts of interest for interpretable skin lesion diagnosis.
Anatomical structure-guided medical vision-language pre-training
Qingqiu Li, Xiaohan Yan, Jilan Xu, Runtian Yuan, Yuejie Zhang, Rui Feng, Quanli Shen, Xiaobo Zhang, Shujun Wang
MICCAI 2024  
arXiv
An anatomical structure-guided visual-text pre-training framework that leverages anatomical knowledge.
CMC_v2: Towards More Accurate COVID-19 Detection with Discriminative Video Priors
Junlin Hou, Jilan Xu, Nan Zhang, Yi Wang, Yuejie Zhang, Xiaobo Zhang, Rui Feng
ECCV 2022 AIMIA Workshop  
arXiv / code
A Transformer-based model with contrastive representation enhancement. Winner of the 2nd COVID-19 Detection Challenge at ECCV 2022.
TCCNet: Temporally Consistent Context-Free Network for Semi-supervised Video Polyp Segmentation
Xiaotong Li, Jilan Xu, Yuejie Zhang, Rui Feng, Rui-Wei Zhao, Tao Zhang, Xuequan Lu, Shang Gao
IJCAI 2022, Oral  
paper
Co-training a model for semi-supervised video polyp segmentation, achieving comparable results using only 15% labeled data.
CMC-COV19D: Contrastive Mixup Classification for COVID-19 Diagnosis
Junlin Hou*, Jilan Xu*, Rui Feng, Yuejie Zhang, Fei Shan, Weiya Shi
ICCV 2021 AIMIA Workshop
paper / code
A ResNeSt-50 model combined with a contrastive mixup technique for 3D COVID-19 CT image classification. Winner of the 1st COVID-19 Detection Challenge.
Data-Efficient Histopathology Image Analysis with Deformation Representation Learning
Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Chunyang Ruan, Tao Zhang, Weiguo Fan
BIBM 2020, Oral  
paper
Introducing a self-supervised deformation representation learning technique for histopathology image analysis.
Awards & Honors
Winner of the 4th COV19D Competition Track 2 (COVID-19 Domain Adaptation Challenge) and ranked 4th in Track 1 (COVID-19 Detection Challenge) @ CVPR 2024
Winner of the MMAC Challenge Track 1 (Classification of Myopic Maculopathy) and Track 2 (Segmentation of Myopic Maculopathy Plus Lesions) @ MICCAI 2023
Winner of the 1st & 2nd COVID-19 Detection Challenge @ ICCV 2021 & ECCV 2022
Winner of the 1st COVID-19 Severity Detection Challenge @ ECCV 2022
VenusTech Enterprise Scholarship
Working Experience
Shanghai AI Laboratory
Research Intern
Supervised by Dr. Yifei Huang, Yi Wang and Prof. Yu Qiao
Bell AI Lab, Shanghai
Research Intern
Supervised by Dr. Chenhui Ye
Google Winter AI Camp
🏆 Best Presentation Award Winner
Morgan Stanley Technology
Software Engineering Intern
Supervised by Ray Zhou
Academic Services
Conference Reviewer: ICLR25, NeurIPS24, ECCV24, MICCAI24, CVPR24, CVPR23, ICCV23, NeurIPS22
Journal Reviewer: Nature Communications, TPAMI, IJCV, TMM, Neurocomputing
TA: Data Structure, The Theory of Computation