Jilan Xu
jilanxu18 at fudan dot edu dot cn
I am a final-year PhD student at Fudan University, advised by Professor Yuejie Zhang. I also work closely with Professor Weidi Xie. My research focuses on multimodal machine learning, video understanding, and medical image analysis. I hope that someday medical AI agents will heal the world and make it a better place for the entire human race.
Google Scholar / Twitter / GitHub / Zhihu
News
I'm actively looking for postdoc/job positions in 2025. Please feel free to email me!
[05/2024] Our CVPR papers Egoinstructor and EgoExoLearn were also accepted to the 1st LPVL Workshop @ CVPR 2024
[04/2024] We ranked 1st in Track 2 and 4th in Track 1 of the 4th-COV19D Competition @ CVPR 2024
Computer Vision
Medical Image Analysis
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang*, Guo Chen*, Jilan Xu*, Mingfang Zhang*, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao
CVPR 2024  
arXiv / project page / code
A cross-view benchmark dataset that emulates the human demonstration following process, containing recorded egocentric videos guided by exocentric-view demonstration videos.
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie
CVPR 2024  
arXiv / project page / code
Given an egocentric video, Egoinstructor automatically retrieves relevant exocentric instructional videos for assisting egocentric video captioning.
Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision
Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Yi Wang, Yu Qiao, Weidi Xie
CVPR 2023  
arXiv / project page / code
Training open-vocabulary semantic segmentation models with image-text pairs only, which enables zero-shot transfer to various segmentation datasets.
CREAM: Weakly supervised object localization via class re-activation mapping
Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Rui-Wei Zhao, Tao Zhang, Xuequan Lu, Shang Gao
CVPR 2022  
arXiv
A weakly-supervised object localization model that generates better CAMs via soft-clustering algorithms.
Does video-text pretraining help open vocabulary online action detection?
Qingsong Zhao, Yi Wang, Jilan Xu, Yinan He, Zifan Song, Limin Wang, Yu Qiao, Cairong Zhao
NeurIPS 2024  
arXiv
A zero-shot online action detector that leverages vision-language models and enables open-world temporal understanding.
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang, Kunchang Li, Yizhuo Li, Yinan He, Bingkun Huang, Zhiyu Zhao, Hongjie Zhang, Jilan Xu, Yi Liu, Zun Wang, Sen Xing, Guo Chen, Junting Pan, Jiashuo Yu, Yali Wang, Limin Wang, Yu Qiao
Tech report 2022  
arXiv / code
A foundation model for video and video-text understanding, achieving state-of-the-art results on more than 30 benchmark datasets.
Concept-Attention Whitening for Interpretable Skin Lesion Diagnosis
Junlin Hou, Jilan Xu, Hao Chen
MICCAI 2024  
arXiv
An XAI framework that aligns the axes of the latent space with concepts of interest for interpretable skin lesion diagnosis.
Anatomical structure-guided medical vision-language pre-training
Qingqiu Li, Xiaohan Yan, Jilan Xu, Runtian Yuan, Yuejie Zhang, Rui Feng, Quanli Shen, Xiaobo Zhang, Shujun Wang
MICCAI 2024  
arXiv
An anatomical structure-guided visual-text pre-training framework that leverages anatomical knowledge.
CMC_v2: Towards More Accurate COVID-19 Detection with Discriminative Video Priors
Junlin Hou, Jilan Xu, Nan Zhang, Yi Wang, Yuejie Zhang, Xiaobo Zhang, Rui Feng
ECCV 2022 AIMIA Workshop  
arXiv / code
A Transformer-based model with contrastive representation enhancement. Winner of the 2nd COVID-19 Detection Challenge at ECCV 2022.
TCCNet: Temporally Consistent Context-Free Network for Semi-supervised Video Polyp Segmentation
Xiaotong Li, Jilan Xu, Yuejie Zhang, Rui Feng, Rui-Wei Zhao, Tao Zhang, Xuequan Lu, Shang Gao
IJCAI 2022, Oral  
paper
Co-training a model for semi-supervised video polyp segmentation, achieving comparable results using only 15% labeled data.
CMC-COV19D: Contrastive Mixup Classification for COVID-19 Diagnosis
Junlin Hou*, Jilan Xu*, Rui Feng, Yuejie Zhang, Fei Shan, Weiya Shi
ICCV 2021 AIMIA Workshop
paper / code
A ResNeSt-50 model combined with a contrastive mixup technique for 3D COVID-19 CT image classification. Winner of the 1st COVID-19 Detection Challenge.
Data-Efficient Histopathology Image Analysis with Deformation Representation Learning
Jilan Xu, Junlin Hou, Yuejie Zhang, Rui Feng, Chunyang Ruan, Tao Zhang, Weiguo Fan
BIBM 2020, Oral  
paper
Introducing a self-supervised deformation representation learning technique for histopathology image analysis.
Awards & Honors
Winner of the 4th-COV19D Competition Track 2 (COVID-19 Domain Adaptation Challenge) and ranked 4th in Track 1 (COVID-19 Detection Challenge) @ CVPR 2024
Winner of the MMAC Challenge Track 1 (Classification of Myopic Maculopathy) and Track 2 (Segmentation of Myopic Maculopathy Plus Lesions) @ MICCAI 2023
Winner of the 1st & 2nd COVID-19 Detection Challenge @ ICCV 2021 & ECCV 2022
Winner of the 1st COVID-19 Severity Detection Challenge @ ECCV 2022
VenusTech Enterprise Scholarship
Working Experience
Research intern @ Shanghai AI Laboratory, supervised by Dr. Yifei Huang, Yi Wang, and Prof. Yu Qiao.
Research intern @ Bell AI Lab, Shanghai, supervised by Dr. Chenhui Ye.
Google Winter AI Camp. Our team won the best presentation award!
SWE intern @ Morgan Stanley Technology, supervised by Ray Zhou.
Academic Services
Conference Reviewer: ICLR25, NeurIPS24, ECCV24, MICCAI24, CVPR24, CVPR23, ICCV23, NeurIPS22
Journal Reviewer: Nature Communications, TPAMI, IJCV, TMM, Neurocomputing
TA: Data Structure, The Theory of Computation