Yuchang Su

I'm a senior undergraduate student at Tsinghua University, where I major in Computer Science and Technology.

In the summer of 2024, I worked as an undergraduate visiting research intern in the MARVL lab at Stanford University, advised by Prof. Serena Yeung-Levy.

Email  /  CV  /  Scholar  /  Twitter  /  Github

profile photo

Research

I'm interested in multimodal learning, vision-language models, generative models, and their biomedical applications.

CellFlow: Simulating Cellular Morphology Changes via Flow Matching
Yuhui Zhang*, Yuchang Su*, Chenyu Wang, Tianhong Li, Zoe Wefers, Jeffrey J. Nirschl, James Burgess, Daisy Ding, Alejandro Lozano, Emma Lundberg, Serena Yeung-Levy
ICML 2025 Under Review
arxiv
*co-first authorship

We introduce CellFlow, an image-generative model that simulates cellular morphology changes induced by chemical and genetic perturbations using flow matching.

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
Yuhui Zhang*, Yuchang Su*, Yiming Liu, Xiaohan Wang, James Burgess, Elaine Sui, Chenyu Wang, Josiah Aklilu, Alejandro Lozano, Anjiang Wei, Ludwig Schmidt, Serena Yeung-Levy
CVPR 2025
project page / arxiv / code / data
*co-first authorship

We introduce AutoConverter, an agentic framework that automatically converts open-ended questions into multiple-choice format, enabling objective evaluation while reducing the cost of question creation.

Converting Open-ended Questions to Multiple-choice Questions Simplifies Biomedical Vision-Language Model Evaluation
Yuchang Su, Yuhui Zhang, Yiming Liu, Ludwig Schmidt, Serena Yeung-Levy
ML4H 2024
project page / pdf / code

We propose converting open-ended medical VQA datasets into multiple-choice format to address evaluation problems in medical VLMs.

Multimodal Generalized Category Discovery
Yuchang Su, Renping Zhou, Siyu Huang, Xingjian Li, Tianyang Wang, Ziyue Wang, Min Xu
Preprint
arxiv

We extend GCD to a multimodal setting, where inputs from different modalities provide richer and complementary information. Through theoretical analysis and empirical validation, we identify that the key challenge in multimodal GCD lies in effectively aligning heterogeneous information across modalities. To address this, we propose MM-GCD, a novel framework that aligns both the feature and output spaces of different modalities using contrastive learning and distillation techniques.

Why are Visually-Grounded Language Models Bad at Image Classification?
Yuhui Zhang, Alyssa Unell, Xiaohan Wang, Dhruba Ghosh, Yuchang Su, Ludwig Schmidt, Serena Yeung-Levy
NeurIPS 2024
project page / arxiv / code

We investigate why visually-grounded language models are bad at image classification and find that the primary cause is data-related.

MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
James Burgess*, Jeffrey J Nirschl*, Laura Bravo-Sánchez*, Alejandro Lozano, Sanket Rajan Gupte, Jesus G. Galaz-Montoya, Yuhui Zhang, Yuchang Su, Disha Bhowmik, Zachary Coman, Sarina M. Hasan, Alexandra Johannesson, William D. Leineweber, Malvika G Nair, Ridhi Yarlagadda, Connor Zuraski, Wah Chiu, Sarah Cohen, Jan N. Hansen, Manuel D Leonetti, Chad Liu, Emma Lundberg,
CVPR 2025
pdf / benchmark
*co-first authorship

MicroVQA is an expert-curated benchmark for research-level reasoning in biological microscopy. We also propose a method for making multiple-choice VQA more challenging.