Welcome!

I am an incoming CS Ph.D. student at the University of California, San Diego (UCSD), advised by Prof. Lianhui Qin. Prior to this, I completed my undergraduate studies in the School of Software Engineering at Tongji University, where I ranked 1st out of 227 students with a major GPA of 4.99/5.0. During my undergraduate studies, I was also a Visiting Researcher at the Berkeley NLP Group within the Berkeley Artificial Intelligence Research (BAIR) Lab, working closely with Prof. Alane Suhr and Zineng Tang.

My research focuses on natural language processing, machine learning, and computer vision. I am particularly interested in enabling vision-language models (VLMs) to perceive, reason, and interact with the world in a more human-like manner, grounded in multimodal context. While recent VLMs have shown impressive progress, a significant gap remains between VLMs and humans in handling vision-centric tasks (e.g., 3D world understanding, counting, and visual IQ tests). I am curious whether we can develop more effective approaches for enabling human-level visual reasoning abilities in these models.

You can find my CV here. I am always open to any form of collaboration. If you have any ideas for potential collaboration, or just feel like having a casual chat, please feel free to reach out!

🔥 News

2025.04: Thrilled to join UCSD as a CS Ph.D. student. Looking forward to starting this new journey!🌴🌊☀️
2025.02: Our work on evaluating VLMs on photorealistic color illusion scenes has been accepted to CVPR 2025.
2024.09: Our work on multi-perspective communication has been accepted to the EMNLP 2024 main conference.
2024.09: Our work on multimodal instruction-tuning for biomedicine has been accepted to NeurIPS D&B 2024! Many thanks to Dr. Hejie Cui and Prof. Carl Yang for their guidance and support!
2024.06: 🎉🎉Our paper “Among Agents” has been accepted to the ACL Wordplay Workshop 2024. See you in Bangkok!
2024.01: Thrilled to join the Berkeley NLP Group as an intern! Go bears!🐻🔥
2023.07: Accepted into the University of Hong Kong’s CS summer research internship. A wonderful summer with Prof. Chuan Wu and Dr. Junwei Su!
2022.02: Joined the MIT Media Lab’s CSL@Shanghai.

📝 Publications

Images are Worth Variable Length of Representations

Lingjun Mao, Rodolfo Corona, Xin Liang, Wenhao Yan, Zineng Tang

Project | ArXiv 2025

We propose DOVE, a dynamic vision encoder that produces a variable number of tokens to reconstruct each image.
We extend DOVE with query-conditioned tokenization, which enables more efficient and targeted semantic extraction.

Evaluating Model Perception of Color Illusions in Photorealistic Scenes

Lingjun Mao, Zineng Tang, Alane Suhr

Project | CVPR 2025

We propose an automated framework for generating realistic illusion images and create a large, realistic dataset (RCID) of color illusion images.
We investigate the underlying mechanisms of color illusions.

Grounding Language in Multi-Perspective Referential Communication

Zineng Tang, Lingjun Mao, Alane Suhr

Project | EMNLP main 2024

We introduce a task and dataset for referring expression generation and comprehension in multi-agent embodied environments.

AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game

Yizhou Chi, Lingjun Mao, Zineng Tang

Project | ACL Wordplay Workshop 2024

This paper focuses on creating proxies of human behavior in simulated environments, with “Among Us” utilized as a tool for studying simulated human behavior.

Biomedical Visual Instruction Tuning with Clinician Preference Alignment

Hejie Cui*, Lingjun Mao*, Xin Liang, Jieyu Zhang, Hui Ren, Quanzheng Li, Xiang Li, Carl Yang

Project | NeurIPS 2024

We propose a data-centric framework (BioMed-VITAL) that incorporates clinician preferences into both the generation and selection of instruction data for tuning biomedical multimodal foundation models.

BG-HGNN: Toward Scalable and Efficient Heterogeneous Graph Neural Network

Junwei Su*, Lingjun Mao*, Chuan Wu

Project | ArXiv 2024

We first highlight and demonstrate that the standard approach employed by existing HGNNs inevitably leads to parameter explosion and relation collapse.

AI Agent as Urban Planner: Steering Stakeholder Dynamics in Urban Planning via Consensus-based Multi-Agent Reinforcement Learning

Kejiang Qian, Lingjun Mao, Xin Liang, Yimin Ding, Jin Gao, Xinran Wei, Ziyi Guo, Jiajie Li

Project | ArXiv 2024

We introduce a Consensus-based Multi-Agent Reinforcement Learning framework for real-world land use readjustment.

📖 Education

2024.01 - 2024.10, Visiting Student (Berkeley Global Access Exchange Program) at the University of California, Berkeley, USA
- Supervised by Prof. Alane Suhr
2020 - 2025 (expected), Software Engineering, Tongji University, Shanghai, China
- Supervised by Prof. Zhen Gao

💻 Internships

Feb 2024 - Present: Berkeley NLP Group, Berkeley Artificial Intelligence Research (BAIR) Lab
Nov 2023 - Jun 2024: Department of Computer Science, Emory University
Apr 2023 - May 2024: Department of Computer Science, University of Hong Kong
Apr 2022 - Nov 2023: City Science Lab@Shanghai (MIT Media Lab)
Sept 2021 - Nov 2023: Tongji ADE Lab
May 2021 - Apr 2023: Tongji NaMI Lab

2025@Lingjun Mao