Publications
2025
- RAGEN (RL-Agent): Training Agents by Reinforcing Reasoning
Zihan Wang*, Kangrui Wang*, Qineng Wang*, Pingyue Zhang*, Linjie Li*, Zhengyuan Yang, Kefan Yu, Minh Nhat Nguyen, Monica Lam, Yiping Lu, Kyunghyun Cho, Jiajun Wu, Li Fei-Fei, Lijuan Wang, Yejin Choi, Manling Li
MMLS 2025 (Midwest Machine Learning Symposium)
Best Poster Award, 2.3k+ GitHub Stars, featured by MIT Tech Review, Lambda Partner Spotlight, VentureBeat, Medium, AI News, MarkTechPost, Business Leaders Review, etc.
- VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
Kangrui Wang*, Pingyue Zhang*, Zihan Wang*, Yaning Gao*, Linjie Li*, Qineng Wang, Chi Wan, Hanyang Chen, Yiping Lu, Zhengyuan Yang, Lijuan Wang, Ranjay Krishna, Jiajun Wu, Li Fei-Fei, Yejin Choi, Manling Li
NeurIPS 2025
- Exploring Diffusion Transformer Designs via Grafting
Keshigeyan Chandrasegaran*, Michael Poli*, Daniel Y. Fu, Dongjun Kim, Lea M. Hadzic, Manling Li, Agrim Gupta, Stefano Massaroli, Azalia Mirhoseini, Juan Carlos Niebles, Stefano Ermon, Li Fei-Fei
NeurIPS 2025
Oral (Top 0.36%)
- Spatial Mental Modeling from Limited Views
Qineng Wang*, Baiqiao Yin*, Pingyue Zhang, Jianshu Zhang, Kangrui Wang, Zihan Wang, Jieyu Zhang, Keshigeyan Chandrasegaran, Han Liu, Ranjay Krishna, Saining Xie, Jiajun Wu+, Li Fei-Fei+, Manling Li+
ICCV 2025
Spotlight at ICCV 2025 Workshop on Structural Priors for Vision
- ROSETTA: Constructing Code-Based Reward from Unconstrained Language Preference
Sanjana Srivastava*, Kangrui Wang*, Yung-Chieh Chan*, Tianyuan Dai, Manling Li, Ruohan Zhang, Mengdi Xu, Jiajun Wu, Li Fei-Fei
RSS 2025 (Continual Robot Learning from Humans)
Best Paper Award
- EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents
Rui Yang, Hanyang Chen, Junyu Zhang, Mark Zhao, Cheng Qian, Kangrui Wang, Qineng Wang, Teja Venkat Koripella, Marziyeh Movahedi, Manling Li, Heng Ji, Huan Zhang, Tong Zhang
ICML 2025
Oral (Top 1%)
- Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
Shiqi Chen, Tongyao Zhu, Ruochen Zhou, Jinghan Zhang, Siyang Gao, Juan Carlos Niebles, Mor Geva, Junxian He, Jiajun Wu, Manling Li
ICML 2025
- Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging
Shiqi Chen, Jinghan Zhang, Tongyao Zhu, Wei Liu, Siyang Gao, Miao Xiong, Manling Li, Junxian He
ICML 2025
- SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering
Xuehang Guo, Xingyao Wang, Yangyi Chen, Sha Li, Chi Han, Manling Li, Heng Ji
ICML 2025
- T*: Re-thinking Temporal Search for Long-Form Video Understanding
Jinhui Ye*, Zihan Wang*, Haosen Sun, Keshigeyan Chandrasegaran, Zane Durante, Cristobal Eyzaguirre, Yonatan Bisk, Juan Carlos Niebles, Ehsan Adeli, Li Fei-Fei, Jiajun Wu, Manling Li
CVPR 2025
Oral at ICCV 2025 Workshop on Long Multi-Scene Video Foundations
- LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models
Fan-Yun Sun, Weiyu Liu, Siyi Gu, Dylan Lim, Goutam Bhat, Federico Tombari, Manling Li, Nick Haber, Jiajun Wu
CVPR 2025
- Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Zhenyu Pan, Haozheng Luo, Manling Li, Han Liu
ICLR 2025
- Visually Descriptive Language Modeling for Vector Graphics Reasoning
Zhenhailong Wang, Joy Hsu, Xingyao Wang, Kuan-Hao Huang, Manling Li, Jiajun Wu, Heng Ji
TMLR
- The Law of Knowledge Overshadowing: Towards Understanding, Predicting and Preventing LLM Hallucination
Yuji Zhang, Sha Li, Cheng Qian, Jiateng Liu, Pengfei Yu, Chi Han, Yi Fung, Kathleen McKeown, ChengXiang Zhai, Manling Li, Heng Ji
ACL 2025 Findings
- Chain-of-Experts: Unlocking the Communication Power of MoEs
Zihan Wang, Rui Pan, Jiarui Yao, Róbert Csordás, Linjie Li, Lu Yin, Jiajun Wu, Tong Zhang, Manling Li, Shiwei Liu
arXiv Preprint 2025
2024
- Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making
Manling Li*, Shiyu Zhao*, Qineng Wang*, Kangrui Wang*, Yu Zhou*, Sanjana Srivastava, Cem Gokmen, Tony Lee, Li Erran Li, Ruohan Zhang, Weiyu Liu, Percy Liang, Li Fei-Fei, Jiayuan Mao, Jiajun Wu
NeurIPS 2024 D&B Track
Oral (Top 0.6%), Best Paper Award at SoCal NLP 2024, Top 0.4%
- HourVideo: 1-Hour Video-Language Understanding
Keshigeyan Chandrasegaran, Agrim Gupta, Taran Kota, Lea M. Hadzic, Jimming He, Cristobal Eyzaguirre, Zane Durante, Manling Li, Jiajun Wu, Li Fei-Fei
NeurIPS 2024 D&B Track
- IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos
Yunong Liu, Weiyu Liu, Shubh Khanna, Cristobal Eyzaguirre, Manling Li, Juan Carlos Niebles, Vineeth Ravi, Saumitra Mishra, Jiajun Wu
NeurIPS 2024 D&B Track
- LM-Steer: Word Embeddings Are Steers for Language Models
Chi Han, Jialiang Xu, Manling Li, Yi Fung, Chenkai Sun, Nan Jiang, Tarek Abdelzaher, Heng Ji
ACL 2024
(Outstanding Paper Award at ACL 2024)
- Why Does New Knowledge Create Messy Ripple Effects in LLMs?
Jiaxin Qin, Zixuan Zhang, Chi Han, Pengfei Yu, Manling Li, Heng Ji
EMNLP 2024
- Deep Concept Injection for Zero-shot Multimodal Reasoning
Xudong Lin, Manling Li, Richard Zemel, Heng Ji, Shih-Fu Chang
EMNLP 2024
- SmartBook: AI-Assisted Situation Report Generation
Revanth Gangi Reddy, Yi Fung, Qi Zeng, Manling Li, Zihan Wang, Paul Sullivan, Heng Ji
arXiv
- Controlling Object Existence Hallucinations in Large Vision Language Models
Bohan Zhai, Shijia Yang, Chenfeng Xu, Sheng Shen, Kurt Keutzer, Chunyuan Li, Manling Li
arXiv
- Event-centric Multimodal Knowledge Acquisition
Manling Li
Thesis Committee: Heng Ji, Jiawei Han, Chengxiang Zhai, Shih-Fu Chang, Kyunghyun Cho
Thesis (ACL Inaugural Best Dissertation Award Honorable Mention)
- Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang†*, Manling Li*, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, Ziyi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji
NeurIPS'22 (equal contribution)
- CLIP-Event: Connecting Vision and Text with Event Structures
Manling Li, Ruochen Xu, Shuohang Wang, Xudong Lin, Chenguang Zhu, Xuedong Huang, Heng Ji, Shih-Fu Chang
CVPR'22
(Oral, Top 4.1%)
2021
- COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation
Qingyun Wang, Manling Li, Xuan Wang, Nikolaus Parulian, Guangxing Han, Jiawei Ma, Jingxuan Tu, Ying Lin, Haoran Zhang, Weili Liu, Aabhas Chauhan, Yingjun Guan, Bangzheng Li, Ruisong Li, Xiangchen Song, Heng Ji, Jiawei Han, Shih-Fu Chang, James Pustejovsky, David Liem, Ahmed Elsayed, Martha Palmer, Jasmine Rah, Clare Voss, Cynthia Schneider, Boyan Onyshkevych
NAACL'21: System Demonstrations
(Best Demo Paper Award at NAACL 2021)
- Connecting the Dots: Event Graph Schema Induction with Path Language Modeling
Manling Li, Qi Zeng, Ying Lin, Kyunghyun Cho, Heng Ji, Jonathan May, Nathanael Chambers, Clare Voss
EMNLP'20
2020
- GAIA: A Fine-grained Multimedia Knowledge Extraction System
Manling Li*, Alireza Zareian*, Ying Lin, Xiaoman Pan, Spencer Whitehead, Brian Chen, Bo Wu, Heng Ji, Shih-Fu Chang, Clare R. Voss, Dan Napierski, Marjorie Freedman
ACL'20
(Best Demo Paper Award at ACL 2020)
- Cross-media Structured Common Space for Multimedia Event Extraction
Manling Li*, Alireza Zareian*, Qi Zeng, Spencer Whitehead, Di Lu, Heng Ji, Shih-Fu Chang
ACL'20 pp.2557–2568
- GAIA at SM-KBP 2020: A Dockerized Multi-media Multi-lingual Knowledge Extraction, Clustering, Temporal Tracking and Hypothesis Generation System
Manling Li, Ying Lin, Tuan Manh Lai, Xiaoman Pan, Haoyang Wen, Sha Li, Zhenhailong Wang, Pengfei Yu, Lifu Huang, Di Lu, Qingyun Wang, Haoran Zhang, Qi Zeng, Chi Han, Zixuan Zhang, Yujia Qin, Xiaodan Hu, Nikolaus Parulian, Daniel Campos, Heng Ji, Brian Chen, Xudong Lin, Alireza Zareian, Amith Ananthram, Emily Allaway, Shih-Fu Chang, Kathleen McKeown, Yixiang Yao, Yifan Wang, Michael Spector, Mitchell DeHaven, Daniel Napierski, Marjorie Freedman, Pedro Szekely, Haidong Zhu, Ram Nevatia, Yang Bai, Yifan Wang, Ali Sadeghian, Haodi Ma, Daisy Zhe Wang
TAC-KBP: Text Analysis Conference Knowledge Base Population Workshop 2020
Ranked 1st in the National Institute of Standards and Technology (NIST) Streaming Multimedia Knowledge Base Population (SM-KBP) 2020
- Keep Meeting Summaries on Topic: Abstractive Multi-Modal Meeting Summarization
Manling Li, Lingyu Zhang, Heng Ji, Rich Radke
ACL'19: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp.2190–2196
2019
- GAIA at SM-KBP 2019: A Multi-media Multi-lingual Knowledge Extraction and Hypothesis Generation System
Manling Li, Ying Lin, Ananya Subburathinam, Spencer Whitehead, Xiaoman Pan, Di Lu, Qingyun Wang, Tongtao Zhang, Lifu Huang, Heng Ji, Alireza Zareian, Hassan Akbari, Brian Chen, Bo Wu, Emily Allaway, Shih-Fu Chang, Kathleen McKeown, Yixiang Yao, Jennifer Chen, Eric Berquist, Kexuan Sun, Xujun Peng, Ryan Gabbard, Marjorie Freedman, Pedro Szekely, T.K. Satish Kumar, Arka Sadhu, Ram Nevatia, Miguel Rodriguez, Yifan Wang, Yang Bai, Ali Sadeghian, Daisy Zhe Wang
TAC-KBP: Text Analysis Conference Knowledge Base Population Workshop 2019
Ranked 1st in the National Institute of Standards and Technology (NIST) Streaming Multimedia Knowledge Base Population (SM-KBP) 2019