publications | Autolab

2026

CorrectAD: A Self-Correcting Agentic System to Improve End-to-end Planning in Autonomous Driving

Enhui Ma, Lijun Zhou, Tao Tang, and 11 more authors

In AAAI Conference on Artificial Intelligence, 2026

DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies

Wei Song, Yuran Wang, Zijia Song, and 6 more authors

In International Conference on Learning Representations (ICLR), 2026

DriveCombo: Benchmarking Compositional Traffic Rule Reasoning in Autonomous Driving

Enhui Ma, Jiahuan Zhang, Guantian Zheng, and 10 more authors

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026


2025

OmniGen: Unified Multimodal Sensor Generation for Autonomous Driving

Tao Tang, Enhui Ma, Xia Zhou, and 9 more authors

In Proceedings of the 33rd ACM International Conference on Multimedia, 2025

BEVHeight++: Toward Robust Visual Centric 3D Object Detection

Lei Yang, Tao Tang, Jun Li, and 6 more authors

IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

Do LLM Modules Generalize? A Study on Motion Generation for Autonomous Driving

Mingyi Wang, Jingke Wang, Tengju Ye, and 2 more authors

In Proceedings of the 9th Annual Conference on Robot Learning, 2025

SR-LLM: Rethinking the Structured Representation in Large Language Model

Jiahuan Zhang, Tianheng Wang, Ziyi Huang, and 7 more authors

In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025

Ascending the Infinite Ladder: Benchmarking Spatial Deformation Reasoning in Vision-Language Models

Jiahuan Zhang, Shunwen Bai, Tianheng Wang, and 4 more authors

arXiv preprint arXiv:2507.02978, 2025

THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning?

Xin Wang, Jiyao Liu, Yulong Xiao, and 5 more authors

arXiv preprint arXiv:2506.21763, 2025

DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction

Zhiyi Hou*, Enhui Ma*, Fang Li*, and 8 more authors

arXiv preprint arXiv:2507.02948, 2025

DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies

Wei Song, Yuran Wang, Zijia Song, and 7 more authors

arXiv preprint arXiv:2503.14324, 2025

Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better

Dianyi Wang*, Wei Song*, Yikun Wang, and 4 more authors

arXiv preprint arXiv:2506.09040, 2025


2024

Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training

Xiaoyang Wu, Zhuotao Tian, Xin Wen, and 4 more authors

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis

Tao Tang, Guangrun Wang, Yixing Lao, and 5 more authors

In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DiVE: DiT-based Video Generation with Enhanced Control

Junpeng Jiang, Gangyi Hong, Lijun Zhou, and 10 more authors

In Proceedings of the European Conference on Computer Vision Workshops, 2024

LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields

Tao Tang, Longfei Gao, Guangrun Wang, and 7 more authors

In Proceedings of the ACM International Conference on Multimedia, 2024

Tutorial: Large Language-Vision Model in Society

Kaicheng Yu, Zhuang Shao, Siyuan Qi, and 1 more author

In Proceedings of the ACM International Conference on Multimedia, 2024

LiT: Unifying LiDAR "Languages" with LiDAR Translator

Yixing Lao, Tao Tang, Xiaoyang Wu, and 3 more authors

In Advances in Neural Information Processing Systems, 2024

OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection

Hu Zhang, Jianhua Xu, Tao Tang, and 4 more authors

In Proceedings of the European Conference on Computer Vision, 2024

Semantic-DARTS: Elevating Semantic Learning for Mobile Differentiable Architecture Search

Bicheng Guo, Shibo He, Miaojing Shi, and 3 more authors

IEEE Transactions on Mobile Computing, 2024

Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation

Enhui Ma*, Lijun Zhou*, Tao Tang, and 8 more authors

arXiv preprint arXiv:2406.01349, 2024

BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science

Xinna Lin, Siqi Ma, Junjie Shan, and 5 more authors

arXiv preprint arXiv:2407.00466, 2024

PatentAgent: Intelligent Agent for Automated Pharmaceutical Patent Analysis

Xin Wang, Yifan Zhang, Xiaojing Zhang, and 5 more authors

2024


2023

BEVControl: Accurately Controlling Street-View Elements with Multi-Perspective Consistency via BEV Sketch Layout

Kairui Yang*, Enhui Ma*, Jibin Peng, and 3 more authors

arXiv preprint arXiv:2308.01661, 2023
