publications

publications by category in reverse chronological order.

2025

  1. OmniGen: Unified Multimodal Sensor Generation for Autonomous Driving
    arXiv preprint arXiv:2508.xxxxx, 2025
  2. BEVHeight++: Toward Robust Visual Centric 3D Object Detection
    Lei Yang, Tao Tang, Jun Li, and 6 more authors
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
  3. Do LLM Modules Generalize? A Study on Motion Generation for Autonomous Driving
    Mingyi Wang, Jingke Wang, Tengju Ye, and 2 more authors
    In Proceedings of the 9th Annual Conference on Robot Learning, 2025
  4. SR-LLM: Rethinking the Structured Representation in Large Language Model
    Jiahuan Zhang, Tianheng Wang, Ziyi Huang, and 7 more authors
    In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
  5. Ascending the Infinite Ladder: Benchmarking Spatial Deformation Reasoning in Vision-Language Models
    Jiahuan Zhang, Shunwen Bai, Tianheng Wang, and 4 more authors
    arXiv preprint arXiv:2507.02978, 2025
  6. THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning?
    Xin Wang, Jiyao Liu, Yulong Xiao, and 5 more authors
    arXiv preprint arXiv:2506.21763, 2025
  7. DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction
    Zhiyi Hou*, Enhui Ma*, Fang Li*, and 8 more authors
    arXiv preprint arXiv:2507.02948, 2025
  8. DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
    Wei Song, Yuran Wang, Zijia Song, and 7 more authors
    arXiv preprint arXiv:2503.14324, 2025
  9. Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better
    Dianyi Wang*, Wei Song*, Yikun Wang, and 4 more authors
    arXiv preprint arXiv:2506.09040, 2025

2024

  1. Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training
    Xiaoyang Wu, Zhuotao Tian, Xin Wen, and 4 more authors
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
  2. AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
    Tao Tang, Guangrun Wang, Yixing Lao, and 5 more authors
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
  3. DiVE: DiT-based Video Generation with Enhanced Control
    Junpeng Jiang, Gangyi Hong, Lijun Zhou, and 10 more authors
    In Proceedings of the European Conference on Computer Vision Workshops, 2024
  4. LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields
    Tao Tang, Longfei Gao, Guangrun Wang, and 7 more authors
    In Proceedings of the ACM International Conference on Multimedia, 2024
  5. Tutorial: Large Language-Vision Model in Society
    Kaicheng Yu, Zhuang Shao, Siyuan Qi, and 1 more author
    In Proceedings of the ACM International Conference on Multimedia, 2024
  6. LiT: Unifying LiDAR "Languages" with LiDAR Translator
    Yixing Lao, Tao Tang, Xiaoyang Wu, and 3 more authors
    In Advances in Neural Information Processing Systems, 2024
  7. OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
    Hu Zhang, Jianhua Xu, Tao Tang, and 4 more authors
    In Proceedings of the European Conference on Computer Vision, 2024
  8. Semantic-DARTS: Elevating Semantic Learning for Mobile Differentiable Architecture Search
    Bicheng Guo, Shibo He, Miaojing Shi, and 3 more authors
    IEEE Transactions on Mobile Computing, 2024
  9. Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation
    Enhui Ma*, Lijun Zhou*, Tao Tang, and 8 more authors
    arXiv preprint arXiv:2406.01349, 2024
  10. BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science
    Xinna Lin, Siqi Ma, Junjie Shan, and 5 more authors
    arXiv preprint arXiv:2407.00466, 2024
  11. PatentAgent: Intelligent Agent for Automated Pharmaceutical Patent Analysis
    Xin Wang, Yifan Zhang, Xiaojing Zhang, and 5 more authors
    2024

2023

  1. BEVControl: Accurately Controlling Street-view Elements with Multi-perspective Consistency via BEV Sketch Layout
    Kairui Yang*, Enhui Ma*, Jibin Peng, and 3 more authors
    arXiv preprint arXiv:2308.01661, 2023