publications
Publications by category in reverse chronological order.
2025
- OmniGen: Unified Multimodal Sensor Generation for Autonomous Driving. arXiv preprint arXiv:2508.xxxxx, 2025
- BEVHeight++: Toward Robust Visual Centric 3D Object Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
- Do LLM Modules Generalize? A Study on Motion Generation for Autonomous Driving. In Proceedings of the 9th Annual Conference on Robot Learning, 2025
- SR-LLM: Rethinking the Structured Representation in Large Language Model. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2025
- Ascending the Infinite Ladder: Benchmarking Spatial Deformation Reasoning in Vision-Language Models. arXiv preprint arXiv:2507.02978, 2025
- THE-Tree: Can Tracing Historical Evolution Enhance Scientific Verification and Reasoning? arXiv preprint arXiv:2506.21763, 2025
- DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction. arXiv preprint arXiv:2507.02948, 2025
- DualToken: Towards unifying visual understanding and generation with dual visual vocabularies. arXiv preprint arXiv:2503.14324, 2025
- Autoregressive semantic visual reconstruction helps VLMs understand better. arXiv preprint arXiv:2506.09040, 2025
2024
- Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
- AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
- DiVE: DiT-based Video Generation with Enhanced Control. In Proceedings of the European Conference on Computer Vision Workshops, 2024
- LiDAR-NeRF: Novel LiDAR View Synthesis via Neural Radiance Fields. In Proceedings of the ACM International Conference on Multimedia, 2024
- Tutorial: Large Language-Vision Model in Society. In Proceedings of the ACM International Conference on Multimedia, 2024
- LiT: Unifying LiDAR "Languages" with LiDAR Translator. In Advances in Neural Information Processing Systems, 2024
- OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection. In Proceedings of the European Conference on Computer Vision, 2024
- Semantic-DARTS: Elevating Semantic Learning for Mobile Differentiable Architecture Search. IEEE Transactions on Mobile Computing, 2024
- Unleashing generalization of end-to-end autonomous driving with controllable long video generation. arXiv preprint arXiv:2406.01349, 2024
- BioKGBench: A knowledge graph checking benchmark of AI agent for biomedical science. arXiv preprint arXiv:2407.00466, 2024
2023
- BEVControl: Accurately controlling street-view elements with multi-perspective consistency via BEV sketch layout. arXiv preprint arXiv:2308.01661, 2023