Publications

Selected Works

2026

  1. InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
    Haomin Wang, Jinhui Yin, Qi Wei, and 12 more authors
    In The Fourteenth International Conference on Learning Representations (ICLR), 2026
  2. InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models
    Nianchen Deng, Lixin Gu, Shenglong Ye, and 18 more authors
    In The Fourteenth International Conference on Learning Representations (ICLR), 2026
  3. Pose-Free Infant General Movement Assessment Using Body Contours
    Yanting Zhang, Kewei Chen, Jielin Huang, and 4 more authors
    In 2026 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2026
  4. HAVE-Bench: Hierarchical Audio-Visual Evaluation from Perception to Interaction
    Muyan Zhong, Erfei Cui, Sen Xing, and 8 more authors
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
  5. MMBench-GUI: A Unified Hierarchical Evaluation Framework for Multi-Platform GUI Agents
    Xuehui Wang, Zhenyu Wu, JingJing Xie, and 22 more authors
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

2025

  1. Point or Line? Using Line-based Representation for Panoptic Symbol Spotting in CAD Drawings
    Xingguang Wei, Haomin Wang, Shenglong Ye, and 7 more authors
    In The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025
  2. ArchCAD-400K: A Large-Scale CAD Drawings Dataset and New Baseline for Panoptic Symbol Spotting
    Ruifeng Luo, Zhengjie Liu, Tianxiao Cheng, and 13 more authors
    In The Thirty-ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025
  3. Pose-Guided Transformer for Fine-Grained Action Quality Assessment
    Yanting Zhang, Xia Li, Wenhao Chai, and 3 more authors
    IEEE Transactions on Circuits and Systems for Video Technology, 2025
  4. Single-Frame Point-Pixel Registration via Supervised Cross-Modal Feature Matching
    Yu Han, Zhiwei Huang, Yanting Zhang, and 5 more authors
    IEEE Transactions on Automation Science and Engineering, 2025

2024

  1. Taming Diffusion for Fashion Clothing Generation with Versatile Condition
    Yanting Zhang, Jingyi Guo, Cairong Yan, and 1 more author
    In Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2024
  2. MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
    Enxin Song, Wenhao Chai, Guanhong Wang, and 8 more authors
    In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
  3. UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning
    Meiqi Sun, Zhonghan Zhao, Wenhao Chai, and 5 more authors
    In The 38th Annual AAAI Conference on Artificial Intelligence (AAAI), 2024
  4. Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation
    Yu Han, Ziwei Long, Yanting Zhang, and 3 more authors
    In IEEE International Conference on Robotics and Automation (ICRA), 2024

2023

  1. DiffFashion: Reference-Based Fashion Design with Structure-Aware Transfer by Diffusion Models
    Shidong Cao, Wenhao Chai, Shengyu Hao, and 3 more authors
    IEEE Transactions on Multimedia, 2023
  2. Learning Golf Swing Key Events from Gaussian Soft Labels Using Multi-Scale Temporal MLPFormer
    Yanting Zhang, Fuyu Tu, Zijian Wang, and 2 more authors
    In 2023 International Joint Conference on Neural Networks (IJCNN), 2023

2022

  1. Detection-by-Tracking of Traffic Signs in Videos
    Yanting Zhang, Zijian Wang, Ruoning Song, and 2 more authors
    Applied Intelligence, 2022
  2. On-Road Pedestrian Tracking Across Multiple Moving Cameras
    Yanting Zhang, Shuanghong Wang, Qingxiang Wang, and 2 more authors
    In 2022 IEEE International Conference on Multimedia and Expo (ICME), 2022
  3. Automatic Moving Pose Grading for Golf Swing in Sports
    Yanting Zhang, Fuyu Tu, Zijian Wang, and 1 more author
    In 2022 IEEE International Conference on Image Processing (ICIP), 2022

2021

  1. Pedestrian Tracking through Coordinated Mining of Multiple Moving Cameras
    Yanting Zhang and Qingxiang Wang
    In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021
  2. Vehicle 3D Localization in Road Scenes via a Monocular Moving Camera
    Yanting Zhang, Aotian Zheng, Ke Han, and 2 more authors
    In 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021

2019

  1. Bundle Adjustment for Monocular Visual Odometry Based on Detections of Traffic Signs
    Yanting Zhang, Haotian Zhang, Gaoang Wang, and 2 more authors
    IEEE Transactions on Vehicular Technology, 2019
  2. VisDrone-MOT2019: The Vision Meets Drone Multiple Object Tracking Challenge Results
    Longyin Wen, Pengfei Zhu, Dawei Du, and 8 more authors
    In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019
  3. Bundle Adjustment for Monocular Visual Odometry Based on Detected Traffic Sign Features
    Yanting Zhang, Jie Yang, Haotian Zhang, and 1 more author
    In 2019 IEEE International Conference on Image Processing (ICIP), 2019