Shandong Science ›› 2024, Vol. 37 ›› Issue (6): 104-115. doi: 10.3976/j.issn.1002-4026.20240047

• Traffic and Transportation •

Object detection model YOLO-T for complex traffic scenarios

LIU Yu1,2, GAO Shangbing1,2,*, ZHANG Qintao1,2, ZHANG Yingying1

  1. College of Computer and Software Engineering, Huaiyin Institute of Technology, Huai'an 223003, China
  2. Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huai'an 223001, China
  • Received: 2024-04-02 Published: 2024-12-20 Released online: 2024-12-05
  • Corresponding author: *GAO Shangbing (born 1981), male, Ph.D., professor; research interest: intelligent transportation. E-mail: gaoshangbing@hyit.edu.cn
  • About the first author: LIU Yu (born 1997), male, master's student; research interest: intelligent transportation. E-mail: 1102284633@qq.com
  • Funding:
    National Natural Science Foundation of China, General Program (62076107); National Key Research and Development Program of China (2018YFB1004904); Major Program of Natural Science Research of Jiangsu Higher Education Institutions (18KJA520001)

Abstract:

To address dense, mutually occluding traffic objects and the low detection accuracy for small-scale objects in complex traffic scenarios, particularly on congested roads, an object detection model named YOLO-T (You Only Look Once-Transformer) is proposed. First, the CTNet backbone network is introduced. Compared with CSPDarknet53, it has a deeper network structure and a multiscale feature extraction module, which help it learn the multilevel features of dense objects and cope with complex traffic scenarios, and in turn guide the model to pay more attention to the features of small objects, improving small-object detection. Second, a Vit-Block is incorporated, which fuses more features by running convolution and Transformer in parallel, accounting for both local information and contextual relationships and thereby improving detection accuracy. Finally, a Reasonable module with an attention mechanism is added after the neck network (Neck) to further improve robustness to complex scenes and occluded objects. Experimental results show that, compared with the baseline algorithm, YOLO-T improves detection accuracy by 1.92% and 12.78% on the KITTI and BDD100K datasets, respectively. The model effectively improves detection performance in complex traffic scenarios and can better assist drivers in judging the driving behavior of other vehicles, helping to reduce traffic accidents.

Key words: intelligent transportation, deep learning, object detection, YOLO, complex traffic scenarios
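
The abstract describes the Vit-Block only at a high level: convolution and Transformer run in parallel so that local and contextual information are both captured. The toy sketch below illustrates that general idea on a short 1-D sequence of feature vectors in plain Python. It is not the authors' implementation: the function names, the smoothing kernel, the identity query/key/value projections, and the additive fusion are all assumptions made for illustration.

```python
import math

def conv_branch(tokens, kernel=(0.25, 0.5, 0.25)):
    """Local branch: 1-D convolution over neighbouring tokens (zero padding).
    The kernel weights are an arbitrary illustrative choice."""
    n, d = len(tokens), len(tokens[0])
    half = len(kernel) // 2
    out = []
    for i in range(n):
        vec = [0.0] * d
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # positions outside the sequence contribute zero
                for c in range(d):
                    vec[c] += w * tokens[j][c]
        out.append(vec)
    return out

def attention_branch(tokens):
    """Global branch: single-head scaled dot-product self-attention.
    Assumption: identity Q/K/V projections, to keep the sketch small."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out

def parallel_block(tokens):
    """Run both branches on the same input and fuse them.
    Assumption: element-wise addition; the abstract does not state the fusion rule."""
    local = conv_branch(tokens)
    context = attention_branch(tokens)
    return [[a + b for a, b in zip(loc, ctx)] for loc, ctx in zip(local, context)]

# Hypothetical input: four tokens with two channels each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
fused = parallel_block(features)
```

The output keeps the input shape, so in this sketch the block could be dropped between layers without changing tensor sizes. Each output vector combines a neighbourhood-weighted term (local) with an attention-weighted average over all tokens (context).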

CLC number:

  • TP391

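Open Access: This article is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits third parties to freely share (copy and redistribute the article in any medium or format) and adapt (remix, transform, or build upon the article) papers published in this journal, provided that appropriate credit is given, a link to the license is provided, any changes to the original are indicated, and the article is not used for commercial purposes. For details of the CC BY-NC 4.0 license, visit https://creativecommons.org/licenses/by-nc/4.0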