Object detection model YOLO-T for complex traffic scenarios

doi:10.3976/j.issn.1002-4026.20240047

Abstract

To address the challenges of complex traffic scenarios, particularly congested roads where traffic objects are densely packed, frequently occlude one another, and small-scale objects are detected inaccurately, a new object detection model called YOLO-T (You Only Look Once-Transformer) is proposed. First, the CTNet backbone network is introduced; compared with CSPDarknet53, it has a deeper network structure and a multiscale feature extraction module. It learns the multilevel features of dense objects more effectively, improves the model's ability to handle complex traffic scenarios, and directs the model's focus toward the feature information of small objects, thereby improving detection performance for small-scale objects. Second, Vit-Block is incorporated, which integrates more features by combining convolution and Transformer in parallel. This approach balances local information against contextual information, thereby enhancing detection accuracy. Finally, the Reasonable module is added after the Neck network, introducing attention mechanisms to further improve the robustness of the detection algorithm in complex scenarios and on occluded objects. Experimental results indicate that, compared with the baseline, YOLO-T raises detection accuracy by 1.92 and 12.78 percentage points on the KITTI and BDD100K datasets, respectively. This improvement boosts detection performance in complex traffic scenarios and can help drivers better predict the behavior of other vehicles, thus reducing the occurrence of traffic accidents.
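The parallel combination of convolution and Transformer described for Vit-Block can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the 1-D token layout, the fixed smoothing kernel, the identity query/key/value projections, and the fusion by element-wise sum with a residual are all assumptions made for illustration, and the names `local_branch`, `global_branch`, and `parallel_block` are hypothetical.

```python
import math


def local_branch(tokens, kernel):
    """1-D convolution along the token axis with zero padding (local features)."""
    half = len(kernel) // 2
    n, d = len(tokens), len(tokens[0])
    out = []
    for i in range(n):
        row = [0.0] * d
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:  # positions outside the sequence contribute zero
                for c in range(d):
                    row[c] += w * tokens[j][c]
        out.append(row)
    return out


def global_branch(tokens):
    """Single-head scaled dot-product self-attention (contextual features).

    Assumption: queries, keys and values are the tokens themselves
    (identity projections), to keep the sketch short.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)  # subtract the maximum for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def parallel_block(tokens, kernel=(0.25, 0.5, 0.25)):
    """Feed the same input to both branches and fuse the results.

    Assumption: fusion is an element-wise sum plus a residual connection;
    the fusion used in the paper may differ.
    """
    loc = local_branch(tokens, kernel)
    glo = global_branch(tokens)
    return [
        [x + l + g for x, l, g in zip(x_row, l_row, g_row)]
        for x_row, l_row, g_row in zip(tokens, loc, glo)
    ]


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
fused = parallel_block(tokens)
print(len(fused), len(fused[0]))
```

Both branches see the same input, so the output keeps the input shape while mixing neighbourhood information from the convolution with sequence-wide information from the attention.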

Key words: intelligent transportation, deep learning, object detection, YOLO, complex traffic scenarios

CLC Number: TP391

LIU Yu, GAO Shangbing, ZHANG Qintao, ZHANG Yingying. Object detection model YOLO-T for complex traffic scenarios[J]. Shandong Science, 2024, 37(6): 104-115.

Stage(i)	Operator(f(x_i))	Resolution(h×w)	Channels(c)
1	Focus	320×320	12
2	CBS	320×320	64
3	Vit-Block-1	160×160	128
4	Vit-Block-2	80×80	256
5	Vit-Block-3	40×40	512
6	Vit-Block-4	20×20	1024
7	Vit-Block-5	10×10	2048
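The resolution and channel progression in the table above follows a regular pattern that a short sketch can reproduce. The 640×640 RGB input is an assumption inferred from the 320×320×12 Focus output, and the function names are hypothetical; the sketch tracks tensor shapes only and does not implement the operators themselves.

```python
def focus_shape(h, w, c):
    """Focus (space-to-depth) slicing: halve each spatial side, quadruple channels."""
    return h // 2, w // 2, c * 4


def backbone_shapes(h=640, w=640, c=3):
    """Return (stage, operator, height, width, channels) for each backbone stage.

    Assumption: the input is a 640x640 RGB image, inferred from the
    320x320x12 output listed for the Focus stage.
    """
    rows = []
    h, w, c = focus_shape(h, w, c)
    rows.append((1, "Focus", h, w, c))
    c = 64  # CBS keeps the resolution and widens to 64 channels, as in the table
    rows.append((2, "CBS", h, w, c))
    for k in range(1, 6):
        # Each Vit-Block stage halves the resolution and doubles the channels.
        h, w, c = h // 2, w // 2, c * 2
        rows.append((k + 2, f"Vit-Block-{k}", h, w, c))
    return rows


for stage, name, h, w, c in backbone_shapes():
    print(f"{stage}\t{name}\t{h}×{w}\t{c}")
```

Under that assumption, the five Vit-Block stages downsample the feature map by an overall factor of 64 relative to the input, ending at 10×10 with 2048 channels.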

Model	Backbone	Size/MB	KITTI P_mA/%	KITTI FPS/(frames·s^-1)	BDD100K P_mA/%	BDD100K FPS/(frames·s^-1)
SSD	VGG-16	100.28	66.52	62.05	33.77	35.77
CenterNet	ResNet50	124.94	68.62	64.30	32.34	43.05
YOLOv5s	CSPDarknet53	27.24	91.22	79.20	51.40	49.45
YOLOv7	E-ELAN	142.38	93.29	41.50	60.76	31.91
YOLO-T	CTNet	90.40	93.34	70.05	64.18	40.18

P_A by category
Model	car	truck	tram	van	cyclist	pedestrian
SSD	0.71	0.68	0.65	0.67	0.68	0.60
CenterNet	0.75	0.71	0.64	0.65	0.72	0.64
YOLOv5s	0.96	0.97	0.96	0.95	0.95	0.84
YOLOv7	0.97	0.96	0.97	0.95	0.95	0.84
YOLO-T	0.98	0.97	0.97	0.95	0.95	0.85

P_A by category
Model	car	truck	traffic sign	bus	traffic light	person	bike	motor
SSD	0.49	0.41	0.39	0.31	0.28	0.37	0.35	0.11
CenterNet	0.48	0.43	0.36	0.30	0.29	0.40	0.32	0.07
YOLOv5s	0.68	0.59	0.50	0.51	0.49	0.52	0.46	0.37
YOLOv7	0.75	0.68	0.61	0.60	0.58	0.56	0.54	0.54
YOLO-T	0.78	0.71	0.65	0.64	0.61	0.61	0.57	0.57

Stage	CTNet	Vit-Block	Reasonable	P_mA/%
1				91.22
2	√			91.52
3		√		92.13
4			√	91.95
5	√	√		92.85
6	√	√	√	93.14

Stage	CTNet	Vit-Block	Reasonable	P_mA/%
1				51.40
2	√			53.20
3		√		57.20
4			√	55.40
5	√	√		61.35
6	√	√	√	64.18
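The contribution of each module can be read off the ablation results by subtracting the stage 1 score from every later stage. The sketch below does this arithmetic for both ablation tables. Assigning the 91.22 to 93.14 table to KITTI and the 51.40 to 64.18 table to BDD100K is an assumption based on the stage 1 scores matching the YOLOv5s row of the comparison table, and the names `ABLATION`, `LABELS`, and `gains` are hypothetical.

```python
# P_mA/% per ablation stage, copied from the two ablation tables.
# Assumption: the dataset labels are inferred from the stage 1 scores.
ABLATION = {
    "KITTI": [91.22, 91.52, 92.13, 91.95, 92.85, 93.14],
    "BDD100K": [51.40, 53.20, 57.20, 55.40, 61.35, 64.18],
}

LABELS = [
    "baseline",
    "CTNet",
    "Vit-Block",
    "Reasonable",
    "CTNet + Vit-Block",
    "CTNet + Vit-Block + Reasonable",
]


def gains(scores):
    """Gain of each stage over stage 1, in percentage points."""
    base = scores[0]
    return [round(s - base, 2) for s in scores]


for dataset, scores in ABLATION.items():
    print(dataset)
    for label, score, gain in zip(LABELS, scores, gains(scores)):
        print(f"  {label}: {score:.2f} ({gain:+.2f})")
```

The final-stage gains are 1.92 and 12.78 percentage points, which match the improvements quoted in the abstract.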
