Shandong Science

• Traffic and Transportation •

Analysis of factors influencing cyclist injury severity and their heterogeneity in urban bicycle accidents considering data imbalance

WANG Chaojian1,2, XU Xiaojin1, FENG Bin1, YU Songlin1, ZHANG Weidong1

  1. School of Engineering and Technology, Sichuan Sanhe College of Prof, Luzhou 646200, China; 2. Luzhou City Research Center for Intelligent Electromechanical Control Engineering Technology, Luzhou 646200, China
  • Received: 2025-04-03 Accepted: 2025-04-30 Online: 2025-12-09
  • Corresponding author: WANG Chaojian E-mail: 549786670@qq.com
  • About the author: WANG Chaojian (born 1997), male, master's degree, teaching assistant; research interest: traffic safety.
  • Funding:
    Luzhou City Research Center for Intelligent Electromechanical Control Engineering Technology project (ZNJDKT25-09); Luzhou Science and Technology Bureau project (2024RCM238)

Abstract: To explore the factors influencing cyclist injury severity in urban bicycle accidents while reducing the effect of data heterogeneity and data imbalance on the quantification of these factors, this study proposes a method that integrates resampling, latent class analysis (LCA), and Bayesian networks (BNs), based on 3,895 bicycle accidents from the CRSS database. First, LCA was used to partition the accident data into several sub-clusters with intra-cluster homogeneity and inter-cluster heterogeneity, reducing the effect of data heterogeneity. Second, random oversampling (ROS), the synthetic minority oversampling technique (SMOTE), and the adaptive synthetic sampling approach (ADASYN) were used to resample each accident cluster, reducing the effect of data imbalance. Finally, for each resampled accident cluster, two BN structure learning algorithms were each paired with one parameter learning algorithm, and the optimal BN model for each cluster was selected by AUC value, enabling quantitative and heterogeneity analyses of the factors influencing cyclist injury severity. The results show that when the overall accident data were divided into three homogeneous sub-clusters, the LCA model achieved a favorable entropy value of 0.943. For the C1, C2, C3, and OD accident clusters, 10, 13, 9, and 12 key factors influencing cyclist injury severity were identified, respectively. Introducing LCA and resampling into the BN substantially improved the BN model's G-mean value, AUC value, and ability to identify risk factors. Factors such as time period, cyclist gender, cyclist age, and weather conditions showed clear heterogeneity across accident clusters.

Key words: traffic safety, bicycle accidents, injury severity, latent class analysis, Bayesian networks

CLC number:

  • U491.31

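Open access This article is distributed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits third parties to freely share (copy and redistribute the material in any medium or format) and adapt (remix, transform, or build upon the material) articles published in this journal, provided that appropriate credit is given, a link to the license is provided, any changes made to the original are indicated, and the work is not used for commercial purposes. For details of the CC BY-NC 4.0 license, visit https://creativecommons.org/licenses/by-nc/4.0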
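
As an illustration of two ideas named in the abstract, the sketch below shows random oversampling (ROS) and a G-mean score in plain Python. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `random_oversample` and `g_mean` are hypothetical, the severity labels are toy values rather than CRSS codes, and G-mean is taken here to be the geometric mean of per-class recall, which may differ from the definition used in the paper. SMOTE and ADASYN, which synthesize new samples instead of duplicating existing ones, are not shown.

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many samples as the largest class (ROS)."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap to the majority size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (assumed definition).
    A class that is never predicted correctly drives the score to 0."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return math.prod(recalls) ** (1.0 / len(recalls))


if __name__ == "__main__":
    # Toy, hypothetical data: 7 minor, 2 serious, 1 fatal record.
    rows = [[i] for i in range(10)]
    labels = ["minor"] * 7 + ["serious"] * 2 + ["fatal"]
    _, balanced_labels = random_oversample(rows, labels)
    print(Counter(balanced_labels))
    print(g_mean(["a", "a", "b", "b"], ["a", "a", "b", "a"]))
```

Because G-mean multiplies the recalls of all classes, a model that ignores a rare severity class scores poorly even when its overall accuracy is high, which is why the metric suits imbalanced accident data.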