Shandong Science

   

Analysis of factors influencing cyclist injury severity and heterogeneity analysis in urban bicycle accidents considering data imbalance

WANG Chaojian1,2, XU Xiaojin1, FENG Bin1, YU Songlin1, ZHANG Weidong1

  1. School of Engineering and Technology, Sichuan Sanhe College of Prof, Luzhou 646200, China; 2. Luzhou City Research Center for Intelligent Electromechanical Control Engineering Technology, Luzhou 646200, China
  • Received: 2025-04-03; Accepted: 2025-04-30; Online: 2025-12-09
  • Contact: WANG Chaojian, E-mail: 549786670@qq.com

Abstract: To explore the factors influencing the injury severity of cyclists in urban bicycle accidents, and to mitigate the effects of data heterogeneity and class imbalance on the quantification of these factors, this study proposes a method that integrates resampling, latent class analysis (LCA), and Bayesian networks (BNs), based on 3 895 bicycle accidents from the Crash Report Sampling System (CRSS) database. First, LCA was used to partition the accident data into sub-clusters that are homogeneous within each cluster and heterogeneous across clusters, reducing the impact of data heterogeneity. Second, random oversampling (ROS), the synthetic minority oversampling technique (SMOTE), and the adaptive synthetic sampling approach (ADASYN) were used to resample each accident cluster, reducing the impact of data imbalance. Finally, two BN structure learning algorithms and one parameter learning algorithm were applied to each resampled accident cluster, and the optimal BN model for each cluster was selected by its area under the ROC curve (AUC), enabling quantitative and heterogeneity analyses of the factors influencing cyclist injury severity. Results show that when the overall accident data were divided into three homogeneous sub-clusters (C1, C2, and C3), the LCA model achieved an entropy value of 0.943. For the C1, C2, C3, and overall data (OD) accident clusters, 10, 13, 9, and 12 key factors influencing cyclist injury severity were identified, respectively. Introducing LCA and resampling into the BN considerably improved the model's G-mean value, AUC value, and ability to identify risk factors. Factors such as time period, cyclist gender, cyclist age, and weather conditions showed substantial heterogeneity across accident clusters.
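The entropy value reported for the LCA model is, in the LCA literature, usually the normalized (relative) entropy of the posterior class memberships. The abstract does not state the formula, so the following is an assumed form, with $N$ accidents, $K$ latent classes, and $\hat{p}_{ik}$ the posterior probability that accident $i$ belongs to class $k$:

$$
E_K = 1 - \frac{\sum_{i=1}^{N} \sum_{k=1}^{K} \left( -\hat{p}_{ik} \ln \hat{p}_{ik} \right)}{N \ln K}
$$

Under this definition $E_K$ lies between 0 and 1, and values close to 1 indicate that accidents are assigned to classes with little ambiguity; a value of 0.943 with $K = 3$ would therefore suggest well-separated sub-clusters.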
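The resampling step and the G-mean metric named in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation: the function names `random_oversample`, `smote`, and `g_mean` are hypothetical; SMOTE is shown for numeric feature vectors only (categorical CRSS attributes would need an encoding or a variant such as SMOTE-NC); ADASYN, which additionally weights minority samples by how many majority-class neighbours they have, is omitted; and G-mean is computed as the geometric mean of per-class recalls, which reduces to the square root of sensitivity times specificity in the binary case.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Random over-sampling (ROS): duplicate randomly chosen rows of each
    smaller class until every class matches the largest class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def smote(X_min, n_new, k=5, seed=0):
    """SMOTE-style synthesis for numeric features (assumes len(X_min) >= 2):
    each synthetic point lies on the segment between a random minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(X_min))
        x = X_min[i]
        # indices of the k nearest minority neighbours of x, excluding x itself
        neighbours = sorted(
            (j for j in range(len(X_min)) if j != i),
            key=lambda j: math.dist(x, X_min[j]),
        )[:k]
        nn = X_min[rng.choice(neighbours)]
        gap = rng.random()  # random position along the segment x -> nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls; for two classes this equals
    sqrt(sensitivity * specificity)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(y_pred[i] == c for i in idx)
        recalls.append(hits / len(idx))
    return math.prod(recalls) ** (1 / len(recalls))


if __name__ == "__main__":
    X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
    y = [0, 0, 0, 1, 1]
    X_ros, y_ros = random_oversample(X, y)
    print(Counter(y_ros))
    print(smote([x for x, lab in zip(X, y) if lab == 1], n_new=2, k=1))
    print(round(g_mean([1, 1, 0, 0], [1, 0, 0, 0]), 4))  # 0.7071
```

A classifier that always predicts the majority class gets a recall of 0 on the minority class and hence a G-mean of 0, which is why G-mean is a more informative metric than accuracy on imbalanced injury-severity data. In practice, the imbalanced-learn library provides `RandomOverSampler`, `SMOTE`, and `ADASYN`; the hand-rolled version above only makes the duplication and interpolation explicit.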

Key words: traffic safety, bicycle accidents, injury severity, latent class analysis, Bayesian networks

CLC Number: U491.31

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0), which permits third parties to freely share (i.e., copy and redistribute the material in any medium or format) and adapt (i.e., remix, transform, or build upon the material) the articles published in this journal, provided that appropriate credit is given, a link to the license is provided, and any changes made are indicated. The material may not be used for commercial purposes. For details of the CC BY-NC 4.0 license, please visit: https://creativecommons.org/licenses/by-nc/4.0