Behavioral Prediction Plays a Key Role in Autonomous Vehicles

Using Behavioral Prediction in Autonomous Vehicles

Over the next decade, autonomous vehicles (AVs) are expected to reduce the number of road accidents and improve overall road safety. Behavioral prediction plays a key role in efficient decision making and enables risk assessment in AV applications. For example, suppose you are driving on a two-lane road and intend to turn left while another car approaches in the opposite lane. How will that vehicle behave? Will it continue straight? Will it make a turn? Prediction in autonomous vehicles means estimating the trajectory or path of the other vehicle so that the AV can decide on an appropriate action to avoid a collision.

Existing Challenges

Road geometry and traffic rules can completely change the behavior of vehicles. For instance, the behavior of vehicles approaching a four-way ‘STOP’ sign can change instantly. A model trained in a static driving environment, without considering traffic rules and road geometry, would be of limited value in other driving environments. Accurate prediction of vehicle behavior also requires a multimodal approach. Multimodal means that more than one future action is possible given a vehicle’s motion history. For example, a vehicle approaching a ‘STOP’ sign without an active turn signal may either go straight or make a turn.
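To make the idea of multimodality concrete, the following minimal sketch represents a prediction as a set of candidate trajectories, each with a probability. It is not a learned model: the `Mode` class, the `rollout` and `predict_modes` helpers, the simple kinematic motion model, and the hand-picked probabilities are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Mode:
    """One candidate future: a label, a probability, and a path."""
    label: str
    probability: float
    trajectory: List[Point]


def rollout(x, y, heading, speed, yaw_rate, steps, dt):
    """Roll a simple kinematic model forward; yaw_rate = 0 means straight."""
    points = []
    for _ in range(steps):
        heading += yaw_rate * dt
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        points.append((x, y))
    return points


def predict_modes(x, y, heading, speed, steps=5, dt=1.0):
    """Return two hypotheses for a vehicle at a STOP sign with no turn signal."""
    # Hypothetical probabilities; a trained model would output these.
    turn_rate = math.radians(18)  # assumed left-turn yaw rate per second
    return [
        Mode("straight", 0.6, rollout(x, y, heading, speed, 0.0, steps, dt)),
        Mode("left_turn", 0.4, rollout(x, y, heading, speed, turn_rate, steps, dt)),
    ]


modes = predict_modes(x=0.0, y=0.0, heading=0.0, speed=2.0)
for mode in modes:
    end_x, end_y = mode.trajectory[-1]
    print(f"{mode.label}: p={mode.probability:.1f}, endpoint=({end_x:.2f}, {end_y:.2f})")
```

A unimodal predictor would have to commit to one of these paths, or to an average that matches neither, which is why a single trajectory can be misleading at intersections.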

Building an AV Prediction Model

To understand its environment and the dynamics and future behaviors of surrounding objects, the AV model requires data input. The typical inputs for AV prediction come from sensor fusion and localization. Sensor fusion data is commonly generated by using a Kalman filter to combine inputs from multiple sensors (radar, LIDAR, etc.).
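As a rough illustration of Kalman-filter fusion, the sketch below tracks a single scalar quantity (the distance to a lead vehicle) and folds in one radar and one LIDAR reading. Real systems track multi-dimensional states with matrix-valued covariances; the function names, sensor variances, and readings here are hypothetical values chosen for the example.

```python
def kalman_predict(mean, variance, velocity, dt, process_variance):
    """Move the estimate forward in time; uncertainty grows."""
    return mean + velocity * dt, variance + process_variance


def kalman_update(mean, variance, measurement, measurement_variance):
    """Blend one measurement into the estimate; uncertainty shrinks."""
    gain = variance / (variance + measurement_variance)
    new_mean = mean + gain * (measurement - mean)
    new_variance = (1.0 - gain) * variance
    return new_mean, new_variance


def fuse(mean, variance, readings):
    """Apply the update step once per (measurement, variance) pair."""
    for measurement, measurement_variance in readings:
        mean, variance = kalman_update(mean, variance, measurement, measurement_variance)
    return mean, variance


# Hypothetical readings: (distance in metres, variance in square metres).
radar = (20.4, 1.0)    # assumed noisier range estimate
lidar = (20.0, 0.04)   # assumed more precise range estimate

# Prior: 18 m away with variance 4, closing distance changing at 2 m/s.
mean, variance = kalman_predict(18.0, 4.0, velocity=2.0, dt=1.0, process_variance=0.5)
mean, variance = fuse(mean, variance, [radar, lidar])
print(f"fused distance: {mean:.3f} m, variance: {variance:.4f}")
```

The fused variance ends up smaller than that of either sensor alone, which is the main reason for combining sensors rather than trusting the most precise one.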

Bird’s-Eye View (BEV) rasterization is a common choice as the system’s input when working with AV data. BEV consists of top-down views of a scene. Building models using BEV simplifies the prediction process because the coordinate spaces of the input and output are the same. Infrastructure sensors can provide a non-occluded top-down view of the environment.
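The sketch below shows one simple way a BEV occupancy raster could be built, assuming axis-aligned vehicle boxes in an ego-centered frame measured in metres. The `rasterize_bev` and `world_to_cell` helpers, the grid size, and the agent boxes are illustrative assumptions; production rasters typically add rotated boxes, map layers, and history channels.

```python
def rasterize_bev(agents, grid_size=20, resolution=1.0):
    """Return a grid_size x grid_size occupancy grid (1 = occupied).

    agents: list of (x, y, length, width) boxes, ego-centered, in metres.
    Row 0 is the top of the image (largest y); column 0 is the smallest x.
    """
    half_extent = grid_size * resolution / 2.0
    grid = [[0] * grid_size for _ in range(grid_size)]
    for row in range(grid_size):
        for col in range(grid_size):
            # World coordinates of this cell's center.
            cx = (col + 0.5) * resolution - half_extent
            cy = half_extent - (row + 0.5) * resolution
            for x, y, length, width in agents:
                if abs(cx - x) <= length / 2 and abs(cy - y) <= width / 2:
                    grid[row][col] = 1
                    break
    return grid


def world_to_cell(x, y, grid_size=20, resolution=1.0):
    """Map a world point inside the grid to its (row, col) cell."""
    half_extent = grid_size * resolution / 2.0
    col = int((x + half_extent) / resolution)
    row = int((half_extent - y) / resolution)
    return row, col


# Hypothetical scene: the ego vehicle at the origin and one other vehicle.
agents = [(0.0, 0.0, 4.0, 2.0), (6.0, 3.0, 4.0, 2.0)]
grid = rasterize_bev(agents)
for line in grid:
    print("".join("#" if cell else "." for cell in line))
```

Because a predicted position can be mapped into the same grid with `world_to_cell`, the model's output can be drawn or scored directly on top of its input, which is the simplification the shared coordinate space provides.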

A self-driving system operates in a multi-agent environment. A deep supervised learning approach can address the multi-agent environment by accurately capturing rare and unexpected behaviors on the road. Within the AV stack, the three tasks involved in building a self-driving system can be defined as follows:

  • Perception (identifying objects around the AV)
  • Prediction (anticipating what surrounding objects will do in order to determine appropriate next steps)
  • Planning (deciding future AV behaviors)


Focusing on prediction, one can build a model to provide the AV with critical data related to potential future behaviors. Research has shown that, among deep learning-based models, more complex architectures (e.g., multiple RNNs, ResNet, or a combination of RNNs and CNNs) achieve better performance than simple models such as a single RNN.

Predicting multimodal trajectories instead of a single unimodal trajectory does not always result in a lower RMSE, but it can still achieve better overall performance. For example, the models named GRIP and ST-LSTM achieved better performance than M-LSTM and CS-LSTM because GRIP and ST-LSTM dealt with multimodal trajectories. A higher RMSE could be the result of limited model capacity and/or limited data used to train multimodal trajectory prediction models.


Improving AV Prediction Models

Several mechanisms can be employed to improve the overall performance of AV prediction models, including the following:

  • Training speed: Increasing the dimensions of the first and last layers of the deep network improves speed, and using the lighter EfficientNet instead of ResNet improves model performance.
  • Performance: Adding agent history to the prediction model can improve prediction accuracy and reliability.
  • Uncertainty capture: Multimodal prediction is preferred when one trajectory per agent is not enough to capture and analyze various situational uncertainties.


Evaluation Metrics

Evaluation metrics in AV prediction depend on what is being predicted. Typical metrics for vehicle behavior prediction include accuracy, precision, recall, F1 score, and negative log-likelihood. Trajectory prediction evaluation metrics include FDE (Final Displacement Error), MAE (Mean Absolute Error), RMSE (Root Mean Square Error), the Minimum of K metric, and cross entropy.
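The trajectory metrics named above can be sketched in a few lines. The definitions used here are common conventions rather than the only ones in use: RMSE is taken over per-step Euclidean displacements, MAE over individual coordinates, and Minimum of K as the best score among K candidate trajectories. The function names and sample trajectories are hypothetical.

```python
import math


def displacement_errors(pred, truth):
    """Euclidean distance between predicted and true positions at each step."""
    return [math.hypot(px - tx, py - ty) for (px, py), (tx, ty) in zip(pred, truth)]


def fde(pred, truth):
    """Final displacement error: the distance at the last time step."""
    return displacement_errors(pred, truth)[-1]


def mae(pred, truth):
    """Mean absolute error over every x and y coordinate."""
    diffs = [abs(p - t) for pp, tp in zip(pred, truth) for p, t in zip(pp, tp)]
    return sum(diffs) / len(diffs)


def rmse(pred, truth):
    """Root mean square of the per-step displacement errors."""
    errors = displacement_errors(pred, truth)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def min_of_k(candidates, truth, metric=fde):
    """Best (lowest) metric value among K candidate trajectories."""
    return min(metric(candidate, truth) for candidate in candidates)


# Hypothetical ground truth and two predicted modes for one agent.
truth = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
mode_turn = [(1.0, 0.0), (2.0, 0.0), (3.0, 4.0)]
mode_straight = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.5)]

print("FDE (turn mode):", fde(mode_turn, truth))
print("MAE (turn mode):", mae(mode_turn, truth))
print("RMSE (turn mode):", rmse(mode_turn, truth))
print("min-of-K FDE:", min_of_k([mode_turn, mode_straight], truth))
```

Minimum of K rewards a multimodal model when any one of its candidates is close to what actually happened, whereas RMSE on a single trajectory penalizes every deviation.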


Deep learning solutions have shown promising performance for trajectory and behavior prediction in complex driving scenarios. However, most existing solutions only consider the interaction among vehicles, which provides a very narrow assessment of potential behaviors. Models that incorporate traffic rules, road geometry, environmental conditions, and other variables produce predictive analysis that is much deeper and broader.

Vehicle behavior prediction is a complex process. Many challenges remain that can only be met by using next-generation technologies and applications.