Ya Kang, Qingyang Song, Jing Song, Fengsheng Pan, Lei Guo, Abbas Jamalipour
The cutting-edge technology of connected and automated vehicles (CAVs) will advance transportation systems for the foreseeable future. CAVs are expected to make fully automated judgments and maneuvers without human intervention and, in doing so, deliver safer driving and smarter traffic management. Digital twins (DTs) are the quiet but powerful forces enabling these new possibilities behind the scenes. In this article, we design a DT network (DTN) consisting of connected DTs to assist CAVs with ubiquitous perception, adaptive path planning, and precise motion control. Heterogeneous learning models and diverse learning methods are employed at different scales of the solution, progressing toward specificity, adaptation, and accuracy. A qualitative evaluation of the proposed system is performed with the final goal of demonstrating the DTN’s assistance in improving the performance and effectiveness of CAVs, ultimately leading to a safer, more efficient, and more sustainable transportation system.
In recent years, the development of autonomous driving technology has gained significant attention as a potential solution to various challenges faced by the transportation industry [1]. Empowered by the combination of advanced processors, high-bandwidth low-latency networking, and accelerated machine learning (ML), CAVs are expected to revolutionize the way people travel, making it safer, more efficient, and more comfortable. DTs, virtual representations of physical objects or systems, are key to the long-term success of CAVs in the future. They can capture the behavioral and operational data of CAVs, build predictive models, facilitate decision making for CAVs, and aid in providing personalized experiences to riders.
Connected vehicles are vehicles equipped with communication technologies that enable them to exchange data and information with other vehicles, infrastructure, and external sources in real time. This capability allows vehicles to coordinate their movements, such as merging seamlessly and forming platoons properly. Ultimately, the synergy between autonomous driving and connected vehicles has the potential to create a more efficient, safer, and user-friendly transportation ecosystem that benefits both riders and society as a whole.
Operating CAVs on roads entirely without human intervention raises numerous real-world challenges. On the one hand, from an individual perspective, safety and comfort are the main concerns: the former imposes stringent requirements on ultrahigh perception accuracy and millisecond-level response to emergencies, while the latter is closely related to preference learning and customized service provision under necessary constraints (e.g., a safety guarantee). On the other hand, from a collective perspective, road efficiency needs to be improved through reasonable path planning and adaptive traffic management.
Inspired by these challenges, autonomous driving has emerged as a thriving research area in both academia and industry. A completely autonomous driving system can be implemented as a three-phase driving control pipeline comprising environmental perception, path planning, and motion control. During perception, CAVs comprehend their surrounding environment by processing data from sensors and constructing a digital representation of their surroundings, such as vehicles, pedestrians, road markings, and traffic signals. To achieve a better understanding of the dynamically changing environment, Natan and Miura [2] proposed a novel compact deep multitask learning (DMTL) model that can simultaneously perform various driving perception tasks. After the road environment is comprehensively understood, the planning of CAVs’ driving paths must be addressed. For instance, an iterative framework was proposed in [3] to solve target assignment and path planning problems, followed by a graph-based minimum clique cover method to obtain the optimal scheduling plan. Finally, to follow the planned paths as accurately as possible, CAVs must adjust their speed, steering angle, and braking commands [4]. Clearly, the three pivotal phases in the implementation of an autonomous driving system are closely related and interdependent. However, because most existing works lack seamless convergence and coordination of perception, planning, and control, fully autonomous driving still seems far away. Therefore, there is an urgent need for a new framework that can manage all three phases of autonomous driving holistically.
DTs, born in intelligent manufacturing, are gaining momentum because they virtually replicate real-world systems [5]. Accurate simulation optimizes system behavior while curbing costs and risks. DTs integrate the Internet of Things, artificial intelligence (AI), and analytics to enhance output, and virtual models, aided by ML and abundant data, foster innovation and performance in engineering. Growing digitization, together with higher computing power and lower costs, is making DT development increasingly affordable. Virtual solutions become more economical than relying solely on physical models, enabling proactive intervention. Moreover, DTs enable universal, location-agnostic data sharing for physical objects.
When DTs are applied in autonomous vehicle systems, they include not only static map information of the traffic scene (e.g., geographical features, roads and highways, buildings and landmarks) but also dynamic traffic information (e.g., vehicle speeds, lane occupancy, and traffic signal status). Therefore, with the help of DTs, developers can model critical situations like sudden obstacles, vehicle–pedestrian interactions, or adverse weather conditions, and then identify potential vulnerabilities and refine strategies to ensure the safety of autonomous vehicle systems.
In autonomous driving, modularity enhances the customization and upgrading of components. DT technology tracks performance, enabling optimization. Modularized autonomous systems, using DT insights, pinpoint and replace problematic components, enhancing the driving experience. DTs empower the use of real-time and historical data for safer, more efficient, and personalized driving. Emerging 6G also aids DT integration in CAVs, as explored in recent studies. Fan et al. [6] took advantage of DTs’ offline learning capability to perform an in-depth analysis of road traffic data and trained learning models on large datasets. These learning models were subsequently downloaded to edge CAVs and executed to make optimized decisions. In [7], a lane-changing model was initially learned in each CAV, taking into account both its own benefit and traffic flow efficiency. The learned models were then evaluated and verified in the DT domain, and the optimized one was pushed to the CAVs for lane-changing decisions. Therefore, DTs will play an important role in future CAV systems.
DTNs have emerged as an extension of the concept of DTs; for example, in smart manufacturing, DTNs interconnect the DTs of machines, production lines, and supply chains, through which manufacturers can monitor real-time data, predict machine failures, optimize production schedules, and ensure efficient resource allocation. Similarly, DTNs can revolutionize transportation systems by interconnecting the DTs of vehicles, traffic signals, road infrastructure, and public transportation, enabling CAVs to see further down the road, even things beyond their own horizons [8]. Nevertheless, the development of DTN-driven autonomous driving systems remains in its initial stage and is confronted with various problems and challenges, such as environmental perception over the highly dynamic topology of CAVs, fast and smart decision making in complex contexts, the overall management of heterogeneous learning models, and the provision of personalized service.
This article puts forth a proposal for a DTN designed to support CAVs. As shown in Figure 1, the proposed DTN consists of three layers: namely, a physical layer, application layer, and virtual layer. The physical layer involves CAVs, sensors, roadside units (RSUs), and network topology. The virtual layer contains DTs replicating objects. Data exchange occurs via twin-physical interfaces. The application layer maps, manages, and controls the physical layer, using simulation, prediction, and optimization.
Figure 1 System architecture. V2I: vehicle-to-infrastructure; V2V: vehicle-to-vehicle.
As shown in Figure 2, the application layer consists of three modules—environmental perception, path planning, and motion control—which coordinate to both give instructions to large-scale (e.g., global) traffic management and help local CAVs make safe and appropriate actions in a given road segment. The goal of perception is to understand the state of the environment and detect objects, obstacles, or hazards. The path planning module generates a global path for each CAV based on the current state and performs piecewise short-term adjustments to achieve a specific goal. Finally, the motion control module modifies the vehicle’s behavior in real time to ensure it follows the planned trajectory.
Figure 2 Three modules in a DTN.
The environmental perception module leads to the construction of DTs by designing global and local models. The global model is trained at a cloud server on the collected raw data in a centralized way. However, it is difficult and impractical to make the global model adaptive to complicated and varying road environments under the constraint of limited communication resources. To cope with corner-case scenarios, such as the appearance of a roadblock caused by an emergency or a traffic jam caused by a sudden car accident, local learning models focusing on different road segments are established to sensitively capture small-scale changes. Therefore, the environmental perception module is built up from individual vehicles to vehicular networks, along which the DTs replicating the perceived road segments are constructed and consequently aggregated into a DTN.
All of the sensors deployed in static/dynamic objects (including CAVs) in each road segment are utilized to perceive the surrounding environments. Staying in road segment si for an acceptable duration (for example, long enough to complete the training fully or partly), the CAV vi,j that is most competent for training a perception model, given its computing, caching, and communication resources, is selected to collect all of the data from the various sensors in the segment through orthogonal vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) channels. When sensor ${\phi}_{{i},{k}}$ transmits data of size ${\psi}_{{i},{k}}$ over a channel ci,m, the transmit rate ri,k,m can be calculated by Shannon’s formula. The delay for receiving the sensory data is ${\psi}_{{i},{k}}{/}{r}_{{i},{k},{m}}$. Therefore, the delay ${T}_{i}^{\text{collection}}$ for round-one data collection in road segment si is $\max\left({{\psi}_{{i},{k}}{/}{r}_{{i},{k},{m}}}\right)$, ${k}\in\left[{{1},{K}}\right]$, ${m}\in\left[{{1},{M}}\right]$, where K is the total number of sensors and M is the total number of channels in road segment si.
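A minimal sketch of this delay calculation follows; the bandwidth, transmit power, channel gains, and data sizes are illustrative assumptions rather than values from the article.

```python
import math

def shannon_rate(bandwidth_hz, tx_power_w, channel_gain, noise_w):
    """Transmit rate r_{i,k,m} of sensor phi_{i,k} over channel c_{i,m},
    from Shannon's formula: r = B * log2(1 + SNR)."""
    snr = tx_power_w * channel_gain / noise_w
    return bandwidth_hz * math.log2(1.0 + snr)

def collection_delay(data_sizes_bits, rates_bps):
    """T_i^collection = max_k(psi_{i,k} / r_{i,k,m}): the sensors transmit in
    parallel over orthogonal channels, so the collecting CAV waits only for
    the slowest transmission."""
    return max(psi / r for psi, r in zip(data_sizes_bits, rates_bps))

# Example with K = 3 sensors in road segment s_i (all values illustrative).
rates = [shannon_rate(10e6, 0.1, g, 1e-13) for g in (1e-9, 5e-10, 2e-9)]
sizes = [4e6, 1.5e6, 8e6]  # psi_{i,k} in bits
print(f"T_collection = {collection_delay(sizes, rates):.4f} s")
```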
A DMTL-based perception model is proposed to integrate and fuse the various input modalities obtained from dynamic vision sensors (DVSs), red/green/blue (RGB) digital cameras, and lidar. At first, 2D raw data are fed into a 3D object-detection model to generate 3D object data. Contemporary 3D object-detection models, like PartA2-Net, Pointpillars, and SECOND, typically rely on deep neural networks (DNNs). Then a multiinput and multioutput encoder–decoder is proposed, as shown in Figure 3. Every RGB encoder is linked and merged with its corresponding semantic segmentation (SS) decoder, maintaining identical view and spatial dimensions. The data from DVSs serve the purpose of aiding depth estimation tasks, and the corresponding encoder and decoder are connected and concatenated in a manner similar to the RGB–SS pair. The encoder processing preprocessed lidar point clouds is connected to the lidar segmentation decoder. In Figure 3, the convolution block comprises two sets, each containing a ${3}\,{\times}\,{3}$ convolutional layer, batch normalization, and rectified linear unit activation. To capture spatial features, a deep convolutional neural network (DCNN) is employed, comprising three convolutional layers and one fully connected (FC) layer.
Figure 3 Perception model framework. DCNN: deep convolutional neural network.
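As a minimal PyTorch sketch of these two building blocks, the convolution block below follows the two-set 3 × 3 conv/batch-norm/ReLU layout described for Figure 3, and the DCNN has three convolutional layers and one FC layer; the channel widths and the 64 × 64 input resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Convolution block: two sets of 3x3 convolution + batch norm + ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class SpatialDCNN(nn.Module):
    """DCNN for spatial features: three convolutional layers and one FC layer."""
    def __init__(self, in_ch: int = 3, feat_dim: int = 128):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(64 * 8 * 8, feat_dim)  # assumes a 64x64 input

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.convs(x)          # (B, 64, 8, 8) for a 64x64 input
        return self.fc(h.flatten(1))

features = SpatialDCNN()(torch.randn(2, 3, 64, 64))  # -> (2, 128)
```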
To provide a comprehensive understanding of the environment of road segment si, CAV vi,j performs the training of a DMTL-based perception model ${\Omega}_{i}$ on the collected raw sensory data. However, due to its fast movement, vi,j may not complete the whole training before it leaves segment si. Therefore, a model splitting method is adopted to divide the training task into several parts and allocate them to the qualified CAVs. The DMTL-based perception model is partitioned vertically at the granularity of layers and then trained in a sequential manner. For instance, if the DMTL-based perception model has H layers, ${H}{-}{1}$ points can be selected for splitting, resulting in ${2}^{{H}{-}{1}}$ potential partitions. Since the required computing resource and the output size of each convolutional layer are different, the split point selection is critical to the computational cost and communication overhead. Assume the DMTL-based perception model is split into P parts. The intermediate output from training the $\left({{p}{-}{1}}\right){\text{th}}$ part at one CAV is used as the input to the pth $\left({{1}<{p}\leq{P}}\right)$ part at another qualified CAV. The decision that the pth $\left({{1}\leq{p}\leq{P}}\right)$ part is assigned to CAV vi,j for training is denoted by yi,j,p. Correspondingly, the training time ${T}_{{i},{j},{p}}^{\text{part}\_\text{training}}$ is the ratio of the specific task size ${\omega}_{{i},{p}}$ to the CPU frequency fi,j provided by CAV vi,j. The time ${T}_{{i},{p}}^{\text{part}\_\text{trans}}$ for transmitting the intermediate output over a V2V channel is obtained by dividing the output size by the V2V transmit rate. In addition, if the combined computing capacity of all CAVs is insufficient to complete the perception task, a nearby RSU will take over this responsibility. Therefore, the total delay ${T}_{i}^{\text{md}\_\text{cons}}$ for constructing the perception model for road segment si is the sum of the training time and the V2V delivery time.
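A minimal sketch of this delay model, assuming the split and the CAV assignment are already fixed; all numerical values are illustrative.

```python
def total_construction_delay(task_sizes_cycles, cpu_freqs_hz,
                             inter_output_bits, v2v_rates_bps):
    """T_i^{md_cons}: per-part training times (omega_{i,p} / f_{i,j}) plus the
    V2V delivery times of the P-1 intermediate outputs handed over between
    consecutive parts."""
    train = sum(w / f for w, f in zip(task_sizes_cycles, cpu_freqs_hz))
    trans = sum(o / r for o, r in zip(inter_output_bits, v2v_rates_bps))
    return train + trans

# P = 3 parts on three CAVs; the last part needs no V2V handover.
delay = total_construction_delay(
    task_sizes_cycles=[2e9, 3e9, 1.5e9],   # omega_{i,p}, CPU cycles per part
    cpu_freqs_hz=[4e9, 5e9, 3e9],          # f_{i,j} of the assigned CAVs
    inter_output_bits=[6e6, 4e6],          # intermediate activation sizes
    v2v_rates_bps=[80e6, 60e6])            # V2V link rates
```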
To achieve this splitting, a deep Q-network (DQN) algorithm is introduced. The DQN algorithm comprises a state space, an action space, and a reward function. The state space includes the available computing capacity of each CAV, the requested computing resource, the available transmission rate of each link, and the requested transmission resource. The action space consists of splitting the layers of a perception task into a number of sets and assigning the layers in different sets to different competent CAVs. The reward is defined in terms of the processing delay of perception, such that ${T}_{i}^{\text{md}\_\text{cons}}$ is minimized by optimizing the corresponding layer splitting and assignment under the current resource conditions.
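A minimal sketch of such a Q-network and its epsilon-greedy action selection follows; the network depth, hidden width, and discrete encoding of the split/assignment actions are illustrative assumptions.

```python
import random
import torch
import torch.nn as nn

class SplitQNet(nn.Module):
    """Q-network for the layer-splitting DQN. The state vector stacks the
    available/requested computing and transmission resources; each discrete
    action encodes one (split points, CAV assignment) combination."""
    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # Q(s, a) for every candidate split/assignment

def select_action(qnet: SplitQNet, state: torch.Tensor, epsilon: float) -> int:
    """Epsilon-greedy: explore random splits early; later exploit the split
    with the highest Q-value, i.e., the lowest expected T_i^{md_cons}."""
    if random.random() < epsilon:
        return random.randrange(qnet.net[-1].out_features)
    with torch.no_grad():
        return int(qnet(state).argmax().item())
```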
After completing the training, the perception result is sent to the nearest RSU by the last CAV in the sequence. The uploading delay ${T}_{i}^{\text{md}\_\text{upload}}$ of the model is the ratio of the model size to the uploading rate. Then the RSU takes a certain amount of time ${T}_{i}^{\text{md}\_\text{execute}}$ to generate the DT of road segment si, using AI/ML for data analysis, 3D modeling, simulation software, real-time data integration, and visualization tools.
All of the DTs from different road segments are uploaded to a server and connected to each other to form a small-scale DTN copying the considered road. From a large-scale perspective, a DTN copying all of the roads in the city will be further constructed by connecting more DTs.
To capture any change appearing in the road segments and help CAVs determine the appropriate actions and responses, each DT needs to copy the physical world accurately and meanwhile be updated in a timely manner. We adopt the age of information (AoI) to measure the freshness of each DT. We define a utility function in terms of the accuracy and AoI of a DT as \[{\text{Utility}} = {w}_{1}\,{\text{log}}\left({\text{accuracy}}\right) + {w}_{2}\,{\text{log}}\left({\text{AoI}}\right) \tag{1} \] where the former is decided by the level of similarity to the physical world, the latter is decided by the delay for collecting data and executing the model on the collected data, and $\mid{w}_{1}\mid$ and $\mid{w}_{2}\mid$ are the weights of the two factors, respectively.
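A one-line sketch of (1) follows; the sign of w2 is an assumption (the article specifies only the weight magnitudes), chosen so that a staler DT, i.e., a larger AoI, lowers the utility.

```python
import math

def dt_utility(accuracy: float, aoi: float,
               w1: float = 1.0, w2: float = -0.5) -> float:
    """Utility = w1*log(accuracy) + w2*log(AoI), per (1). w2 < 0 is an
    assumption so that a larger AoI (staler DT) reduces the utility."""
    return w1 * math.log(accuracy) + w2 * math.log(aoi)

# AoI grows with the data-collection and model-execution delays (seconds).
print(dt_utility(accuracy=0.95, aoi=0.20 + 0.35))
```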
Working closely with the environmental perception module, the path planning module in the DTN is responsible for determining the optimal path for each CAV to follow from its current location to its destination. This module takes into account various factors, such as traffic conditions, road types, speed limits, and other constraints to generate a path that is safe, time-saving, and power-efficient. Furthermore, a higher-level goal, such as maximizing rider comfort, is considered, leading to a more complex issue that involves individual physiological and psychological traits, as well as personal preferences. It is noted that the planned path is dynamic and slightly adjusted to be adaptive to the changing road environments during the process of driving. To be more specific, the actual driving path is decided by the combination of the long-term plan and the updated short-term plans.
The global path planning model comprises four components: an encoder, a decoder, an attention mechanism, and a mixture density network (MDN). We integrate the encoder–decoder framework with an attention mechanism as the cornerstone of our model. This enables the extraction and abstraction of trajectory features, placing distinct emphasis on various aspects. The encoder is composed of two dilated convolution layers, two weight normalization layers, and two activation layers. The attention mechanism establishes a connection between the hidden states of the encoder and the decoder. It effectively captures the significance of neighboring vehicles through their spatiotemporal encoding, contributing to the prediction of future motion. Subsequently, this mechanism constructs a vector that encapsulates the environmental impact. The decoder is composed of two FC layers, two weight normalization layers, and two activation layers. The decoder’s output, along with the ground truth, is fed into the MDN to ascertain the Gaussian component parameters. In this process, an FC layer is crafted to transform the original data and allocate mixture components to align with the future trajectory’s distribution. Finally, the numerous forecasted trajectories in a sequence of ongoing dynamics are transposed into a spatiotemporal domain, leading to the generation of an octree map.
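A minimal sketch of the MDN head at the end of this pipeline: an FC layer maps the decoder output to the mixture weights, means, and standard deviations of K Gaussian components over future waypoints. The number of components, the feature width, and the 2D (x, y) output are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    """MDN head: one FC layer producing K mixture weights plus a mean and a
    standard deviation per component and per output dimension."""
    def __init__(self, feat_dim: int = 128, n_components: int = 5,
                 out_dim: int = 2):
        super().__init__()
        self.k, self.d = n_components, out_dim
        self.fc = nn.Linear(feat_dim, n_components * (1 + 2 * out_dim))

    def forward(self, h: torch.Tensor):
        p = self.fc(h)
        logit_pi, mu, log_sigma = torch.split(
            p, [self.k, self.k * self.d, self.k * self.d], dim=-1)
        pi = torch.softmax(logit_pi, dim=-1)   # mixture weights sum to 1
        sigma = torch.exp(log_sigma)           # positive standard deviations
        return pi, mu.view(-1, self.k, self.d), sigma.view(-1, self.k, self.d)

pi, mu, sigma = MDNHead()(torch.randn(4, 128))  # parameters for 4 samples
```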
Before reaching the destination, to adapt to real-time changes in environments and requirements, a local path is planned in a timely manner along the decided long-term path and constantly updated to guide the CAV in controlling its motions properly. For local path planning, we propose a model consisting of a deep conditional generative (DCG) model, a self-attention mechanism (SAM), intention recognition, and lane crossing prediction (LCP), as shown in Figure 4. In the DCG model, when accounting for the impact of a blended driving style on upcoming trajectories, probabilities are allocated to diverse potential driving styles. At first, the global path planning results are fed into an FC layer and a long short-term memory (LSTM) layer; then, the hidden states are relayed to three FC layers incorporated with a dropout layer; finally, an output activation layer is utilized to configure the parameters. In the SAM, multihead self-attention serves the purpose of collectively gathering information from distinct representation subspaces, thereby amplifying the model’s capacity for representation. The resultant representations are merged and subsequently passed through a gated linear unit to extract valuable insights while diminishing extraneous attributes. Intention recognition is proposed to acquire probabilities for various maneuvers, aiding the prediction of trajectories with multiple modes. In the LCP, an FC layer equipped with a softmax activation function is employed to compute the probabilities associated with the lateral maneuver types, namely, staying in the current lane (SiCL), moving to the right lane (MtRL), and moving to the left lane (MtLL). The local path is updated according to the obtained probabilities (see the LCP sketch below).
Figure 4 Path planning process. LSTM: long short-term memory.
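A minimal sketch of the LCP head, assuming an illustrative 128-dimensional feature input:

```python
import torch
import torch.nn as nn

class LCPHead(nn.Module):
    """Lane crossing prediction head: an FC layer with softmax that yields
    the probabilities of the three lateral maneuvers (SiCL, MtRL, MtLL)."""
    MANEUVERS = ("SiCL", "MtRL", "MtLL")

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.fc = nn.Linear(feat_dim, len(self.MANEUVERS))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.fc(h), dim=-1)  # probabilities sum to 1

probs = LCPHead()(torch.randn(1, 128))  # e.g., maneuver probabilities (1, 3)
```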
The reward function is developed to avoid possible collisions (Rpc), increase driving speed under the speed limits (Rds), and improve ride comfort (Rrc). Rpc is the probability of a collision if an action is taken. Rds represents the percentage increase in driving speed without breaking the speed limits. Rrc indicates the riders’ tolerable discomfort, represented by the ratio of the number of lane changes to the maximum acceptable number of lane changes during a period of time. Since safety is the most important concern for every CAV, we assign the largest weight $\mid{w}_{\text{pc}}\mid$ to Rpc and a larger weight $\mid{w}_{\text{ds}}\mid$ to Rds than the weight $\mid{w}_{\text{rc}}\mid$ assigned to Rrc. For example, the ratio of $\mid{w}_{\text{pc}}\mid$ to $\mid{w}_{\text{ds}}\mid$ is around 10, as is that of $\mid{w}_{\text{ds}}\mid$ to $\mid{w}_{\text{rc}}\mid$. The reward is then calculated as ${w}_{\text{pc}}{R}_{\text{pc}} + {w}_{\text{ds}}{R}_{\text{ds}} + {w}_{\text{rc}}{R}_{\text{rc}}$, and the action is taken so as to maximize the reward under the current state.
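A short sketch of this weighted reward, with the 10:1 magnitude ratios from the text; the weight signs are assumptions (collision risk and lane-change discomfort penalized, speed gain rewarded).

```python
def path_planning_reward(r_pc: float, r_ds: float, r_rc: float,
                         w_pc: float = -10.0, w_ds: float = 1.0,
                         w_rc: float = -0.1) -> float:
    """Reward = w_pc*R_pc + w_ds*R_ds + w_rc*R_rc, with |w_pc|:|w_ds| and
    |w_ds|:|w_rc| both around 10. The signs are assumptions."""
    return w_pc * r_pc + w_ds * r_ds + w_rc * r_rc

# r_pc: collision probability; r_ds: fractional speed increase within the
# limits; r_rc: lane changes over the maximum acceptable number in the window.
reward = path_planning_reward(r_pc=0.02, r_ds=0.15, r_rc=0.3)
```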
The motion control module in the DTN serves the function of controlling the movement of each involved CAV based on the path generated by the path planning module. It is responsible for controlling the acceleration, braking, and steering of the CAV, ensuring that the CAV maintains a safe distance from other vehicles, pedestrians, and obstacles on the planned path and provides as enjoyable an experience to the riders as it possibly can.
Whether driving straight, making a turn, or cutting in, it is essential for the CAV to take into account factors such as real-time operation, driving performance, ride comfort, and ease of convergence. Usually, it is satisfactory for riders to arrive at their intended destination safely and comfortably within an acceptable time period. Nevertheless, it can prove challenging to achieve optimal outcomes that meet all of these conditions, thereby necessitating the development of an action space that can cater to the requirements of effective coordination.
We propose a multiagent graph attention network (GAT) to control CAVs’ diverse motion patterns, which are affected by individual dynamics and their interactions. The framework of the multiagent GAT is shown in Figure 5. The input contains the octree map output by global path planning and the estimated probabilities of maneuver types (SiCL, MtRL, and MtLL) output by local path planning. An encoder network is constructed using a combination of a CNN and an LSTM. The conditional CNN processes the octree map to capture spatial features, while the LSTM takes the estimated probabilities of maneuver types as input to extract temporal features between the current and the previous time slots. The interaction model, consisting of a graph convolutional network and a GAT block with two GAT heads, one dropout layer, and one batchNorm2d layer, is used to assign varying degrees of significance to neighboring agents through edge weights. The agents’ dynamics are derived from their past states using encoders tailored to their specific types. These interactions are then represented as a directed graph with featured edges, forming a heterogeneous structure. The constructed GAT network is responsible for processing this graph and extracting interaction features. Then, a graph convolutional network, consisting of a convolutional layer, a graph convolutional layer, an FC layer, and a temporal convolutional layer, is applied to allow sharing the feature map across all agents in a selective manner. Next, the spatial, temporal, and interaction features are fed into a DQN with fuzzy logic. The DQN is a three-layer FC neural network, containing a finite number of hidden units to effectively estimate continuous functions in a uniform manner. The fuzzy outputs are then resolved by the maximum defuzzification method.
Figure 5 Framework of multiagent GAT.
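A minimal sketch of one graph attention head over the agent-interaction graph, implemented in plain PyTorch: attention coefficients on the edges express each neighbor’s significance. Two such heads, with dropout and batchNorm2d, would form the GAT block of Figure 5; the feature dimensions and the toy graph are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATHead(nn.Module):
    """Single graph attention head: learns edge weights over a directed
    adjacency mask and aggregates neighbor features accordingly."""
    def __init__(self, in_dim: int = 64, out_dim: int = 32):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared projection
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention scorer

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, in_dim) agent features; adj: (N, N) mask with self-loops.
        z = self.W(h)
        n = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                           z.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))       # raw scores
        e = e.masked_fill(adj == 0, float('-inf'))        # keep real edges
        alpha = torch.softmax(e, dim=-1)                  # edge weights
        return alpha @ z                                  # aggregated features

h = torch.randn(4, 64)            # four interacting agents
adj = torch.ones(4, 4)            # fully connected toy graph with self-loops
out = GATHead()(h, adj)           # (4, 32) interaction features
```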
For the purpose of guaranteeing safe driving in the first place and then delivering a comfortable experience to the riders if possible, the reward in the motion control module is defined as the satisfaction of riders, affected by three factors: safety, comfort, and efficiency. Safety ${R}_{\text{tto}}$ is decided by the residual time to collision. Comfort ${R}_{\text{jerk}}$ is decided by how often a quick and sudden movement happens. Efficiency ${R}_{\text{ttd}}$ is decided by the remaining time to the destination. Therefore, the reward is calculated by \[{R} = {w}_{\text{tto}}{R}_{\text{tto}} + {w}_{\text{jerk}}{R}_{\text{jerk}} + {w}_{\text{ttd}}{R}_{\text{ttd}} \tag{2} \] where $\mid{w}_{\text{tto}}\mid$, $\mid{w}_{\text{jerk}}\mid$, and $\mid{w}_{\text{ttd}}\mid$ are the weights of safety, comfort, and efficiency, respectively, among which the first is much larger than the latter two. ${R}_{\text{jerk}}$ is expressed by [9]: \[{R}_{\text{jerk}} = \frac{\left\vert{\left({{a}_{t}{-}{a}_{{t}{-}{1}}}\right){/}\Delta{t}}\right\vert}{{\varepsilon}_{\text{jerk},\text{accel}}} + \frac{\left\vert{\left({{b}_{t}{-}{b}_{{t}{-}{1}}}\right){/}\Delta{t}}\right\vert}{{\varepsilon}_{\text{jerk},\text{wheel}}} \tag{3} \] where ${\varepsilon}_{\text{jerk},\text{accel}}$ and ${\varepsilon}_{\text{jerk},\text{wheel}}$ are the maximum accepted changes in acceleration and steering angle, respectively; ${a}_{t}$ and ${a}_{{t}{-}{1}}$ are the accelerations at the current and the last time slots, respectively; ${b}_{t}$ and ${b}_{{t}{-}{1}}$ are the steering angles at the current and the last time slots, respectively; and $\Delta{t}$ is the time interval.
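A direct sketch of (3); the default thresholds use the moderate rider’s maximum accepted changes quoted in the evaluation, and the sample inputs are illustrative.

```python
import math

def jerk_reward(a_t, a_prev, b_t, b_prev, dt,
                eps_accel=3.0164, eps_wheel=math.pi / 6):
    """R_jerk per (3): normalized change rates of acceleration and steering
    angle, with eps_jerk,accel and eps_jerk,wheel the rider's maximum
    accepted changes (defaults: moderate rider, 3.0164 m/s^3 and pi/6 rad/s)."""
    return (abs((a_t - a_prev) / dt) / eps_accel
            + abs((b_t - b_prev) / dt) / eps_wheel)

# One 0.1-s control slot: mild acceleration change, small steering correction.
r_jerk = jerk_reward(a_t=1.2, a_prev=0.9, b_t=0.05, b_prev=0.02, dt=0.1)
```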
To evaluate our proposed scheme, we initialize the conditions (e.g., each CAV’s location) of the considered road by extracting a certain cutout scenario from a real traffic dataset [10]. Multiple RSUs are deployed along both sides of the road, providing full coverage to CAVs on the road. Each RSU is configured with a computing resource of 20 GHz. Each CAV, moving on the road at a speed of no more than 20 m/s, holds a computing resource ranging from 3 to 5 GHz. The maximum transmit power of each CAV is 20 dBm and that of each RSU is 23 dBm. With a noise power density of −174 dBm/Hz, mmWave-based V2V communications follow the line-of-sight path-loss model in WINNER+B1 [11], and V2I communications follow the path-loss model ${128.1} + {37.6}\log_{10}{(}{d}{)}$, where d is the end-to-end distance in kilometers.
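As a sanity-check sketch of the V2I link budget under this setup: the RSU transmit power of 23 dBm and noise density of −174 dBm/Hz match the values above, while the 10-MHz bandwidth is an illustrative assumption.

```python
import math

def v2i_pathloss_db(d_km: float) -> float:
    """V2I path loss 128.1 + 37.6*log10(d), with d in kilometers."""
    return 128.1 + 37.6 * math.log10(d_km)

def v2i_rate_bps(d_km: float, bw_hz: float = 10e6,
                 tx_dbm: float = 23.0, noise_dbm_hz: float = -174.0) -> float:
    """Achievable Shannon rate from the link budget (bandwidth assumed)."""
    noise_dbm = noise_dbm_hz + 10 * math.log10(bw_hz)
    snr_db = tx_dbm - v2i_pathloss_db(d_km) - noise_dbm
    return bw_hz * math.log2(1.0 + 10 ** (snr_db / 10.0))

print(f"{v2i_rate_bps(0.1) / 1e6:.1f} Mb/s at 100 m")
```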
To evaluate the efficiency of a DT constructed by our proposed method, we consider, as an example, a DT that replicates a road segment 100 m in length. Under different vehicle densities, we measured and compared the utilities of the DTs constructed by the proposed method and by two baseline fashions of learning: single split, in which the perception model is divided at one fixed point regardless of the actual resource conditions, and RSU-only learning, in which all of the raw sensory data are collected by the RSU and the whole model is trained there.
In Figure 6, it is seen that the utility of the DT decreases as the traffic load becomes heavy in each method. This is understandable: more CAVs lead to more complicated segment conditions, which places a heavier burden on learning under the limited communication and computing resources and consequently degrades the utility. However, compared with the two baselines, our proposed method achieves the highest utility, thanks to the adaptive splitting of the learning model according to the heterogeneous computing capabilities and moving trends of CAVs. In contrast, the performance of single split is hindered by being limited to a single partition that ignores the actual situations. Although the RSU has stronger data-processing power, more time is consumed collecting all of the sensory data from the CAVs than is saved by the faster computation.
Figure 6 Utility of DT versus vehicle density.
To verify whether the proposed DTN system can ensure a satisfactory experience for diverse riders, we assume three types of riders, i.e., conservative, moderate, and aggressive riders, with different requirements on jerk. The maximum changes in acceleration for conservative, moderate, and aggressive riders are 2.9097 m/s³, 3.0164 m/s³, and 4.7594 m/s³ [13], respectively, and those in steering angle are ${\pi}{/}{12}$, ${\pi}{/}{6}$, and ${\pi}{/}{4}$ rad/s, respectively.
As demonstrated in Figure 7, the average satisfaction of each type of rider decreases as more and more vehicles drive into this road segment, which is in line with reality. However, it is encouraging to discover that the personalized service provided by our proposed system enables the riders to enjoy a better experience than the nonpersonalized service. Naturally, every rider is happier on a road with fewer vehicles, while aggressive riders are more easily irritated by heavier traffic.
Figure 7 Rider satisfaction versus vehicle density for different types of riders.
To demonstrate the role of DTN in the CAV system, the performance of our proposed scheme is compared to those of two baselines, joint learning with the attention layer and the CNN layer (JAC) [4] and an intelligent driver model with lane changing (IDM-LC2013) [14]. JAC employs a reinforcement learning model to jointly optimize sensing, decision making, and motion-controlling in autonomous driving, and IDM-LC2013 combines the car-following model with the lane-changing model.
Figure 8 shows the average satisfaction of riders in the proposed, JAC, and IDM-LC2013 schemes. The average satisfaction of riders is defined as the ratio of the sum of the riders’ satisfaction to the number of riders. The average satisfaction in all of the schemes decreases as more and more CAVs drive into this region, aligning with real-world observations. The average satisfaction of the JAC scheme is always larger than that of the IDM-LC2013 scheme, owing to both comprehensive decision making through the cooperative optimization of lane changing and speed and strong adaptation to dynamic and complex traffic scenarios. It can also be seen that the proposed scheme outperforms the JAC scheme thanks to the effective utilization of the DTN. On the one hand, the DTN can timely and precisely understand the practical environment and provide instructions on autonomous driving; on the other hand, the DTN provides a multi-AI–empowered platform, where the seamless coordination of environmental perception, path planning, and motion control optimizes the behavior of each CAV in terms of guaranteeing safety, satisfying riders, and enhancing efficiency.
Figure 8 Rider satisfaction versus vehicle density in different schemes.
Autonomous driving assisted by DTs is considered a promising enabler for intelligent transportation. This article presented a study on how a DTN consisting of multiple connected DTs operates to serve the riders in CAVs as well as possible. Three modules—environmental perception, path planning, and motion control—are embedded in the DTN. For the purpose of copying and evolving with actual road conditions, the environmental perception module is designed around a DMTL-based perception model. The path planning module, committed to enhancing traffic management and sensitivity to sudden changes, employs a global path planning model and a local path planning model. The motion control module is designed to steer each CAV along the adaptively planned path while providing safe and personalized service. The results from numerical evaluations demonstrate that the operation and seamless coordination of the three modules not only guarantee safe and smooth driving but also truly understand riders and satisfy their needs.
Qingyang Song is the corresponding author for this article.
Ya Kang (kangya.cqupt@foxmail.com) is pursuing a Ph.D. degree in information and communication engineering at the Chongqing University of Posts and Telecommunications, Chongqing 400065, China. She received the M.S. degree in information and communication engineering from the Chongqing University of Posts and Telecommunications, Chongqing, China, in 2021. Her current research interests include connected and autonomous vehicle systems, mobile edge computing, and resource management in wireless networks.
Qingyang Song (songqy@cqupt.edu.cn) is a full professor of mobile communications with the Chongqing University of Posts and Telecommunications, Chongqing 400065, China. She received the Ph.D. degree in telecommunications engineering from The University of Sydney, Australia, in 2007. Her research interests include cooperative resource management, edge computing, vehicular networks, and symbiotic radio. She is on the editorial boards of two journals, including IEEE Transactions on Vehicular Technology and Digital Communications and Networks.
Jing Song (songjingchn@foxmail.com) is a lecturer with the School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870, China. She received the Ph.D. degree in communication and information systems from Northeastern University, Shenyang, China, in 2023. Her research interests include connected and autonomous vehicle systems, mobile edge computing, resource allocation, and machine learning in wireless networks.
Fengsheng Pan (s220101113@stu.cqupt.edu.cn) is currently working toward the M.S. degree in information and communication engineering at the Chongqing University of Posts and Telecommunications, Chongqing 400065, China. He received the B.E. degree in information and communication engineering from the Chongqing University of Posts and Telecommunications, Chongqing, China, in 2022. His research interests include mobile edge computing, resource allocation, and automated driving.
Lei Guo (guolei@cqupt.edu.cn) is a full professor of communication and information system with the Chongqing University of Posts and Telecommunications, Chongqing 400065, China. He received the Ph.D. degree in communication and information system from the University of Electronic Science and Technology of China in 2006. His research interests include communication networks, optical communications, and wireless communications.
Abbas Jamalipour (a.jamalipour@ieee.org) is a professor of ubiquitous mobile networking at the University of Sydney, Sydney, NSW 2050, Australia. He received the Ph.D. degree in electrical engineering from Nagoya University. He is editor-in-chief of the IEEE Transactions on Vehicular Technology and a Fellow of IEEE.
[1] R. Whiton, “Cellular localization for autonomous driving: A function pull approach to safety-critical wireless localization,” IEEE Veh. Technol. Mag., vol. 17, no. 4, pp. 28–37, Dec. 2022, doi: 10.1109/MVT.2022.3208392.
[2] O. Natan and J. Miura, “Towards compact autonomous driving perception with balanced learning and multi-sensor fusion,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 9, pp. 16,249–16,266, Sep. 2022, doi: 10.1109/TITS.2022.3149370.
[3] C. Chen et al., “Cooperation method of connected and automated vehicles at unsignalized intersections: Lane changing and arrival scheduling,” IEEE Trans. Veh. Technol., vol. 71, no. 11, pp. 11,351–11,366, Nov. 2022, doi: 10.1109/TVT.2022.3193096.
[4] L. Chen, Y. He, Q. Wang, W. Pan, and Z. Ming, “Joint optimization of sensing, decision-making and motion-controlling for autonomous vehicles: A deep reinforcement learning approach,” IEEE Trans. Veh. Technol., vol. 71, no. 5, pp. 4642–4654, May 2022, doi: 10.1109/TVT.2022.3150793.
[5] F. Tao, H. Zhang, A. Liu, and A. Y. C. Nee, “Digital twin in industry: State-of-the-art,” IEEE Trans. Ind. Informat., vol. 15, no. 4, pp. 2405–2415, Apr. 2019, doi: 10.1109/TII.2018.2873186.
[6] B. Fan, Z. Su, Y. Chen, Y. Wu, C. Xu, and T. Q. S. Quek, “Ubiquitous control over heterogeneous vehicles: A digital twin empowered edge AI approach,” IEEE Wireless Commun., vol. 30, no. 1, pp. 166–173, Feb. 2023, doi: 10.1109/MWC.012.2100587.
[7] B. Fan, Y. Wu, Z. He, Y. Chen, T. Q. S. Quek, and C.-Z. Xu, “Digital twin empowered mobile edge computing for intelligent vehicular lane-changing,” IEEE Netw., vol. 35, no. 6, pp. 194–201, Nov./Dec. 2021, doi: 10.1109/MNET.201.2000768.
[8] C. Schwarz and Z. Wang, “The role of digital twins in connected and automated vehicles,” IEEE Intell. Transp. Syst. Mag., vol. 14, no. 6, pp. 41–51, Nov./Dec. 2022, doi: 10.1109/MITS.2021.3129524.
[9] S. Hwang, K. Lee, H. Jeon, and D. Kum, “Autonomous vehicle cut-in algorithm for lane-merging scenarios via policy-based reinforcement learning nested within finite-state machine,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 10, pp. 17594–17606, Oct. 2022, doi: 10.1109/TITS.2022.3153848.
[10] M. Piorkowski, N. Sarafijanovic-Djukic, and M. Grossglauser, “CRAWDAD EPFL/Mobility,” IEEE, Piscataway, NJ, USA, 2008. [Online]. Available: https://ieee-dataport.org/open-access/crawdad-epflmobility
[11] P. Kyösti et al., “WINNER II channel models,” Information Society Technologies (IST), Luxembourg, Tech. Rep. IST-4-027756 WINNER II D1.1.2 V1.2, 2008.
[12] S. Wang, X. Zhang, H. Uchiyama, and H. Matsuda, “HiveMind: Towards cellular native machine learning model splitting,” IEEE J. Sel. Areas Commun., vol. 40, no. 2, pp. 626–640, Feb. 2022, doi: 10.1109/JSAC.2021.3118403.
[13] B. Gao, K. Cai, T. Qu, Y. Hu, and H. Chen, “Personalized adaptive cruise control based on online driving style recognition technology and model predictive control,” IEEE Trans. Veh. Technol., vol. 69, no. 11, pp. 12,482–12,496, Nov. 2020, doi: 10.1109/TVT.2020.3020335.
[14] J. Erdmann, “Sumo’s lane-changing model,” in Modeling Mobility With Open Data, M. Behrisch and M. Weber, Eds., Berlin, Germany: Springer-Verlag, 2015, pp. 105–123.
Digital Object Identifier 10.1109/MVT.2023.3328107