Siyuan Li, Xi Lin, Jun Wu, Wei Zhang, Jianhua Li
The vehicular metaverse is expected to provide a widely connected virtual Internet of Vehicles (IoV), where extended reality (XR) is one of the critical infrastructures. However, combining XR with automated vehicle (AV) networks brings several significant challenges, e.g., the low-latency transmission of panoramic XR video, high bandwidth demands, and the high mobility of vehicles. This article introduces digital twin (DT) and artificial intelligence (AI)-empowered panoramic video streaming for XR-assisted connected AVs to reduce transmission latency and intelligently respond to user requirements. Specifically, we propose a DT-enabled distributed XR service management framework to provide low-latency and smooth XR services across different domains in the vehicular metaverse. In addition, we present a case study on XR streaming-based virtualized resource allocation and propose a novel deep reinforcement learning (DRL)-based method to minimize transmission latency. Quantitative experimental results demonstrate that DTs can strengthen the positive role of AI in connected AV networks. Finally, open issues and potential research directions for the XR-assisted vehicular metaverse are discussed.
The metaverse refers to a virtual world constructed by computers, which possesses a self-contained system connected to the physical world [1]. It has captured the interest of researchers due to its potential to revolutionize the way we interact with one another. Multimodal technologies are integrated into the metaverse [2], where users explore virtual environments using advanced technologies like augmented reality and virtual reality. XR enables users to experience real-time changes in the physical world and seamlessly connect with others in the metaverse [3]. Emerging XR applications rely on mobile networks, producing unprecedented user experience requirements.
As a widely connected virtual Internet of Things (IoT) world, the metaverse incorporates large-scale XR devices. The combination of XR devices with AV networks and other intelligent IoT systems poses challenges in efficiently managing massive data transmission and computing loads. The proliferation of XR devices has led to an increased demand for mobile XR users to interact with servers and access real-time information within the virtual world. Some XR application scenarios, such as AVs, also impose high mobility requirements. Although AVs are equipped with high-quality panoramic cameras and lidar, they cannot perceive the entire environment due to occlusions [4]. Connected AVs have attracted considerable attention due to their revolutionary potential in traffic safety [6].
Recent research has explored the application of XR in the IoV; however, several challenges are still not fully addressed. First, video streaming-based communication between users and roadside units (RSUs) in connected AVs is still a challenging task. Second, considering that the data required by XR users take the form of continuous video streaming, the transmission latency will significantly increase as the number of users increases. Some studies have designed innovative solutions to these challenges in XR video streaming by exploring XR video caching and delivery [7].
To ensure XR user experiences, the end-to-end transmission latency should be low enough, and the bandwidth should be large enough. Video streaming in the XR-assisted metaverse requires much more bandwidth than 4K videos. Therefore, receiving continuous XR video streaming data under mobility in real time is a challenging task, and the existing centralized framework cannot fully meet the requirements. It is necessary to study a distributed XR service management framework based on edge networks and consider the requirements of mobility. In addition, due to the demand for highly complex computation, XR video cannot be effectively processed on AVs with limited resources. Therefore, the efficient allocation of resources becomes an urgent problem in many fields of the metaverse. In this context, the DT, one of the foundations of the metaverse, is introduced as a solution. DTs can create digital models that accurately reflect physical objects, thereby facilitating the optimization of the transmission latency in the dynamic and mobile vehicular environment [5]. Current research has attempted to combine DTs with mobile edge computing [8] or widely distributed IoT entities [9]. However, few works consider introducing DTs into the XR-assisted vehicular metaverse.
In this article, we consider the IoV in urban areas, where connected XR-assisted AVs request XR graphic data from RSUs. Then, considering the real-time mapping of DTs to physical vehicle states, we introduce DT and AI-empowered video streaming for XR-assisted AVs to optimize the transmission latency under high mobility. Specifically, we introduce a DT-enabled distributed XR service management framework to provide smooth XR services across different domains. In addition, we design a DRL-empowered approach providing virtualized resources for XR users while minimizing latency. Finally, we discuss open issues and potential research directions for the XR-assisted vehicular metaverse.
XR services are widely used in many fields, such as virtual entertainment. One typical case is remotely connecting XR users in different locations to the same virtual scene to form a hologram, as in collaborative 3D games, remote conferences, and remote concerts [1]. As demonstrated in Figure 1, the scenario we consider is that XR users traveling by AVs request XR video, such as a real-time scene of the virtual world, from a set of RSUs. XR users may move randomly within a certain area, and the distance from the other users and RSUs also affects the data transmission rate. This case therefore imposes stringent latency and bandwidth requirements to ensure smooth interaction among users.
Figure 1 DT and AI-empowered video streaming for XR-assisted AVs.
In this article, we introduce DT and AI-empowered video streaming focusing on the XR-based communication between XR users and RSUs, as shown in Figure 1. The workflow is summarized as follows. First, a certain number of XR users traveling in AVs from different locations connect to the same virtual scene. Next, at the request of the XR users, XR graphics data are transmitted in the form of video streaming. Then, each user continuously updates their driving information with the corresponding DT. Therefore, when the RSUs need the latest driving information from the AVs, they can directly query the corresponding DTs. Similarly, the AVs can receive traffic information from the RSUs and exchange driving information with the other AVs in this way. Finally, after receiving real-time data from the DTs, the RSUs can execute relevant algorithms for network optimization with the help of an XR service management device [10].
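To make this interaction pattern concrete, the following minimal Python sketch illustrates how an AV might keep its DT synchronized and how an RSU could query the twin instead of polling the vehicle over the wireless link. The class and method names (VehicleTwin, TwinRegistry, sync, latest_state) are illustrative assumptions rather than components of the proposed framework.

```python
from dataclasses import dataclass
import time

@dataclass
class VehicleTwin:
    """A minimal digital twin holding the latest reported state of one AV."""
    vehicle_id: str
    position: tuple = (0.0, 0.0)
    velocity: float = 0.0
    last_update: float = 0.0

    def sync(self, position, velocity):
        # The AV continuously pushes its driving information to the twin.
        self.position, self.velocity = position, velocity
        self.last_update = time.time()

class TwinRegistry:
    """RSUs query twins here instead of contacting each AV over the wireless link."""
    def __init__(self):
        self.twins = {}

    def register(self, vehicle_id):
        self.twins[vehicle_id] = VehicleTwin(vehicle_id)
        return self.twins[vehicle_id]

    def latest_state(self, vehicle_id):
        twin = self.twins[vehicle_id]
        return twin.position, twin.velocity, twin.last_update

# Usage: an AV keeps its twin fresh; an RSU reads the twin when scheduling.
registry = TwinRegistry()
twin = registry.register("av-17")
twin.sync(position=(120.5, 48.2), velocity=12.0)
print(registry.latest_state("av-17"))
```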
The integration of DTs and AI holds significant potential to enhance the effectiveness of the XR-assisted vehicular metaverse. As a case, we consider the context of resource allocation. Each RSU possesses XR graphic data, computational capabilities, and storage resources, and it acquires supplementary graphic data from other RSUs through edge networks. Through the virtualization and digitization process by DTs, AI-based resource allocation algorithms can learn from the data and derive optimal responsive solutions tailored to prevailing user requests. This integration of DTs and AI empowers a more efficient reduction of the transmission latency, thereby elevating the overall performance of the system. Unlike traditional data transmission, this article considers XR graphics data in the form of video streaming.
The presented novel XR video streaming paradigm for connected AVs can ensure the scalability and reliability of the network. First, within the edge computing-based network architecture, the expansion of the network scale leads to a proportional increase in the equipped edge RSUs, strategically positioned near AVs. Second, the DT edge network (DTEN) layer manages the raw data from the RSUs, leveraging DTs for real-time status updates and ensuring operational scalability and reliability. Furthermore, the presented DRL-based resource allocation algorithm is capable of handling a large action space and efficiently processing a substantial volume of XR service requests and RSU resource indexes. Therefore, it optimizes the global transmission latency and ensures the scalability of the framework.
XR video presents a more engaging experience to users, as it enables active participation within the visual environment. However, the distinctive attributes of XR content impose higher demands on more advanced mobile networks. Particularly, due to the dynamic nature of panoramic XR video, its bit rate, approximately reaching 400 Mb/s, is nearly tenfold higher than that of conventional 2D video, resulting in a considerable increase in transmission latency [7]. Ensuring ultralow latency becomes imperative to mitigate user discomfort, thereby presenting a significant challenge for the seamless transmission of XR video content.
The current centralized network architecture proves insufficient in meeting the mobility and low-latency requirements essential for the rapid response of XR services and the efficient transmission of XR graphics data. In contrast, an edge computing-based framework integrated with RSUs offers a more responsive solution. The proximity of edge servers to end users significantly reduces data processing times and minimizes latency. Leveraging edge caching emerges as a viable approach to address this issue, wherein XR video content is cached in RSUs situated near end users. However, compared with traditional video caching, some problems need further exploration. The most critical challenge lies in determining what content should be cached. It is neither practical nor necessary to cache an entire XR video, since the content viewed through an XR device usually accounts for only a small part of an XR video frame (typically no more than 20%). Each panoramic video frame is therefore typically projected and transmitted adaptively based on the field of view (FOV) [11].
The combination of DTs and XR-assisted AV networks brings at least three benefits. First, DTs maintain a real-time mapping of physical vehicle states, so RSUs can obtain up-to-date driving information without repeatedly polling the AVs over the wireless link. Second, DTs help prevent conflicts when a large number of users interact with the RSUs simultaneously. Third, the virtualized and digitized data maintained by DTs allow AI-based algorithms to learn from network states and derive resource allocation decisions that reduce the transmission latency.
In this article, we propose a DT-enabled XR service management framework to provide low-latency transmission and smooth XR services across different domains. As exhibited in Figure 2, the proposed framework consists of two layers: the DT-enabled AV network layer and the XR service management layer. The DT-enabled AV network layer consists of a physical AV network and DTEN. Specifically, the two layers are as follows.
Figure 2 The proposed DT-enabled distributed XR service management framework combined with NFV.
The physical AV network consists of AVs and RSUs that gather substantial amounts of raw data. These data can be processed in the DTEN layer to provide real-time traffic updates and alleviate traffic congestion. To ensure the quality of wireless communication, RSUs are strategically placed in proximity to AVs. Each RSU hosts an integrated edge server, providing the computational and storage capabilities required to support data processing. By analyzing resource usage and wireless channels, RSUs can anticipate the availability of computational resources as well as the current state of wireless channels.
The physical AV network and XR service management layer are linked by the DT network layer, which is composed of data storage, digital model mapping, and DT management. The data storage module collects physical network states, while the digital model mapping module extracts features based on these real-time state data. The DT management part manages and updates DT items in the XR service management layer, including model updating and state synchronization [12].
To facilitate the effective harnessing of heterogeneous network resources for XR services, the proposed framework incorporates network function virtualization (NFV) to aggregate and manage XR services across different domains. Specifically, the layer hosts NFVs composed of different XR services. The XR service management layer assumes responsibility for the monitoring and effective administration of XR services, including tracking resource usage, collecting and processing monitoring data, and negotiating with RSUs in the physical AV network for resource allocation dedicated to XR services. Additionally, it manages computational resources, storage space, and the interconnection of diverse NFVs, thus facilitating the seamless application of XR services across disparate domains. By providing a consolidated perspective on resources, the virtualization layer plays a crucial role in unifying the visualization and deployment of NFVs within the framework [13].
In this section, as an illustrative case study, we take the resource allocation task within the XR-assisted AV network to demonstrate the effectiveness of our proposed framework. As previously mentioned, the network consists of RSUs containing computational resources and mobile AVs continuously requesting XR service. We simulate user mobility by allowing XR users to move within a defined range at a consistent velocity. However, since there may be several XR service requests simultaneously, it is necessary to efficiently allocate limited resources. Therefore, the objective of our defined problem is to leverage a DRL-based resource allocation algorithm to optimize the global transmission latency to meet the requirements of all users as much as possible.
At each discrete time step, XR users send requests to RSUs for XR graphics data, with the requirement that the transmission task must be completed within a predefined temporal threshold. Each XR user is strategically allocated to at least one RSU, while each RSU manages its channels with nearby XR users. The RSUs can process requests collaboratively in real time, and the occupancy of certain RSUs may be sparse in instances of limited XR service demands. AVs are treated as local computational resources with velocity and location.
Through continuous mapping, DTs remain up to date with the states of the AVs. The DTs also help prevent conflicts from too many users interacting with the RSUs simultaneously. As illustrated in Figure 3, we assume that DTs and servers are located on the RSUs, so the transmission latency between DTs and RSUs is ignored due to their direct and high-speed hardware connection.
Figure 3 The workflow of the proposed virtualized resource allocation approach for XR-assisted AVs.
To satisfy user requirements, the RSUs need to intelligently provide resources for the AVs. Each computational task is described by its data size, required computational resources, completion deadline, and output result data size. Simple tasks can be executed locally on the AVs, whereas most computationally intensive tasks must be offloaded to the RSUs; both scenarios are possible, but we focus on tasks that require substantial computational resources. For tasks executed locally, the computational latency is determined by the data size, required resources, and CPU cycle frequency of the AVs. If the task is offloaded to an RSU, there are two possible cases.
In scenarios where a task is completed within the coverage area of a single RSU, the task execution time encompasses offloading the task from an AV to the nearest RSU, subsequently computing it on the edge, and sending the computational results back to the AV. The computational latency depends on the allocated computational resources and data size. In addition to specific computational tasks, the overall latency is affected by the uplink and downlink communication rate and RSU’s computational resources allocated to the AV. The communication rate of the AV is determined by the transmission bandwidth, background noise, vehicle transmission power, and channel gain.
If the task cannot be completed within the coverage area of a single RSU, there is an extra transmission process among RSUs. As the AV moves, computational results are transmitted from one RSU to another until the task is completed. Given the coverage area of each RSU and the backhaul rate between two RSUs, the transmission latency between two RSUs can be calculated. When the AV enters the coverage area of the final RSU, it obtains the computational results through vehicle-to-RSU communication. The task execution time is then the sum of the uplink communication latency, computational latency, communication latency between RSUs, and downlink communication latency. A weight coefficient indicates whether the task can be completed within the coverage area of one RSU, so when the AV chooses to offload the task, the execution time is a weighted sum of the two cases above. The overall task execution time combines the local and offloading execution times, with a parameter representing the proportion of local execution. The problem is formulated to minimize the total time taken by all the AVs to complete their tasks.
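As a rough illustration of the latency model described above, the following Python sketch combines the local, single-RSU, and multi-RSU cases. The function names and the Shannon-capacity link model are our assumptions; the article does not give explicit closed-form expressions.

```python
import math

def shannon_rate(bandwidth_hz, tx_power_w, channel_gain, noise_w):
    """Achievable rate (bit/s) of one link, following the Shannon capacity formula."""
    return bandwidth_hz * math.log2(1 + tx_power_w * channel_gain / noise_w)

def local_latency(data_bits, cycles_per_bit, cpu_hz):
    """Latency when the AV executes the task on its own CPU."""
    return data_bits * cycles_per_bit / cpu_hz

def offload_latency(data_bits, result_bits, cycles_per_bit, alloc_cpu_hz,
                    uplink_bps, downlink_bps, hops, backhaul_bps, multi_rsu):
    """Latency when the task is offloaded: uplink + edge computation
    (+ optional RSU-to-RSU forwarding) + downlink of the result."""
    t_up = data_bits / uplink_bps
    t_comp = data_bits * cycles_per_bit / alloc_cpu_hz
    t_back = (hops * result_bits / backhaul_bps) if multi_rsu else 0.0  # weight coefficient
    t_down = result_bits / downlink_bps
    return t_up + t_comp + t_back + t_down

def task_latency(local_fraction, t_local, t_offload):
    """Weighted combination of local and offloaded execution, as in the text."""
    return local_fraction * t_local + (1 - local_fraction) * t_offload

# Example: a 5-MB task fully offloaded within a single RSU's coverage (illustrative values).
rate = shannon_rate(20e6, 0.001, 1e-6, 1e-13)
t = offload_latency(5 * 8e6, 1e6, 8, 3e9, rate, rate, hops=0, backhaul_bps=1e9, multi_rsu=False)
print(f"offload latency: {t:.3f} s")
```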
All the XR virtual world content is prestored on the RSU server, which is crucial for smooth transmission. To ensure a seamless experience, the panoramic XR video content is first projected onto a 2D plane and then encoded into a group of pictures (GOP). This technique is used to break down the video content into smaller pieces, which can be more efficiently transmitted over the network. To further optimize the data transfer, each GOP is divided into tiles of equal size and coded into different versions following the Dynamic Adaptive Streaming over HTTP standard [7]. This approach ensures that users with different types of devices can access the same content, tailored to their device capabilities and network conditions. Several tiles can cover the entire FOV of the XR user, which provides a more immersive experience.
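The projection, GOP, and tiling pipeline can be approximated with FFmpeg, as in the Python sketch below, which crops an equirectangular source into a 4 x 4 grid of tiles and encodes each tile at several bit rates. The bit-rate ladder, file names, and encoder settings are illustrative assumptions, not the exact settings used in our experiments.

```python
import subprocess

FRAME_W, FRAME_H = 7680, 3840      # 8K equirectangular frame (from the experimental setup)
GRID = 4                           # 4 x 4 = 16 tiles per GOP, as in the article
BITRATES = {"high": "8M", "mid": "4M", "low": "1M"}   # illustrative quality versions

def encode_tiles(src="panorama.mp4"):
    tile_w, tile_h = FRAME_W // GRID, FRAME_H // GRID
    for row in range(GRID):
        for col in range(GRID):
            crop = f"crop={tile_w}:{tile_h}:{col * tile_w}:{row * tile_h}"
            for name, rate in BITRATES.items():
                out = f"tile_r{row}_c{col}_{name}.mp4"
                subprocess.run(
                    ["ffmpeg", "-y", "-i", src,
                     "-vf", crop,          # cut out one equal-size tile
                     "-c:v", "libx264",
                     "-b:v", rate,         # one bit-rate version per tile
                     "-g", "60",           # 60-frame GOPs (2 s at 30 frames/s)
                     "-an", out],
                    check=True)
    # DASH packaging of the encoded tile versions is omitted here for brevity.
```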
To improve storage space utilization, RSUs are encouraged to cache useful tiles, forming a virtual viewport. This technique reduces repeated data transmission and improves the overall efficiency of the AV network. Furthermore, the projection range of the FOV can be dynamically adjusted based on the number of connected AVs in the network, prioritizing low-latency transmission and uninterrupted XR services. The video processing program FFmpeg, a widely used tool in the industry, is used for encoding and decoding the videos.
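As a simple illustration of how a virtual viewport can be formed, the sketch below estimates which tiles of a 4 x 4 equirectangular layout overlap a user's FOV by sampling viewing directions. The sampling approach and the default FOV values are our assumptions.

```python
def fov_tiles(yaw_deg, pitch_deg, fov_h=90.0, fov_v=90.0, grid=4, samples=32):
    """Approximate the set of (row, col) tiles covering the viewer's FOV on an
    equirectangular frame divided into a grid x grid tile layout."""
    tiles = set()
    for i in range(samples + 1):
        for j in range(samples + 1):
            # Sample a viewing direction inside the FOV (longitude wraps at 360 degrees).
            lon = (yaw_deg - fov_h / 2 + fov_h * i / samples) % 360.0
            lat = min(90.0, max(-90.0, pitch_deg - fov_v / 2 + fov_v * j / samples))
            col = min(grid - 1, int(lon / (360.0 / grid)))
            row = min(grid - 1, int((90.0 - lat) / (180.0 / grid)))
            tiles.add((row, col))
    return tiles

# Example: a viewer looking straight ahead needs only a subset of the 16 tiles,
# so an RSU can cache this "virtual viewport" instead of the whole frame.
print(sorted(fov_tiles(yaw_deg=0.0, pitch_deg=0.0)))
```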
The distributed XR service management framework decomposes each XR service request into the corresponding domain and checks the available resources. The framework leverages DTs for real-time monitoring, including changes in resource availability. If the required resources are unavailable in a particular domain, the framework dynamically redirects the request to alternative domains, ensuring that users can access the desired XR services even in resource-limited domains. We introduce NFV to allow the separation of hardware and virtual resources. As shown in Figure 3, resources are aggregated and virtualized in resource pools and then intelligently scheduled to optimize the solution. To simulate service requests, the arrival process is modeled to include the virtual resource requirements, the maximum usage time, and the specific resource type required. It is also necessary to ensure that the total task execution time meets the strict deadline and that the allocated computational resources do not exceed the RSU’s total resource capacity.
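A minimal sketch of this request model and cross-domain redirection is given below, assuming Poisson arrivals and illustrative demand ranges; the field names and the simple capacity check stand in for the full admission logic of the framework.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class XRRequest:
    cpu: float        # required virtual computational resources
    storage: float    # required storage, e.g., for cached tiles
    hold_time: float  # maximum usage time of the allocated resources (s)
    deadline: float   # strict completion deadline (s)

def generate_requests(rate, steps, rng=None):
    """Poisson arrival process: a random number of new requests per time step,
    each with demands drawn from illustrative ranges."""
    rng = rng or np.random.default_rng(0)
    batch = []
    for t in range(steps):
        for _ in range(rng.poisson(rate)):
            batch.append((t, XRRequest(cpu=rng.uniform(1.0, 4.0),
                                       storage=rng.uniform(0.5, 2.0),
                                       hold_time=rng.uniform(1.0, 5.0),
                                       deadline=rng.uniform(0.1, 0.5))))
    return batch

def place_request(req, domains):
    """Serve the request in the first domain with enough free capacity,
    redirecting it to alternative domains when the home domain is exhausted."""
    for name, free_cpu in domains.items():
        if free_cpu >= req.cpu:
            domains[name] -= req.cpu   # reserve resources for the holding time
            return name
    return None                        # no domain can currently serve the request
```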
The problem described in the previous sections is typically a nonconvex programming problem with inherent evolutionary uncertainty. Unlike conventional resource scheduling tasks, this problem necessitates sequential decision making and involves uncertainties that evolve over time. Reinforcement learning is a suitable technique for addressing such problems [14]. Hence, we propose a virtualized resource allocation algorithm based on an improved DRL algorithm to efficiently address this complex problem.
To apply the DRL-based algorithm, the problem must be formulated as a Markov decision process (MDP) model with a well-designed state space, action space, and reward function, as detailed in Figure 3. The state captures the pending XR service requests together with the resource indexes of the RSUs, the action selects which RSU resources are virtualized to serve the current request, and the reward is defined in terms of the resulting transmission latency, so that maximizing the cumulative reward minimizes the global latency.
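Such an MDP can be prototyped as a toy environment like the one below, where the state concatenates the pending request with per-RSU free-resource indexes and the reward is the negative serving latency. This is only a sketch under our own simplifying assumptions, not the exact state, action, and reward design used by DeepDXV.

```python
import numpy as np

class XRAllocationEnv:
    """Toy MDP: state = [pending request demand, free resources per RSU],
    action = index of the RSU that serves the request,
    reward = negative serving latency."""

    def __init__(self, n_rsus=5, capacity=10.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.n_rsus, self.capacity = n_rsus, capacity
        self.free = np.full(n_rsus, capacity)

    def _observe(self):
        return np.concatenate(([self.request], self.free))

    def reset(self):
        self.free[:] = self.capacity
        self.request = self.rng.uniform(1.0, 4.0)   # demanded resources
        return self._observe()

    def step(self, action):
        latency = 1.0                               # penalty when the request is rejected
        if self.free[action] >= self.request:
            self.free[action] -= self.request
            # Serving latency shrinks as more resources remain at the chosen RSU.
            latency = self.request / (1.0 + self.free[action])
        reward = -latency
        self.request = self.rng.uniform(1.0, 4.0)   # next request arrives
        return self._observe(), reward, False, {}
```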
Within the framework of DT-enabled XR service management, we propose a novel DRL-based virtualized resource allocation approach, DeepDXV, for XR-assisted AV networks. Built upon the distributed proximal policy optimization (DPPO) algorithm, DeepDXV incorporates DTs to efficiently provision virtualized resources on RSUs based on dynamic XR service requests. The resource allocation problem is tackled through a two-step process: determining which RSU resources should be virtualized by the associated NFV and subsequently performing the resource virtualization accordingly. To prevent conflicts, it is ensured that a particular RSU resource is not virtualized simultaneously by different NFVs.
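The conflict-prevention rule can be enforced with a simple ownership record, sketched below; the class and identifiers are hypothetical and only illustrate the idea that a physical RSU resource is virtualized by at most one NFV at a time.

```python
class VirtualizationLedger:
    """Tracks which physical RSU resources are already virtualized so that
    two NFVs never virtualize the same resource block simultaneously."""

    def __init__(self):
        self.claimed = {}                 # resource_id -> nfv_id

    def try_virtualize(self, nfv_id, resource_id):
        # Step 1: the policy has selected resource_id for nfv_id.
        if resource_id in self.claimed:
            return False                  # conflict: already held by another NFV
        # Step 2: perform the virtualization and record ownership.
        self.claimed[resource_id] = nfv_id
        return True

    def release(self, resource_id):
        self.claimed.pop(resource_id, None)

ledger = VirtualizationLedger()
assert ledger.try_virtualize("nfv-A", "rsu3/cpu0")
assert not ledger.try_virtualize("nfv-B", "rsu3/cpu0")   # blocked, no conflict
```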
In the tasks considered in this article, the service response process can be roughly divided into two phases, as represented in Figure 4. The adaptive nature of DeepDXV allows the approach to dynamically adjust to changes in network conditions during the training process, enabling the learning of resource allocation policies better suited for real-time XR service requests. In the initial stage, DeepDXV gradually learns the optimal resource allocation strategy as XR service requests increase. In the stable stage, DeepDXV has learned the optimal policy and stably allocates RSU resources until the generation of service requests stops. Consequently, our approach can effectively accommodate fluctuations in network conditions and potential interruptions, ensuring the delivery of low-latency and smooth XR services in the vehicular metaverse. Before the inference process, the service request characteristics are analyzed, and the best-performing agent is selected.
Figure 4 The training reward of the proposed algorithm under different experimental settings.
In this section, we first discuss our environment setup and provide some significant hyperparameters used in the experiments. Then, experimental results are shown and analyzed in detail.
The DRL-based algorithm, DeepDXV, is trained using an Nvidia GeForce RTX 3090 Ti GPU with 24 GB of graphics memory. The considered test environment is a connected AV network with an area of 500 × 500. The distance between AVs and RSUs is uniformly distributed based on their quantity. The data size required for each task is randomly sampled from 5 to 10 MB, and the number of required CPU cycles per data unit follows a uniform distribution, ranging from 5 to 10 cycles per bit. Additionally, the arrival rate of requests conforms to a Poisson distribution. These requests are directed to the nearest RSU for execution. The transmit power of the AV ranges from 1 to 2 mW, while the transmit power of the RSU is randomly sampled from 5 to 10 mW. The additive white Gaussian noise of the communication channel follows a standard normal distribution. Both the transmit power and noise power vary at each time step to simulate real-world latency fluctuations. As for content, the XR video library requested by XR users is sourced from Insta360 Pro2 and consists of 20 videos. Each video is limited to 30 s in duration and converted for compatibility with Unity Engine 2020.3.22f1 by using FFmpeg. The XR video frames have a resolution of 8K (7,680 × 3,840) and are divided into 15 GOPs, where each GOP is set to 2 s and contains 60 frames. It has been observed that only 12%–20% of the frame area is visible [15]. Therefore, in our approach, we have constrained the content range to 20% of the entire frame. In the video processing module, each GOP is divided into 16 tiles to encode, tile, and segment using FFmpeg. The encoded video content is transmitted using the HTTP protocol while adhering to a predefined maximum allowable transmission latency for service requests. Given the dynamic nature of the network, the resource allocation for XR service requests is subject to fluctuations.
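For reference, the simulation settings listed above can be collected into a single configuration sketch. The key names are ours, while the values follow the text.

```python
SIM_CONFIG = {
    "area": (500, 500),                         # connected AV network area
    "task_data_size_MB": (5, 10),               # uniform per task
    "cpu_cycles_per_bit": (5, 10),              # uniform per data unit
    "request_arrivals": "poisson",
    "av_tx_power_mW": (1, 2),
    "rsu_tx_power_mW": (5, 10),
    "channel_noise": "standard_normal_awgn",
    "video_library": {"n_videos": 20, "duration_s": 30,
                      "resolution": (7680, 3840)},
    "gop": {"count": 15, "duration_s": 2, "frames": 60, "tiles": 16},
    "visible_fraction": 0.20,                   # FOV-constrained content range
    "transport": "HTTP",
}
```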
We compare the performance of our proposed algorithm with two classical resource management algorithms and an advanced DRL-based algorithm, PPO. The greedy-based resource management algorithm searches for the closest available resource to the minimum service request requirement. Due to the strict search strategy, it usually wastes fewer resources. The order-based algorithm performs sequential lookups for a free resource with sufficient capacity. Additionally, an advanced DRL-based algorithm, PPO, is implemented to show the advantages of the proposed algorithm.
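The two classical baselines can be read as best-fit and first-fit policies, respectively; the short sketch below reflects our interpretation of their behavior rather than the exact implementations.

```python
def greedy_allocate(demand, resources):
    """Best-fit: choose the available resource whose capacity is closest to
    (but not below) the request's minimum requirement, wasting little capacity."""
    candidates = [(cap - demand, idx) for idx, cap in enumerate(resources) if cap >= demand]
    if not candidates:
        return None
    _, idx = min(candidates)
    resources[idx] -= demand
    return idx

def order_allocate(demand, resources):
    """First-fit: scan the resources in a fixed order and take the first one
    with sufficient free capacity."""
    for idx, cap in enumerate(resources):
        if cap >= demand:
            resources[idx] -= demand
            return idx
    return None
```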
We trained the proposed model for 1,000 episodes and evaluated its performance with varying numbers of AVs. Figure 4 displays variations in the training reward of the proposed DeepDXV method with different numbers of AVs. The scale of the resources within the initial XR network is fixed, whereas the resource demands of each service request exhibit dynamism. The experimental results demonstrate that as the number of AVs increases, the stable state also consistently attains a higher reward, which shows that the proposed DeepDXV method can adapt to different scales of the network and achieve eventual convergence.
To demonstrate the performance and adaptability advantages of the proposed DeepDXV method, we conduct comprehensive evaluations encompassing service throughput and average transmission latency. As evident in Figures 5 and 6, DeepDXV consistently achieves maximum throughput while maintaining the lowest transmission latency and is capable of processing an increased volume of service requests across varying AV quantities. Notably, the proposed approach demonstrates sustained stability with the increasing number of AVs and service requests, which indicates its suitability for scenarios with a high service arrival rate, facilitated by the synergistic integration of DT and AI techniques.
Figure 5 The normalized average total throughput with different numbers of AVs.
Figure 6 The transmission latency with different numbers of AVs.
Safe and efficient vehicle control is one of the most significant technologies for connected AVs. The main function of the vehicle control part for an AV is to optimize the trajectory of the vehicle. Currently, there are two research directions. One is to obtain reliable vehicle state information through networks, while the other is the intelligent perception of individual vehicles, such as predicting the trajectories of pedestrians and other vehicles. In addition, because the longitudinal and lateral speeds, rotation angles, and other crucial parameters of vehicles are difficult to measure directly, developing efficient and accurate estimation methods for these parameters is an urgent problem to be solved.
Although the vehicular metaverse has promising prospects, its security and privacy issues may hinder its further development. The potential security vulnerabilities in the vehicular metaverse include the personal data security of AV users, the unfairness of AI algorithms, and the security of infrastructure, such as RSUs. They can mainly be divided into two aspects: the latest technologies integrated into the vehicular metaverse have their own vulnerabilities and deficiencies, and there are emerging security threats specific to the metaverse, such as intrusions into XR devices and the illegal acquisition of virtual currency [1]. Existing threats may also be amplified in the metaverse due to the interweaving of various technologies.
XR panoramic video is a 3D interactive video with a 360° viewing angle, and converting it into a 2D flat format requires substantial computation. When a panoramic video player displays such content, the traditional method of converting the panorama into a 2D projection format is based on the projection relationship between the images, with the projection calculation carried out pixel by pixel. Because XR panoramic video projection directly supports the transmission of XR services, more efficient and reliable processing techniques are urgently needed.
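To illustrate the pixel-by-pixel projection relationship mentioned above, the following sketch maps an equirectangular panorama to a flat perspective view with nearest-neighbor sampling. The geometry is standard, but the function signature and default parameters are our own assumptions.

```python
import numpy as np

def perspective_view(equirect, yaw, pitch, fov_deg=90.0, out_w=640, out_h=480):
    """Pixel-by-pixel mapping from an equirectangular panorama (H x W x 3 array)
    to a flat perspective view centered at viewing angles (yaw, pitch) in radians."""
    H, W = equirect.shape[:2]
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2)       # focal length in pixels

    # Ray direction of every output pixel in camera coordinates.
    xs = np.arange(out_w) - out_w / 2
    ys = np.arange(out_h) - out_h / 2
    x, y = np.meshgrid(xs, ys)
    z = np.full_like(x, f)
    d = np.stack([x, -y, z], axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)

    # Rotate the rays by pitch (about the x-axis) and yaw (about the y-axis).
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    d = d @ (Ry @ Rx).T

    # Convert to spherical coordinates, then to panorama pixel indices.
    lon = np.arctan2(d[..., 0], d[..., 2])                  # range [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))          # range [-pi/2, pi/2]
    px = ((lon / np.pi + 1.0) / 2.0 * (W - 1)).astype(int)
    py = ((0.5 - lat / np.pi) * (H - 1)).astype(int)
    return equirect[py, px]
```

Even at this modest output resolution, the mapping touches every output pixel, which is why lookup-table or hardware-accelerated implementations are preferable for real-time XR streaming.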
In this article, we focused on XR video transmission in the connected AV scenario, where XR users request XR services from RSUs. To mitigate the XR video transmission latency in this environment, we introduced DT and AI-empowered video streaming for XR-assisted AVs in the vehicular metaverse. In addition, a DT-enabled distributed XR service management framework was introduced to provide smooth XR services across different domains in the vehicular metaverse. Then, we designed a DRL-based virtualized approach to optimize the process of resource allocation for XR users. Specifically, we formulated an MDP model that captures the dynamic network state transitions resulting from the random arrival of XR service requests. Basic network resources were virtualized and adjusted to meet changing requirements. The experimental results demonstrated that the proposed method converges rapidly under different scales of RSUs and XR service requests and outperforms several baseline methods. Finally, open issues for the XR-assisted vehicular metaverse, including connected AVs, security and privacy in the vehicular metaverse, efficient XR panoramic video projection, and multiple-bit-rate XR video transcoding, were discussed.
This work was supported, in part, by the National Natural Science Foundation of China (NSFC 62202302, U21B2019, and U20B2048), the Action Plan of Science and Technology Innovation of the Science and Technology Commission of Shanghai (22511101202), and Shanghai Key Laboratory of Integrated Administration Technologies for Information Security. Xi Lin is the corresponding author.
Siyuan Li (siyuanli@sjtu.edu.cn) is currently pursuing his M.S. degree at the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China, where he received his B.S. degree in 2022. His research interests include intelligent networks, data privacy, and machine learning.
Xi Lin (linxi234@sjtu.edu.cn) is currently an assistant professor at the School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China, where he received his Ph.D. degree in cybersecurity in 2021. His research interests include blockchain technology, multiaccess edge computing, and the Internet of Things. He is a Member of IEEE.
Jun Wu (junwu@aoni.waseda.jp) is currently a professor at the Graduate School of Information, Production, and Systems, Waseda University, Fukuoka 808-0135, Japan, where he received his Ph.D. degree in information and telecommunication studies in 2011. He was a researcher at the Global Information and Telecommunication Institute, Waseda University, from 2011 to 2013. He is a Senior Member of IEEE.
Wei Zhang (uny@0-1universe.com) is currently a vice president of Unity Zero, Shanghai, China 201802, responsible for the collaborative innovation and industrial transformation of industry–university research institutes. He received his MBA degree from Fudan University, Shanghai. He is a former head of the international network security company Codenomicon and the Synopsys Software Integrity Group, serving users such as Huawei, ZTE, Hikvision, State Grid, and the China Academy of Information and Communications Technology.
Jianhua Li (lijh888@sjtu.edu.cn) is currently a professor at and the dean of the School of Cyber Security, Shanghai Jiao Tong University, Shanghai 200240, China, where he received his Ph.D. degree in 1998. His research interests include information security, signal processing, and computer network communication. He is a Senior Member of IEEE and the vice president of the Cyber Security Association of China.
[1] Y. Wang et al., “A survey on metaverse: Fundamentals, security, and privacy,” IEEE Commun. Surveys Tuts., vol. 25, no. 1, pp. 319–352, First Quart. 2023, doi: 10.1109/COMST.2022.3202047.
[2] T. Braud, L.-H. Lee, A. Alhilal, C. B. Fernández, and P. Hui, “DiOS—An extended reality operating system for the metaverse,” IEEE MultiMedia, vol. 30, no. 2, pp. 70–80, Apr./Jun. 2023, doi: 10.1109/MMUL.2022.3211351.
[3] J. Cao, K. Y. Lam, L.-H. Lee, X. Liu, P. Hui, and X. Su, “Mobile augmented reality: User interfaces, frameworks, and intelligence,” ACM Comput. Surv., vol. 55, no. 9, pp. 1–36, Jan. 2023, doi: 10.1145/3557999.
[4] Z. Xiao, J. Shu, H. Jiang, G. Min, H. Chen, and Z. Han, “Perception task offloading with collaborative computation for autonomous driving,” IEEE J. Sel. Areas Commun., vol. 41, no. 2, pp. 457–473, Feb. 2023, doi: 10.1109/JSAC.2022.3227027.
[5] Z. Lv, J. Guo, A. K. Singh, and H. Lv, “Digital twins based VR simulation for accident prevention of intelligent vehicle,” IEEE Trans. Veh. Technol., vol. 71, no. 4, pp. 3414–3428, Apr. 2022, doi: 10.1109/TVT.2022.3152597.
[6] P. Zhou et al., “AICP: Augmented informative cooperative perception,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 11, pp. 22,505–22,518, Nov. 2022, doi: 10.1109/TITS.2022.3155175.
[7] H. Xiao et al., “A transcoding-enabled 360° VR video caching and delivery framework for edge-enhanced next-generation wireless networks,” IEEE J. Sel. Areas Commun., vol. 40, no. 5, pp. 1615–1631, May 2022, doi: 10.1109/JSAC.2022.3145813.
[8] X. Lin, J. Wu, J. Li, W. Yang, and M. Guizani, “Stochastic digital-twin service demand with edge response: An incentive-based congestion control approach,” IEEE Trans. Mobile Comput., vol. 22, no. 4, pp. 2402–2416, Apr. 2023, doi: 10.1109/TMC.2021.3122013.
[9] S. Li, X. Lin, J. Wu, A. K. Bashir, and R. Nawaz, “When digital twin meets deep reinforcement learning in multi-UAV path planning,” in Proc. 5th Int. ACM Mobicom Workshop Drone Assisted Wireless Commun. 5G Beyond, 2022, pp. 61–66, doi: 10.1145/3555661.3560865.
[10] Z. Lv, L. Qiao, and R. Nowak, “Energy-efficient resource allocation of wireless energy transfer for the internet of everything in digital twins,” IEEE Commun. Mag., vol. 60, no. 8, pp. 68–73, Aug. 2022, doi: 10.1109/MCOM.004.2100990.
[11] Y. Mao, L. Sun, Y. Liu, and Y. Wang, “Low-latency FoV-adaptive coding and streaming for interactive 360° video streaming,” in Proc. 28th ACM Int. Conf. Multimedia, 2020, pp. 3696–3704, doi: 10.1145/3394171.3413751.
[12] K. Zhang, J. Cao, and Y. Zhang, “Adaptive digital twin and multiagent deep reinforcement learning for vehicular edge computing and networks,” IEEE Trans. Ind. Informat., vol. 18, no. 2, pp. 1405–1413, Feb. 2022, doi: 10.1109/TII.2021.3088407.
[13] T. Taleb et al., “Toward supporting XR services: Architecture and enablers,” IEEE Internet Things J., vol. 10, no. 4, pp. 3567–3586, Feb. 2023, doi: 10.1109/JIOT.2022.3222103.
[14] K. Wang, L. Wang, C. Pan, and H. Ren, “Deep reinforcement learning-based resource management for flexible mobile edge computing: Architectures, applications, and research issues,” IEEE Veh. Technol. Mag., vol. 17, no. 2, pp. 85–93, Jun. 2022, doi: 10.1109/MVT.2022.3156745.
[15] C. Zhou, S. Wang, M. Xiao, S. Wei, and Y. Liu, “AdaP-360: User-adaptive area-of-focus projections for bandwidth-efficient 360-degree video streaming,” in Proc. 28th ACM Int. Conf. Multimedia, 2020, pp. 3715–3723, doi: 10.1145/3394171.3413521.
Digital Object Identifier 10.1109/MVT.2023.3321172