Paolo Cudrano, Barbara Gallazzi, Matteo Frosi, Simone Mentasti, Matteo Matteucci
©SHUTTERSTOCK.COM/MIND PIXELL
Autonomous vehicles rely on lane-level high-definition (HD) maps for self-localization and trajectory planning. Current mapping, however, relies on simple line models, while clothoid curves have unexplored potential. Clothoids, well known in road design, are often chosen to model the vehicle trajectory in planning and control systems as they describe the road with higher fidelity. For this reason, we propose two vision-based pipelines for generating lane-level HD maps using clothoid models. The first pipeline performs mapping with known poses, requiring precise real-time kinematics GPS (RTK GPS) measurements; the second copes with noisy localizations, solving the simultaneous localization and mapping (SLAM) problem. Both pipelines rely on a line detection algorithm to identify each line marking and perform a graph-based optimization to estimate the map.
The resulting lane-level HD map represents each line marking with a spline of clothoids, exhibiting G1 continuity. Subsequent greedy techniques are presented to prune unnecessary clothoids, compressing the resulting map. The experimental validation performed on racetracks shows the advantages of our pipeline compared to other line representations and fitting strategies, while tests conducted on extra-urban roads demonstrate that our work can generalize to challenging scenarios and can be effectively used to generate lane-level HD maps in extra-urban scenarios.
Autonomous vehicles must always maintain for their safe operation an accurate representation of their surroundings through sensing systems and precise self-localization. Nevertheless, the presence of other vehicles and pedestrians makes the road a highly dynamic environment, and substantial effort needs to be devoted to obstacle detection, tracking, and motion forecasting. At the same time, the vehicle must also be aware of static road features such as traffic signs, pavement markings, and the presence of sidewalks and buildings. These road elements need to be detected, and road sign semantics need to be interpreted to comply with the traffic code and coexist with other agents and nonautonomous road users.
An autonomous vehicle could sense and interpret the static aspects of the scene repeatedly at every instant, jointly with the dynamical ones. However, this approach would be highly inefficient as it would absorb a large portion of the limited computation at disposal. With accurate maps, the resources required by scene understanding could instead be invested only on dynamical objects, which require higher alertness and faster reaction times. Meanwhile, parsing the entire scene in real time enforces substantial constraints on the precision that can be achieved, thus hindering the overall performance of the vehicle. Finally, every vehicle driving through the same road would be required to perform the same static computation over and over again, globally wasting precious resources.
For this reason, academia and industry are significantly focusing on the development and maintenance of HD maps, centimeter-precise road maps providing detailed geometric and semantic information on every static aspect of the road necessary for the safe operation of autonomous vehicles. HD maps relieve the vehicles from most of the scene parsing, thus allowing them to focus on dynamical objects, trajectory planning, and control.
Given the enormous push in this direction registered in the last few years, HD maps have become a de facto standard in the development of autonomous vehicles [1], with major automakers and mapping companies strictly collaborating in the production of commercial HD maps [2]. Generally, these maps are composed of several layers [1]. Although each commercial implementation differs, lower layers typically resemble traditional maps and are used for global planning, while higher layers provide semantic details and high-precision geometric information on road markings, traffic signs, and any roadside furniture. Typically, we speak of lane-level HD maps when considering only layers describing road lanes and related line markings.
Lane-level HD maps play a crucial role in local self-localization, trajectory planning, and motion forecasting and require centimeter-precise geometric details of the lane boundaries. For estimating road boundaries, state-of-the-art approaches use cameras [3] and lidars [4]. At first, road lines are identified with a line detection pipeline, as in [5]. Then, road lines are represented with a mathematical model to obtain a compact representation to be stored in the map and retrieved when needed. To this end, most works rely on polylines [6], cubic splines [7], or combinations of waypoints and Bezier curves [15] for their simplicity. However, these models are approximations that are often not faithful to the true geometry of the road, thus becoming potentially inaccurate.
Clothoids (or Euler spirals) are parametric plane curves whose curvature changes linearly with their curvilinear abscissa (Figure 1), i.e., \begin{align*}{x}{\left({s}\right)} & = {x}_{0} + \mathop{\int}\nolimits_{0}\nolimits^{s}{\cos\left({\frac{1}{2}{{\kappa}{\tau'}}^{2} + {\kappa}{\tau} + {\theta}_{0}}\right){d}{\tau}} \\ {y}{\left({s}\right)} & = {y}_{0} + \mathop{\int}\nolimits_{0}\nolimits^{s}{\sin\left({\frac{1}{2}{{{\kappa}{\tau'}}}^{2} + {\kappa}{\tau} + {\theta}_{0}}\right){d}{\tau}} \\ {\theta}{\left({s}\right)} & = \frac{1}{2}{\kappa'}{s}^{2} + {\kappa}{s} + {\theta}_{0} \tag{1} \end{align*}
Figure 1 An example of a clothoid segment, highlighting its origin ${x}_{0}, {y}_{0}$, initial orientation ${\theta}_{0}$, and length ${L}$.
Clothoid curves find frequent application in road design as their smooth curvature changes guarantee that no sudden jerk is perceived by the passengers on road bends, thus improving their comfort [8]. As a consequence, clothoids are also widely adopted in control systems to model the road centerline [9] and, ultimately, the planned trajectory of the vehicle [10]. Furthermore, clothoids are currently supported by the most known standards for road line specifications, i.e., ASAM OpenDRIVE and the NDS Open Lane Model [1], although polylines are still the most used model in the literature.
For their properties, clothoids represent the ideal model for the representation of road lines on HD maps. They provide a line representation more faithful to the geometry of the road, which ultimately benefits control and planning systems. Moreover, the infrastructure for their adoption is already fully available. Yet these models are still unused.
In this work, we propose a clothoid-based pipeline generating line marking maps of the road from monocular vision, proving that it is possible to use clothoid models to obtain lane-level HD maps, not only when precise GPS measurements of the trajectory are available (known poses) but also when it is impossible to obtain them (under uncertainty). The core contributions of our work are
Let us first consider the scenario in which the vehicle position is acquired through a high-frequency and accurate RTK GPS sensor while a forward-facing camera captures the scene. As depicted in Figure 2, our pipeline is composed of three main stages. At first, an initial clothoid spline is defined using the output of a line detection system; then, this spline is optimized through a graph-based approach; finally, a pruning algorithm compacts this representation, removing unnecessary clothoids.
Figure 2 A pipeline for HD mapping with known poses. Our system acquires camera images with our experimental vehicle and georeferences each image with precise RTK GPS measurements. For each frame, we extract a set of raw data points from each road line with a line detection pipeline [11] and cumulate them throughout the entire acquisition stage. The cumulated points are then divided in blocks with the help of DBSCAN. For each block, a cubic smoothing spline removes noise and enables the clothoid fitting through G1 interpolation. The obtained initial clothoids are optimized through a graph-based formulation. Finally, a greedy pruning iteratively removes unnecessary clothoids until no more can be removed. The result is a precise clothoid-based lane-level HD map. CNN: convolutional neural network; BEV: bird’s eye view; WLF: window-based line following; DBSCAN: density-based spatial clustering of applications with noise.
For each camera image, we extract the set of points representing each line marking (raw data points), exploiting the pipeline described in [11]. Briefly, line markings on the image plane are retrieved using a U-net-based convolutional neural network (CNN), trained for image segmentation. This model retrieves a binary mask indicating the presence or absence of road markings for each image’s pixel. Through inverse perspective mapping and with the knowledge of the camera’s intrinsic and extrinsic parameters, the mask is then projected to the bird’s eye view (BEV). Next, a window-based line following (WLF) algorithm is applied to obtain a concise set of points for each line marking. Finally, we leverage the camera calibration to calculate the position of each sampled point in the vehicle reference frame. For further details on this pipeline, we refer the reader to the original work [11].
We can now generate an initial clothoid spline, which will be further optimized in the following steps. To do so, we consider only raw data points within 8 m from the vehicle and cumulate them in a global east-north-up (ENU) coordinate system, exploiting the vehicle’s GPS location. In the following, we process each line marking separately and start by splitting the raw data points into blocks of 10 m each. To cope with possible line gaps and missing detections, we adjust each block by exploiting the clustering algorithm DBSCAN [12], merging small clusters with neighboring blocks.
Each raw data block contains a cluster of noisy points, which must be smoothed before they can be used to define a clothoid. To do so, we first zero-mean and rotate each point in the block through singular value decomposition (SVD) and then fit its points with a cubic smoothing spline. Cubic smoothing splines allow us to obtain a noise-free representation of the line in the block as they find a tradeoff between fitting error and curve smoothness. While the first singular component is used as an abscissa for the model, the second singular value provides information to adjust the smoothing parameter of the spline. A small part of points from the adjacent blocks is included during fitting to enforce more smoothness between each block, although at this stage, this does not result in perfect continuity.
Between each pair of consecutive cubic smoothing splines, we define a knot ${\overrightarrow{{k}_{I}}} = {\left[{{x}_{i}{y}_{i}{\theta}_{i}}\right]'}$, with ${x}_{i},{y}_{i}$ midpoint between the spline extremes and ${\theta}_{j}$ mean of their tangent angles at the extremes. Bertolazzi and Frego [13] proved that, given two points and their orientations, there exists one and only one clothoid segment passing through those points with those tangential angles, providing a formula to compute it (G1 clothoid interpolation). We apply this result to define a set of clothoids connecting every pair of consecutive knots ${\overrightarrow{{k}_{I}}}, {\overrightarrow{{k}_{{I} + {1}}}}$. By construction, this clothoid spline is continuous and derivable at each knot ${\overrightarrow{{k}_{I}}}$, i.e., it exhibits G1 continuity properties. Given these essential properties, this clothoid spline represents our initial lane-level HD map.
Before proceeding with its optimization, however, we can further simplify its representation. Indeed, at the moment, each clothoid describes approximately 10 m of road, while it is often the case that line markings can be represented with longer clothoids (e.g., in straight roads). Therefore, we apply a simple pruning algorithm, iteratively attempting to merge adjacent clothoids when they present similar curvatures at a joint knot ${\overrightarrow{{k}_{I}}}$, i.e., ${\kappa}_{{i}{-}{1}}\left({{L}_{i}}\right)\approx{\kappa}_{i}\left({0}\right)$. The new clothoid is obtained through G1 clothoid interpolation for knots $\overrightarrow{{k}_{{I}{-}{1}}}, {\overrightarrow{{k}_{{I} + {1}}}}$, and the merge takes place only when the position and tangent of the new clothoid at the pruned knot are not significantly affected, i.e., ${[}{x}_{\text{new}}{y}_{\text{new}}{\theta}_{\text{new}}{]'}\,{\approx}\,{\overrightarrow{{k}_{I}}}$.
The initial clothoid spline presented previously is constructed with little consideration for its fitting error with respect to the original raw data points. At this stage, we optimize its parameters to improve its fitting while preserving its continuity and derivability. To this end, we rely on graph-based least-squares optimization, using the g2o framework [14]. We refer the reader to the original work [14] for further details.
In an optimization graph, each node ${i}\,{\in}\,{N}$ is associated with a parameter vector ${p}_{i}$ to be optimized, while unary and binary edges ${\epsilon}\,{\in}\,{E}$ are used to enforce constraints on these vectors in the form of a vector error function ${e}_{\epsilon}$. In our problem, each clothoid ${C}_{i}$ in the spline is represented as a node, with ${p}_{i} = {\left[{{\kappa'}_{i}{\kappa}_{i}{\theta}_{i}{\left({0}\right)}{x}_{i}{\left({0}\right)}{y}_{i}{\left({0}\right)}}\right]'}$. As depicted in Figure 2 (bottom right), we define two types of edges:
Therefore, the overall optimization has the following objective: \begin{align*}{\min}\,{F} & = {\min} \mathop{\sum}\limits_{{\epsilon}\,{\in}\,{E}}{e}_{\epsilon'}{\Omega}_{\epsilon}{e}_{\epsilon}, \\ {e}_{{\epsilon}^{f}}{\left({p}_{i}\right)} & = \sqrt{\frac{1}{\left\vert{R}_{i}\right\vert} \mathop{\sum}\limits_{{r}\,{\in}\,{R}_{i}}{\text{dist}}{\left({r}, {p}_{i}\right)}^{2}} \\ {e}_{{\epsilon}^{s}}{\left({p}_{i}, {p}_{{i} + {1}}\right)} & = {\left[\begin{array}{l}{\left\Vert{\left[\begin{array}{c}{x}_{i}{\left({L}_{i}\right)} \\ {y}_{i}{\left({L}_{i}\right)}\end{array}\right]}{-}{\left[\begin{array}{c}{x}_{{i} + {1}}{\left({0}\right)} \\ {y}_{{i} + {1}}{\left({0}\right)}\end{array}\right]}\right\Vert}_{2} \\ \quad {\angle}{\left({\theta}_{i}{\left({L}_{i}\right)}{-}{\theta}_{{i} + {1}}{\left({0}\right)}\right)}\end{array}\right]} \tag{2} \end{align*} where the information matrices ${\Omega}_{\epsilon}$ are tuned to enforce a balance between fitting and smoothness. The result of this stage is an optimized G1 clothoid spline with a low fitting error.
At this stage, we can attempt to further prune the number of clothoids, aiming at a minimal representation. To this end, we adopt a greedy algorithm based on the graph-based optimization seen previously. In particular, we iteratively select a clothoid ${C}_{i}$ with probability inversely proportional to its length. ${C}_{{i}{-}{1}}, {C}_{{i} + {1}}$ are redefined to join at the midpoint of ${C}_{i}$, ${\overrightarrow{{k}_{I}^{\ast}}}$. We then construct a local optimization graph analogous to the one described previously but containing only clothoids ${C}_{{i}{-}{2}}, {C}_{{i}{-}{1}}, {C}_{{i} + {1}}$, and ${C}_{{i} + {2}}$, and we perform the optimization. We accept the pruning if the result does not worsen the original fitting while maintaining the G1 properties. The final output of this pipeline is a storage-efficient lane-level HD map based on clothoid splines, obtained using only monocular vision and the precise GPS locations of the vehicle.
It is often the case that the estimated position of the vehicle is affected by some level of uncertainty. The most common scenario sees the usage of a standard GPS, which is affected by higher noise levels (in the order of 1 m) and lower frequencies. Additionally, in urban scenarios, tunnels and urban canyons can often degrade or obscure even RTK GPS signals, affecting the measurements of the vehicle location.
In this section, we show that, even with noisy and low-frequency GPS measurements, it is possible to obtain a lane-level HD map based on clothoid splines. To this end, we adopt a simultaneous localization and mapping (SLAM) approach, aiming to retrieve an analytical description of each line marking for our HD map and simultaneously localize the vehicle, enhancing the available GPS measurements. This method is designed to process the acquired data online and in real time during the vehicle motion. As depicted in Figure 3, our pipeline consists of mainly two stages: data preprocessing to obtain an initial clothoid spline and graph-based optimization to obtain the complete lane-level HD map. A third optional pruning stage is also presented.
Figure 3 The Clothoid-SLAM pipeline. The system acquires camera images, odometry, and GPS measurements from our experimental vehicle at different frequencies. A front-end continuously detects the road lines in each camera frame and accumulates the raw points using the vehicle odometry. Every time a GPS measurement is acquired, the cumulated points are transferred to a back-end, which generates an initial clothoid for each line marking and performs the graph optimization, adding the necessary nodes and edges to the graph. The back-end continuously produces an estimate of the lane-level HD map and vehicle trajectory.
For each acquired image, we detect the line markings and extract the raw data points in the vehicle reference frame using the same pipeline presented in the previous section. However, working with a low-frequency GPS (e.g., 1 Hz), we can directly represent those points in a global ENU reference frame only when a GPS measurement becomes available. For the remaining time, we first estimate the vehicle position from its odometry, computed using wheels and steering encoders. Then, we use this estimate to project the points in the ENU global reference frame, associating them with the last available GPS measurement. Finally, each cumulated block is fit using a cubic smoothing spline, whose extremes are used to generate an initial clothoid through G1 clothoid interpolation.
The initial clothoids obtained previously are significantly affected by the level of uncertainty associated with the vehicle pose. In fact, our estimate suffers from the noise in the GPS measurements and from the errors introduced by the odometry. To obtain an optimal clothoid spline for each line marking, together with a better estimate of the vehicle trajectory, we perform a graph-based optimization, expanding on the optimization graph presented in the previous section.
As before, we represent each clothoid with a clothoid node $i$, with a parameter vector ${p}_{i} = {\left[{{\kappa'}_{i}{\kappa}_{i}{\theta}_{i}\left({0}\right){x}_{i}\left({0}\right){y}_{i}\left({0}\right)}\right]'}$. In addition, we represent the pose of the vehicle at each GPS instant with a vehicle node ${j}$ with a parameter vector ${q}_{j} = {\left[{{x}_{j}{y}_{j}{\theta}_{j}}\right]'}$. We then enforce a small fitting error and smoothness on each clothoid with the same edges presented previously, namely fitting edges ${\epsilon}^{f}$ and smoothness edge ${\epsilon}^{s}$. Moreover, we design three additional edges to optimize the vehicle trajectory and one further edge to impose constraints between poses and line markings. In particular, we add
The optimization is performed online, with a front-end process parsing the sensory information from the cameras and GPS, while a back-end process performs the SLAM optimization. This is achieved through a window-based dynamical node selection, exploiting a multilevel graph configuration. In detail, when at time ${t}_{k}$ a GPS measurement is acquired, the corresponding vehicle node is added to the graph, together with a clothoid node for each line detected since ${t}_{{k}{-}{1}}$. The initial parameter vectors for these new nodes are respectively the GPS measurement and the initial clothoid described previously.
The optimization is organized in three phases. First, we optimize the new clothoid nodes, activating only their respective fitting edges ${\epsilon}^{f}$, to refine their initial parameters. Second, we perform the actual SLAM on the new clothoid and vehicle nodes, together with the last ${n}$ vehicle nodes, to allow for adjustments in the last part of the trajectory. In this phase, we activate all vehicle-related edges, i.e., ${\epsilon}^{g}{, }{\epsilon}^{op},{\epsilon}^{od}$, and ${\epsilon}^{v}$. In particular, the virtual measurement edges ${\epsilon}^{v}$ enforce continuity and derivability at the clothoid knots, realigning the new vehicle poses accordingly. Finally, we optimize the last $m$ clothoid nodes, activating all the related fitting and smoothness edges. In this manner, we allow the optimized tract of the generated map to adjust smoothly to the newly measured section. As expected, the final output of this optimization is an enhanced vehicle localization together with an analytical description of the line markings over the trajectory covered by the vehicle.
As the lines are processed online, obtaining data at a fixed frequency, a road tract may be represented using more clothoid segments than needed. As in the case of mapping with known poses, we can postprocess the current lane-level HD map to obtain a more compact representation. To avoid incorrect merges, however, we perform this operation offline, after the entire map has been generated. For this reason, we can use the same pruning procedure previously described for mapping with known poses.
We validated our two pipelines on real-world data collected on two racetracks in Italy, the Monza Eni Circuit and the Aci-Sara Lainate circuit. As seen in Figure 4, these circuits present a desirable structure for validation as the diversity of their tracts simulates several urban scenarios, such as straight roads and various types of bends and roundabouts.
Figure 4 The two race circuits where our dataset has been collected. The variety of turn shapes serves as a benchmark for several urban scenarios. (a) The Monza Eni Circuit. (b) The Aci-Sara Lainate circuit.
Our experimental vehicle is equipped with a forward-facing camera acquiring monocular images at 33 frames per second. A Swiftnav Piksi RTK GPS retrieves the vehicle position at 10 Hz. The vehicle is also equipped with wheels and steering encoders, allowing us to compute odometry. To perform mapping under uncertainty, we simulated a standard GPS by downsampling the RTK GPS at 1 Hz and introducing Gaussian noise with a circular error probable of 2 m.
To verify the accuracy of the produced lane-level HD maps, our dataset includes the precise ground truth position of each line marking, manually acquired using an RTK GPS. We make the described dataset publicly available at https://airlab.deib.polimi.it/datasets-and-tools/ for further research.
To also validate our pipeline outside of racetrack scenarios, we acquired an additional dataset in extra-urban settings around Milan, Italy. For these data, however, we could not acquire the ground truth of the line positions; thus, we can adopt it only for qualitative analysis.
After applying our pipelines to the datasets described previously, we assess their performances through different metrics. We measure the accuracy of the road lines by means of the mean absolute error (MAE) with respect to the ground truth lines. However, our algorithms could, in principle, produce discontinuous clothoid splines, which would be worthless as maps. Therefore, it is also crucial to assess whether the G1 constraints are satisfied. For this reason, at each spline knot ${\overrightarrow{{k}_{I}}}$, we compute the distance between the two joining clothoids (continuity metric) and the angle difference between their tangent angles (derivability metric), analyzing their mean and maximum values. We deem them acceptable for continuities of 1 cm and derivabilities below 1°. Notice, finally, that, given a track, each metric can be computed on the overall track or on specific sections (e.g., straight tracts and bends).
To assess the improvements obtained through our mapping method, we compare our results against two baselines based on line models commonly found in the literature and obtained as follows. We cumulate the raw data points as in our mapping pipeline, splitting points ${P}_{i}$ in blocks using DBSCAN and ordering them through SVD. At this stage, conversely, we fit all points ${P}_{i}$ together, with a parametric 2D smoothing polyline. This model represents our first baseline and is intended to compare our clothoid line model to a model well known in the literature. This polyline model, however, is not an intrinsic curve representation, nor it is G1.
Thus, we define an additional baseline, returning a parametric and intrinsic G1 curve, to compare the performance of our fitting procedure against a model with compatible properties. To this end, we fit the points ${P}_{i}$ with a 2D cubic smoothing spline. We then sample this model with a fine-grained step and, exploiting its derivatives, we compute the tangential angle ${\theta}$ at each sampled point. At the same time, we compute the cumulative arc-length ${s}$ of the curve, summing the distance between successive samples. Finally, we use these values to fit a 1D quadratic smoothing spline ${\theta} = {\theta}{\left({s}\right)}$, which is equivalent to a clothoid spline and represents our second baseline. Of course, this technique is prone to outliers in the raw points, and this is indeed one of the motivations for adopting our approach instead. Notice that these baselines can be computed only in the case of perfect localization, while this approach cannot be performed with uncertain vehicle poses.
In Table 1, we report the results of the lane-level HD map generation for the Monza Eni Circuit and Aci-Sara Lainate circuit, respectively. Figure 5(a) provides a qualitative demonstration of the mapping results with known poses on the Monza Eni Circuit, while Figure 5(b) displays the lane-level HD map and trajectory obtained through Clothoid-SLAM on the Aci-Sara Lainate circuit.
Figure 5 An example of the results obtained overlayed to aerial images. (a) Lane-level HD maps obtained with known poses on the Monza Eni Circuit, including a zoom-in over the characteristic bends of the Variante Ascari (lane-level HD map for complete track and detail). (b) Lane-level HD map (lines) and estimated vehicle trajectory (dots) obtained through Clothoid-SLAM on the Aci-Sara Lainate circuit (lane-level HD map and estimated vehicle poses).
Table 1 The performance of the two pipelines on the generation of a clothoid-based lane-level HD map.
Table 1 shows that our pipeline for mapping with known poses achieves competitive results. On the Monza Eni Circuit, we obtain a full MAE of ${0.33}{\text{ m}}$, with peak values of $0.46{\text{ m}}$ on the sharpest chicane. The obtained map contains a total of 283 clothoids (for both road lines), reduced from 917 through pruning. At the same time, the G1 requirements are met, with maximum continuity and derivability metrics in the order, respectively, of ${10}^{{-}{2}}{\text{ m}}$ and ${1}^{\circ}$. Analogously, the Aci-Sara Lainate circuit presents a lower MAE of approximately $0.36{\text{ m}}$, with the resulting map captured using a total of 145 clothoids, reduced from 283. The continuity and derivability metrics are, respectively, in the order of ${10}^{{-}{2}}{\text{ m}}$ and ${10}^{{{-}{1}}^{\circ}}$.
Notice that these results are also comparable with state-of-the-art approaches [15], which, however, use different line representations and do not guarantee G1 smoothness. Moreover, our pipeline outperforms both baselines on all tested road sections except for the Variante Ascari, where the recorded MAE is still acceptable. This suboptimal behavior is due to misdetections in the preexisting line detection pipeline we adopt. Indeed, although our method is robust to noise thanks to our optimization stages, significant artifacts in the line markings can slightly impact on the accuracy of our final map.
From Table 1, we also notice that, even when coping with uncertain vehicle poses, we can still obtain results acceptable for autonomous driving using our SLAM pipeline. Moreover, the G1 requirements are always well met, with continuity and derivability metrics in the order of ${10}^{{-}{5}}{\text{ m}}$ and ${10}^{{{-}{3}}^{\circ}}$ for both circuits. Also in this case, the pruning algorithm largely compresses the representation, from 660 total clothoids to 274 for the Monza Eni Circuit and from 310 total to 116 for the Aci-Sara Lainate circuit. Notice that, in this scenario, it would not have been possible to use our pipeline for mapping with known poses as the average localization error with our standard GPS amounts to 1.7714 m for the Monza Eni Circuit and 1.5281 m for the Aci-Sara Lainate circuit. Our SLAM pipeline, instead, can correct the localization of the vehicle on the two tracks to average errors of 0.7471 and 0.7131 m, respectively, while constructing an accurate map.
We performed the evaluation exposed so far entirely on racetracks as, in such a scenario, we could manually obtain an accurate RTK GPS ground truth of the road lines. To complement our evaluation, we run our mapping pipeline on the additional extra-urban dataset we acquired. Figure 6 shows the qualitative results obtained in diverse road situations.
Figure 6 An example of the qualitative results obtained with our mapping pipeline on an extra-urban scenario. Our map (light blue) is overlayed on the corresponding satellite images to allow for a visual comparison against the actual road markings.
In this work, we discussed the generation of lane-level HD maps for autonomous driving and highlighted the importance of clothoid line models. These models are known to guarantee passenger comfort in road design and are adopted for trajectory planning and vehicle control.
While state-of-the-art methods prefer simpler line models, we have shown that it is possible to construct lane-level HD maps using clothoids, achieving satisfactory performances. Moreover, we have shown how it is possible to generate lane-level HD maps even when no precise localization information is available, opening the possibility of generating them with any vehicle equipped with a monocular camera and odometric sensors.
Given the lack of benchmarks on lane-level HD maps, we successfully validated our pipelines on a real-world dataset, purposefully acquired with our instrumented vehicle on two known circuits, whose shapes are optimal to simulate urban scenarios, and an extra-urban scenario. The experimental validation has shown that clothoid-based line representations obtained with our pipeline achieve state-of-the-art mapping results while allowing for easy integration with standardized HD map frameworks.
Paolo Cudrano and Barbara Gallazzi contributed equally to this article.
Paolo Cudrano (paolo.cudrano@polimi.it) is a Ph.D. student in computer science and engineering at the Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Milano, Italy. He received his M.Sc. degree in computer science and engineering from the Politecnico di Milano and his M.Sc. degree in computer science from McMaster University in 2019. His research focuses on perception systems for robotics and autonomous vehicles.
Barbara Gallazzi (barbara.gallazzi@mail.polimi.it) is part of the ADAS/AD Department, Cariad SE, Ingolstadt, Germany. She received her M.Sc. degree in automation and control engineering from the Politecnico di Milano in 2021 and her B.Sc. degree in automation engineering from the Politecnico di Milano in 2019.
Matteo Frosi (matteo.frosi@polimi.it) is a Ph.D. student at the Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Milano, Italy. He received his M.S. degree in computer science and engineering from the Politecnico di Milano in 2019. His main research focuses on simultaneous localization and mapping, while his interests also include 3D reconstruction and robotics in general.
Simone Mentasti (simone.mentasti@polimi.it) is a researcher at the Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Milano, Italy. He received his M.S. degree in computer science from the Università Statale di Milano, Italy, in 2017 and his Ph.D. degree in computer engineering from the Politecnico di Milano in 2022. His research interests focus on robotics, perception, sensor fusion, sensor calibration, and deep learning for autonomous driving cars.
Matteo Matteucci (matteo.matteucci@polimi.it) is a full professor at the Dipartimento di Elettronica Informazione e Bioingegneria, Politecnico di Milano, Milano, Italy. He received his Ph.D. degree in computer engineering and automation from the Politecnico di Milano. His main research topics are pattern recognition, machine learning, machine perception, robotics, computer vision and signal processing.
[1] R. Liu, J. Wang, and B. Zhang, “High definition map for automated driving: Overview and analysis,” J. Navig., vol. 73, no. 2, pp. 324–341, Mar. 2020, doi: 10.1017/S0373463319000638.
[2] “At CES 2022 HERE demonstrates how location intelligence powers the future of driving.” HERE Technologies. Accessed: Feb. 11, 2022. [Online] . Available: https://www.here.com/company/press-releases/en/ces-2022-here-demonstrates-how-location-intelligence-powers-future
[3] Y. Zhou, Y. Takeda, M. Tomizuka, and W. Zhan, “Automatic construction of lane-level HD maps for urban scenes,” 2021. [Online] . Available: http://arXiv.org/abs/2107.10972
[4] I. Hamieh, R. Myers, and T. Rahman, “Construction of autonomous driving maps employing LiDAR odometry,” in Proc. IEEE Can. Conf. Electr. Comput. Eng. (CCECE), May 2019, pp. 1–4, doi: 10.1109/CCECE.2019.8861901.
[5] W. Jang, J. An, S. Lee, M. Cho, M. Sun, and E. Kim, “Road lane semantic segmentation for high definition map,” in Proc. IEEE Intell. Veh. Symp. (IV), Oct. 2018, pp. 1001–1006, doi: 10.1109/IVS.2018.8500661.
[6] N. Homayounfar, W.-C. Ma, J. Liang, X. Wu, J. Fan, and R. Urtasun, “DAGMapper: Learning to map by discovering lane topology,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2019, pp. 2911–2920, doi: 10.1109/ICCV.2019.00300.
[7] L. Ma, Y. Li, J. Li, Z. Zhong, and M. A. Chapman, “Generation of horizontally curved driving lines in HD maps using mobile laser scanning point clouds,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 5, pp. 1572–1586, May 2019, doi: 10.1109/JSTARS.2019.2904514.
[8] E. D. Lambert, R. Romano, and D. Watling, “Optimal path planning with clothoid curves for passenger comfort,” in Proc. 2019 5th Int. Conf. Veh. Technol. Intell. Transp. Syst. (VEHITS), pp. 609–615, doi: 10.5220/0007801806090615.
[9] M. Bersani et al., “An integrated algorithm for ego-vehicle and obstacles state estimation for autonomous driving,” Robot. Auton. Syst., vol. 139, p. 103,662, May 2021, doi: 10.1016/J.ROBOT.2020.103662.
[10] J. A. R. Silva and V. Grassi, “Clothoid-based global path planning for autonomous vehicles in urban scenarios,” in Proc. IEEE Int. Conf. Robot. Automat., Sep. 2018, pp. 4312–4318, doi: 10.1109/ICRA.2018.8461201.
[11] P. Cudrano, S. Mentasti, M. Matteucci, M. Bersani, S. Arrigoni, and F. Cheli, “Advances in centerline estimation for autonomous lateral control,” in Proc. IEEE Intell. Veh. Symp. (IV), Oct. 2020, pp. 1415–1422, doi: 10.1109/IV47402.2020.9304729.
[12] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proc. 2nd Int. Conf. Knowl. Discovery Data Mining (KDD-96), Aug. 1996, pp. 226–231. [Online] . Available: www.aaai.org
[13] E. Bertolazzi and M. Frego, “G1 fitting with clothoids,” Math. Methods Appl. Sci., vol. 38, no. 5, pp. 881–897, Mar. 2015, doi: 10.1002/MMA.3114.
[14] R. Kümmerle, G. Grisetti, H. Strasdat, K. Konolige, and W. Burgard, “g2o: A general framework for graph optimization,” in Proc. IEEE Int. Conf. Robot. Automat., 2011, pp. 3607–3613, doi: 10.1109/ICRA.2011.5979949.
[15] Y. Zhou et al., “Automatic construction of lane-level HD maps for urban scenes,” in Proc. 2021 IEEE/RSJ Int. Conf. Intell. Robots Syst. (IROS), pp. 6649–6656, doi: 10.1109/IROS51168.2021.9636205.
Digital Object Identifier 10.1109/MVT.2022.3209503