Autonomous driving has become a priority for research and development in the automotive industry. Given the technical and safety requirements set by automotive standardization organizations, localization plays a crucial role in reaching the highest levels of vehicle automation. Deep neural networks have become the preferred tool for building artificial intelligence modules in disciplines such as computer vision. They excel at learning complex representations through supervised or self-supervised learning, as well as through techniques such as deep reinforcement learning. In particular, deep learning methods that estimate complex quantities from images, such as depth or optical flow, outperform classical baselines under constrained settings. These models extract rich information that serves tasks such as semantic and instance segmentation, as well as the computation of associations between consecutive video frames or between stereo image pairs. In general, however, applying these end-to-end models and finding such associations remains challenging. This thesis explores the applicability of end-to-end deep learning architectures to vehicle localization, using as input either sensor measurements of the vehicle's dynamic parameters or camera images. Our key observation is that the network does not need to learn everything from scratch: it can exploit associations we already know from the physical world. We put this idea into practice by drawing on concepts from physics and geometry and by leveraging transfer learning from large-scale regression data built on temporal associations. We also show that autonomous model cars can be used for data collection and that the learned associations can be transferred to other vehicles to improve accuracy.
Finally, we show that the localization estimate generalizes to other scenes: given a sequence of temporal data, the network regresses the vehicle's displacement, and these displacements are composed into an estimate of the vehicle's global position.
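As a hedged illustration of the final composition step, the sketch below chains per-step displacements into a global trajectory by dead reckoning. It assumes planar motion and that each regressed displacement is a tuple (dx, dy, dtheta) expressed in the vehicle frame of the previous pose; this parameterization and the function name `compose_poses` are hypothetical and are not taken from the thesis.

```python
import math
from typing import Iterable, List, Tuple

# Hypothetical pose representation: (x, y, heading in radians) in the global frame.
Pose = Tuple[float, float, float]


def compose_poses(displacements: Iterable[Tuple[float, float, float]],
                  start: Pose = (0.0, 0.0, 0.0)) -> List[Pose]:
    """Chain per-step displacements (dx, dy, dtheta), each assumed to be
    expressed in the vehicle frame of the previous pose, into a list of
    global poses (planar dead reckoning)."""
    x, y, theta = start
    trajectory: List[Pose] = [start]
    for dx, dy, dtheta in displacements:
        # Rotate the local displacement into the global frame, then add it.
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x += cos_t * dx - sin_t * dy
        y += sin_t * dx + cos_t * dy
        # Accumulate the heading and wrap it to [-pi, pi] with atan2.
        theta = math.atan2(math.sin(theta + dtheta), math.cos(theta + dtheta))
        trajectory.append((x, y, theta))
    return trajectory


if __name__ == "__main__":
    # Hypothetical network outputs: drive 1 m forward while turning 90 degrees
    # left, then drive 1 m forward in the new heading.
    steps = [(1.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0)]
    for pose in compose_poses(steps):
        print(pose)
```

Because each global pose is built on the previous estimate, errors in the regressed displacements accumulate along the trajectory, which is the usual drift behavior of dead reckoning.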