Data is the fundamental resource for training deep learning models. The collection of massive datasets, combined with increasing computational power, has enabled the training of large models that learn complex tasks with human-level proficiency. Such data abundance, however, is not universal across domains. This thesis addresses data scarcity in deep learning, focusing on two domains: medicine and recommender systems. In medicine, data scarcity stems from the prohibitive cost of annotating medical data for model training and from stringent privacy regulations that limit data sharing. Recommender systems, in turn, face cold-start problems, which arise when a new user or item enters the system without sufficient interaction history for informed recommendations.

To reduce the dependency on costly electrocardiogram (ECG) annotations for training heart arrhythmia classifiers, we employed transfer learning to equip models with foundational knowledge of ECG signals. We developed novel pre-training tasks that acquire this knowledge from one of the largest public ECG databases. Fine-tuning the pre-trained models on small datasets improved performance on specialized ECG classification tasks by up to 6.57% over models trained from scratch.

To address privacy concerns around data sharing, we employed federated learning, which enables collaborative model training among medical institutions by exchanging locally computed model updates instead of raw training data. In a large study across a diverse collection of ECG databases, we demonstrated that ECG classifiers trained via federated learning outperform models trained in isolation on local data; on out-of-distribution data from foreign institutions, they nearly match the performance of models trained on centrally shared data.

To address item cold-start in sequential recommender systems, we developed a novel dynamic storage model that maintains a latent memory of user-item interactions without requiring gradient updates or side information, enabling efficient representation and timely recommendation of new items at inference time. The proposed model outperforms a comparable approach by more than 29% on recommendation tasks and adapts to entirely new datasets without any fine-tuning in zero-shot transfer scenarios.

Collectively, these contributions tackle diverse data scarcity issues in the selected domains, advancing deep learning methodology in data-limited scenarios.
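The pre-train-then-fine-tune recipe behind the first contribution follows a standard two-stage pattern. Below is a minimal PyTorch sketch, assuming a small 1-D CNN encoder and a hypothetical pretext task (detecting time-reversed segments); the abstract does not specify the thesis's actual pre-training tasks, encoder, or class count, so all of these are illustrative.

```python
import torch
import torch.nn as nn

class ECGEncoder(nn.Module):
    """Small 1-D CNN backbone producing a fixed-size embedding of an ECG segment."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(32, dim, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),  # -> (batch, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

encoder = ECGEncoder()

# Stage 1: pre-training on an unlabeled pretext task (illustrative pretext:
# classify whether a segment was reversed in time; no annotations needed).
pretext_head = nn.Linear(64, 2)
opt = torch.optim.Adam(list(encoder.parameters()) + list(pretext_head.parameters()), lr=1e-3)
signals = torch.randn(32, 1, 500)               # stand-in for raw ECG segments
is_flipped = torch.randint(0, 2, (32,))         # 1 -> segment gets time-reversed
inputs = torch.where(is_flipped[:, None, None].bool(), signals.flip(-1), signals)
loss = nn.functional.cross_entropy(pretext_head(encoder(inputs)), is_flipped)
opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: fine-tuning on a small labeled arrhythmia dataset (sizes assumed).
clf_head = nn.Linear(64, 5)                     # e.g. 5 arrhythmia classes
ft_opt = torch.optim.Adam(list(encoder.parameters()) + list(clf_head.parameters()), lr=1e-4)
x_small = torch.randn(16, 1, 500)               # stand-in for the small labeled set
y_small = torch.randint(0, 5, (16,))
ft_loss = nn.functional.cross_entropy(clf_head(encoder(x_small)), y_small)
ft_opt.zero_grad(); ft_loss.backward(); ft_opt.step()
```

The key design point is that the encoder's weights survive the switch from the pretext head to the arrhythmia head, so the small labeled set only has to refine, rather than build, the signal representation.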
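Federated training as described above can be sketched as FedAvg-style aggregation: each institution fine-tunes the current global model on its local data and returns only parameter tensors, which the server averages weighted by local dataset size. The linear model, client sizes, and single local step below are simplifying assumptions.

```python
import copy
from typing import Dict, List
import torch
import torch.nn as nn

def federated_average(client_states: List[Dict[str, torch.Tensor]],
                      num_examples: List[int]) -> Dict[str, torch.Tensor]:
    """FedAvg-style aggregation: average client parameters, weighted by
    how many local examples each client trained on."""
    total = float(sum(num_examples))
    return {
        name: sum(n * state[name] for state, n in zip(client_states, num_examples)) / total
        for name in client_states[0]
    }

global_model = nn.Linear(64, 5)   # stand-in for an ECG classifier

# One communication round over three simulated institutions.
client_states, client_sizes = [], []
for size in (120, 80, 200):                       # local dataset sizes (synthetic)
    local = copy.deepcopy(global_model)           # client starts from global weights
    opt = torch.optim.SGD(local.parameters(), lr=0.01)
    x, y = torch.randn(size, 64), torch.randint(0, 5, (size,))
    loss = nn.functional.cross_entropy(local(x), y)
    opt.zero_grad(); loss.backward(); opt.step()  # one local step for brevity
    client_states.append(local.state_dict())
    client_sizes.append(size)

global_model.load_state_dict(federated_average(client_states, client_sizes))
```

In each round only `state_dict` tensors cross institutional boundaries, which is what keeps the raw patient recordings local.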
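How a latent memory can absorb new items without gradient updates can be illustrated with a generic key-value memory, written and read by soft attention at inference time; the slot count, dimensionality, and dot-product scoring below are assumptions, not the thesis's actual architecture.

```python
import torch

class InteractionMemory:
    """Key-value latent memory updated by plain tensor writes at inference
    time; no backpropagation is involved, so a new item can be represented
    as soon as it appears. Generic sketch, not the thesis model."""
    def __init__(self, num_slots: int = 8, dim: int = 32):
        self.keys = torch.randn(num_slots, dim) / dim ** 0.5
        self.values = torch.zeros(num_slots, dim)

    @torch.no_grad()
    def write(self, item_emb: torch.Tensor) -> None:
        attn = torch.softmax(self.keys @ item_emb, dim=0)   # soft slot addressing
        self.values += attn[:, None] * item_emb             # distribute the write

    @torch.no_grad()
    def read(self, query: torch.Tensor) -> torch.Tensor:
        attn = torch.softmax(self.keys @ query, dim=0)
        return attn @ self.values                           # user-state summary

dim = 32
item_embs = torch.randn(100, dim)          # embeddings, including brand-new items
memory = InteractionMemory(dim=dim)
for item_id in (3, 41, 99):                # a user's interaction sequence
    memory.write(item_embs[item_id])

user_state = memory.read(item_embs[99])    # query with the most recent item
scores = item_embs @ user_state            # rank all items, new ones included
top5 = scores.topk(5).indices
```

Because writing is a plain tensor update under `torch.no_grad()`, an item seen for the first time influences the very next recommendation, which is the property that targets item cold-start.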