Flexible characterization techniques that provide a detailed picture of experimental imperfections under realistic assumptions are crucial for obtaining actionable guidance in the development of quantum computers. Gate set tomography self-consistently extracts a complete tomographic description of the implementation of an entire set of quantum gates, as well as of the initial state and the measurement, from experimental data. It has become a standard tool for this task. However, it places high demands on the number of gate sequences and on their design, which makes it experimentally challenging even for two qubits. In this work, we show that low-rank approximations of gate sets can be obtained from significantly fewer gate sequences and that it suffices to draw these sequences at random. Although such a low-rank characterization mainly captures coherent noise, it still contains the information that is crucial for improving the implementation. To obtain these estimates, we formulate the data-processing problem of gate set tomography as a rank-constrained tensor completion problem. We provide an algorithm that solves this problem while respecting the positivity and normalization constraints of quantum mechanics. For this purpose, we combine methods from Riemannian optimization and machine learning and develop a saddle-free second-order geometric optimization method on the complex Stiefel manifold. Beyond the reduction in the number of sequences, we numerically demonstrate that the algorithm performs gate set tomography robustly without relying on structured gate sets or an elaborate circuit design, which makes it more flexible than traditional approaches. We also demonstrate how coherent errors in shadow estimation protocols can be mitigated using gate set tomography estimates.
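The link between the Stiefel manifold and the quantum constraints can be illustrated with a minimal sketch. It assumes a Kraus-operator parameterization in which the r Kraus operators of a rank-r channel on a d-dimensional system are stacked into an (r·d)×d matrix V. The condition V†V = I is then exactly trace preservation, and complete positivity holds by construction. The helper names (`qf`, `project_tangent`, `retract`, `kraus_from_stiefel`, `apply_channel`) are hypothetical. The update is a plain first-order step with a QR retraction along a random stand-in direction, not the saddle-free second-order method of this work.

```python
import numpy as np


def qf(a):
    """Q factor of a thin QR decomposition, with R's diagonal made real and positive."""
    q, r = np.linalg.qr(a)
    d = np.diag(r)
    phases = d / np.abs(d)
    return q * phases  # rescales column j of q by phases[j]


def project_tangent(v, g):
    """Project an ambient matrix g onto the tangent space of the complex Stiefel manifold at v."""
    vhg = v.conj().T @ g
    return g - v @ (vhg + vhg.conj().T) / 2


def retract(v, xi):
    """QR-based retraction: map a tangent vector xi at v back onto the manifold."""
    return qf(v + xi)


def kraus_from_stiefel(v, d):
    """Split a stacked (r*d, d) isometry into r Kraus operators of size d x d."""
    return v.reshape(-1, d, d)


def apply_channel(kraus, rho):
    """Apply the channel rho -> sum_i K_i rho K_i^dagger."""
    return sum(k @ rho @ k.conj().T for k in kraus)


rng = np.random.default_rng(0)
d, r = 4, 2  # two qubits, Kraus rank 2 (illustrative choices)
z = rng.standard_normal((r * d, d)) + 1j * rng.standard_normal((r * d, d))
v = qf(z)  # random point on the complex Stiefel manifold

# One step along a random ambient direction (a stand-in for a loss gradient).
g = rng.standard_normal(v.shape) + 1j * rng.standard_normal(v.shape)
xi = -0.1 * project_tangent(v, g)
v_new = retract(v, xi)

kraus = kraus_from_stiefel(v_new, d)
# Trace preservation: sum_i K_i^dagger K_i = V^dagger V = I.
tp_error = np.linalg.norm(sum(k.conj().T @ k for k in kraus) - np.eye(d))
rho = np.diag([1.0, 0.0, 0.0, 0.0]).astype(complex)
out = apply_channel(kraus, rho)
print(tp_error < 1e-10, np.isclose(np.trace(out).real, 1.0), np.linalg.eigvalsh(out).min() > -1e-12)
# prints: True True True
```

Because every iterate stays on the manifold, the positivity and normalization constraints never have to be enforced by penalty terms or by projecting onto the set of quantum channels after each step.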