Edge devices increasingly embed artificial neural networks. In this context, the trend is to decentralize training as much as possible while shrinking resource requirements for both inference and training, tasks that are typically intensive in data, memory, and computation. At the far edge, a specific challenge is the inclusion of the microcontroller-based devices typically deployed in the IoT, for which no general framework has been provided so far. Such devices not only face severe resource constraints (weak CPUs, slow network connections, memory budgets measured in kilobytes) but also exhibit high polymorphism, which leads to large variability in computational performance across devices. In this paper, we design and implement TDMiL, a versatile framework for distributed training and transfer learning. TDMiL interconnects and combines logical components, including CoAPerator (a central aggregator) and various tiny embedded software runtimes, that are specifically tailored for networks of heterogeneous, resource-constrained devices built on diverse types of microcontrollers. We report on experiments with TDMiL, which we use to compare several schemes devised to address computational variability among microcontroller-based devices taking part in distributed learning, i.e., the straggler problem. We also release our implementation of TDMiL as an open-source project, compatible with common commercial off-the-shelf IoT hardware and with a well-known open-access IoT testbed.