Identifying optimal cycles in quantum thermal machines with reinforcement-learning