Model-free optimization of power/efficiency tradeoffs in quantum thermal machines using reinforcement learning