Punishment of non-cooperators has been observed to promote cooperation. Such punishment is an evolutionary puzzle because it is costly to the punisher while benefiting others, for example through increased social cohesion. Recent studies have concluded that punishing strategies usually pay less than some non-punishing strategies. These findings suggest that punishment could not have evolved directly to promote cooperation. However, although it is well established that reputation plays a key role in human cooperation, the simple threat posed by a reputation for being a punisher may not yet have been sufficiently explored as an explanation for the evolution of costly punishment. Here, we first show analytically that punishment can lead to long-term benefits if it influences one's reputation and thereby makes the punisher more likely to receive help in future interactions. Then, in computer simulations, we incorporate up to 40 more complex strategies, for example strategies that use different kinds of reputation (e.g. from generous actions) or that direct punitive behaviour not only towards defectors but also towards cooperators. Our findings demonstrate that punishment can evolve directly through a simple reputation system. We conclude that reputation is crucial for the evolution of punishment because it makes a punisher more likely to receive help in future interactions, and that experiments investigating the beneficial effects of punishment in humans should include reputation as an explicit feature.