Abstract
Understanding natural language is an inherently complex task for computer algorithms. Crowdsourcing natural language tasks such as semantic similarity assessment is therefore a promising approach. In this paper, we investigate the performance of crowdworkers and compare them to offline contributors as well as to state-of-the-art algorithms. We show that algorithms outperform individual human contributors but still cannot compete with results aggregated from groups of contributors. Furthermore, we demonstrate that this effect persists across different contributor populations. Finally, we give guidelines for easing the challenge of collecting word-based semantic similarity data from human contributors.