Quantitative electroencephalogram analysis (e.g. spectral analysis) has become an important tool in sleep research and sleep medicine. However, reliable results are only obtained if artefacts are removed or excluded. Artefact detection is often performed manually during sleep stage scoring, which is time-consuming and prevents application to large datasets. We aimed to test the performance of mostly simple artefact detection algorithms in polysomnographic recordings, derive optimal parameters and test their generalization capacity. We implemented 14 different artefact detection methods, optimized their parameters for derivation C3A2 using receiver operating characteristic curves of 32 recordings, and validated them on 21 recordings of healthy participants and 10 recordings of patients (different laboratory); on this basis, we considered the methods generalizable. We also compared average power density spectra obtained with artefacts excluded by the algorithms and by expert scoring. Analyses were performed retrospectively. We could reliably identify artefact-contaminated epochs in sleep electroencephalogram recordings from two laboratories (healthy participants and patients), reaching good sensitivity (specificity 0.9) with most algorithms. The best performance was obtained using fixed thresholds on the electroencephalogram slope, on high-frequency power (25-90 Hz or 45-90 Hz) and on the residuals of adaptive autoregressive models. Artefacts in electroencephalogram data can thus be reliably excluded by simple algorithms with good performance. Average electroencephalogram power density spectra with artefact exclusion based on algorithms and on manual scoring are very similar in the frequency range relevant for most applications in sleep research and sleep medicine. This allows application to large datasets, as needed to address questions related to genetics, epidemiology or precision medicine.
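To illustrate the fixed-threshold idea described above, the following is a minimal sketch of slope-based epoch flagging together with the sensitivity and specificity computation against expert scoring. It is not the implementation used in the study: the function names, the use of the maximum absolute first difference as the "slope", the epoch length, the sampling rate and the threshold value are all assumptions chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def max_abs_slope(epoch: Sequence[float], fs: float) -> float:
    """Largest absolute sample-to-sample difference, scaled to units per second.

    Assumption: 'slope' is approximated by the first difference times the
    sampling rate fs (Hz). The study may define it differently.
    """
    return max(abs(b - a) for a, b in zip(epoch, epoch[1:])) * fs


def flag_epochs(signal: Sequence[float], fs: float,
                epoch_s: float, threshold: float) -> List[bool]:
    """Mark each complete, non-overlapping epoch whose slope exceeds a fixed threshold."""
    n = int(round(fs * epoch_s))  # samples per epoch
    flags = []
    for start in range(0, len(signal) - n + 1, n):
        flags.append(max_abs_slope(signal[start:start + n], fs) > threshold)
    return flags


def sensitivity_specificity(predicted: Sequence[bool],
                            expert: Sequence[bool]) -> Tuple[float, float]:
    """Compare algorithmic flags with expert artefact scoring, epoch by epoch."""
    pairs = list(zip(predicted, expert))
    tp = sum(1 for p, e in pairs if p and e)
    tn = sum(1 for p, e in pairs if not p and not e)
    fp = sum(1 for p, e in pairs if p and not e)
    fn = sum(1 for p, e in pairs if not p and e)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


if __name__ == "__main__":
    # Toy data (hypothetical): three 1-s epochs at 10 Hz, the second with a sharp spike.
    fs = 10.0
    clean = [0.0, 1.0] * 5
    spiky = [0.0, 1.0, 0.0, 1.0, 100.0, 1.0, 0.0, 1.0, 0.0, 1.0]
    signal = clean + spiky + clean

    flags = flag_epochs(signal, fs, epoch_s=1.0, threshold=500.0)
    expert = [False, True, False]  # hypothetical expert scoring
    print(flags, sensitivity_specificity(flags, expert))
```

In such a scheme, sweeping the threshold over a range of values and recording the resulting sensitivity and specificity pairs would trace out a receiver operating characteristic curve, from which an operating point (for example at a chosen specificity) could be selected.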