The evaluation of newly developed diagnostic tests commonly involves comparing the test outcomes (positive/negative) of a sample of animals with those of a reference test (gold standard) in order to derive sensitivity and specificity estimates. Often, however, new tests have to be evaluated against an imperfect reference test, because a true gold standard is either unavailable or too costly to apply. This introduces bias into the estimates of the test characteristics. To address this problem, latent class and Bayesian models can be used to estimate sensitivity and specificity when evaluating a diagnostic test in the absence of a gold standard. These models require at least two imperfect tests applied to all individuals in the study; in our approach we used a two-test two-population scenario. Both the gold standard comparison and these modelling approaches rely on various assumptions, and when these are violated, biased results are obtained. The analysis of field data from an Anaplasma marginale outbreak in cattle in Switzerland, with four diagnostic procedures (detection of the agent, serology, PCR and hematocrit measurements), serves as a practical example to demonstrate and critically discuss the approaches taken. In this relatively small data set (n = 275), the estimates of the test characteristics obtained by the different methods were quite similar. Overall, the bias in the point estimates depended mainly on the chosen estimation approach. All tests showed a non-negligible correlation, mainly in the test sensitivities. This emphasizes the importance of accounting for test dependence even when it does not seem biologically plausible at first sight.
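The two-test two-population scenario mentioned above corresponds to the classical Hui-Walter setup: two tests with shared sensitivities and specificities are applied in two populations with different prevalences, giving six degrees of freedom to estimate six parameters. The following is a minimal sketch of maximum-likelihood estimation under this model, assuming conditional independence of the tests given true disease status (an assumption the abstract itself cautions about); the simulated data, sample sizes and parameter values are illustrative, not those of the Anaplasma marginale study.

```python
import numpy as np
from scipy.optimize import minimize

def cell_probs(se1, sp1, se2, sp2, prev):
    """Probabilities of the cross-classified outcomes
    (T1+,T2+), (T1+,T2-), (T1-,T2+), (T1-,T2-),
    assuming conditional independence given true status."""
    pos = prev * np.array([se1 * se2, se1 * (1 - se2),
                           (1 - se1) * se2, (1 - se1) * (1 - se2)])
    neg = (1 - prev) * np.array([(1 - sp1) * (1 - sp2), (1 - sp1) * sp2,
                                 sp1 * (1 - sp2), sp1 * sp2])
    return pos + neg

def neg_log_lik(theta, counts1, counts2):
    """Negative multinomial log-likelihood over both populations."""
    se1, sp1, se2, sp2, p1, p2 = theta
    ll = counts1 @ np.log(cell_probs(se1, sp1, se2, sp2, p1))
    ll += counts2 @ np.log(cell_probs(se1, sp1, se2, sp2, p2))
    return -ll

# Simulate two populations under a known truth (illustrative values).
rng = np.random.default_rng(0)
truth = dict(se1=0.90, sp1=0.95, se2=0.80, sp2=0.98)
counts1 = rng.multinomial(5000, cell_probs(**truth, prev=0.60))
counts2 = rng.multinomial(5000, cell_probs(**truth, prev=0.10))

# Bounding Se/Sp above 0.5 avoids the label-switching solution.
res = minimize(neg_log_lik, x0=[0.8, 0.8, 0.8, 0.8, 0.5, 0.2],
               args=(counts1, counts2), method="L-BFGS-B",
               bounds=[(0.51, 0.999)] * 4 + [(0.001, 0.999)] * 2)
se1_hat, sp1_hat, se2_hat, sp2_hat, p1_hat, p2_hat = res.x
```

With the test dependence reported in the study, the conditional-independence terms in `cell_probs` would need covariance adjustments, which is exactly why the abstract stresses modelling that dependence.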