The objective of this study was to conduct a systematic and critical appraisal of the quality of previous publications and describe diagnostic methods, diagnostic criteria and definitions, repeatability, and agreement among methods for diagnosis of vaginitis, cervicitis, endometritis, salpingitis, and oophoritis in dairy cows. Publications (n = 1,600) that included the words "dairy," "cows," and at least one disease of interest were located with online search engines. In total, 51 papers were selected for comprehensive review by pairs of the authors. Only 61% (n = 31) of the 51 reviewed papers provided a definition or citation for the disease or diagnostic methods studied, and only 49% (n = 25) of the papers provided the data or a citation to support the test cut point used for diagnosing disease. Furthermore, a large proportion of the papers did not provide sufficient detail to allow critical assessment of the quality of design or reporting. Of 11 described diagnostic methods, only one complete methodology, i.e., vaginoscopy, was assessed for both within- and between-operator repeatability (κ = 0.55-0.60 and 0.44, respectively). In the absence of a gold standard, comparisons between different tests have been undertaken. Agreement among the various diagnostic methods is low. These discrepancies may indicate that these diagnostic methods assess different aspects of reproductive health and underline the importance of tying diagnostic criteria to objective measures of reproductive performance. Studies that used a reproductive outcome to select cut points and tests have the greatest clinical utility. This approach has demonstrated, for example, that the presence of (muco)purulent discharge in the vagina and an increased proportion of leukocytes in cytological preparations following uterine lavage or cytobrush sampling are associated with poorer reproductive outcomes.
The lack of validated, consistent definitions and outcome variables makes comparisons of the different tests difficult. The quality of design and reporting in future publications could be improved by using checklists as a guideline. Further high-quality research that follows published standards for study design and reporting should improve cow-side diagnostic tests. Specifically, more data on intra- and interobserver agreement are needed to evaluate test variability. Also, more studies are necessary to determine optimal cut points and the optimal time postpartum for examination.
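
As an illustration of the kind of interobserver agreement statistic referred to above, the following minimal sketch computes an unweighted Cohen's κ for two operators scoring the same cows. It is an assumption that the unweighted two-rater form is the relevant one; the reviewed studies may have used weighted or multi-rater variants. The function name `cohen_kappa` and the discharge scores are hypothetical and are not data from any reviewed paper.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same animals."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("Both raters must score the same non-empty set of animals")

    n = len(ratings_a)

    # Observed agreement: proportion of animals given the same score by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: sum over categories of the product of marginal proportions.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined.
        raise ValueError("Kappa is undefined when chance agreement equals 1")

    return (observed - expected) / (1.0 - expected)


# Hypothetical vaginal discharge scores (0 = clear, 1 = mucopurulent, 2 = purulent)
# assigned to the same 10 cows by two operators.
operator_1 = [0, 0, 1, 1, 2, 2, 0, 1, 2, 0]
operator_2 = [0, 0, 1, 2, 2, 2, 0, 1, 1, 0]

print(f"kappa = {cohen_kappa(operator_1, operator_2):.2f}")  # kappa = 0.70
```

In this made-up example the operators agree on 8 of 10 cows (observed agreement 0.80) against a chance agreement of 0.34, giving κ ≈ 0.70. The same function applied to two scoring sessions by a single operator would give a within-operator estimate.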