dc.contributor.author
Kashongwe, Olivier
dc.contributor.author
Kabelitz, Tina
dc.contributor.author
Ammon, Christian
dc.contributor.author
Minogue, Lukas
dc.contributor.author
Doherr, Markus
dc.contributor.author
Silva Boloña, Pablo
dc.contributor.author
Amon, Thomas
dc.contributor.author
Amon, Barbara
dc.date.accessioned
2024-10-17T14:41:06Z
dc.date.available
2024-10-17T14:41:06Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/45313
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-45025
dc.description.abstract
Missing data and class imbalance hinder the accurate prediction of rare events such as dairy mastitis. Resampling and imputation are employed to handle these problems. These methods are often chosen arbitrarily, despite their profound impact on prediction through the changes they make to the data structure. We hypothesize that their use affects the performance of machine learning (ML) models fitted to automated milking system (AMS) data for mastitis prediction. We compare three imputation methods, namely simple imputer (SI), multiple imputer (MICE) and linear interpolation (LI), and three resampling techniques: Synthetic Minority Oversampling Technique (SMOTE), Support Vector Machine SMOTE (SVMSMOTE) and SMOTE with Edited Nearest Neighbors (SMOTEEN). The classifiers were logistic regression (LR), multilayer perceptron (MLP), decision tree (DT) and random forest (RF). We evaluated them with various metrics and compared models with the kappa score. A complete case analysis fitted the RF (0.78) better than other models, for which SI performed best. The DT, RF, and MLP performed better with SVMSMOTE. The RF, DT and MLP had the overall best performance, aided by imputation or resampling (SMOTE and SVMSMOTE). We recommend carefully selecting resampling and imputation techniques and comparing them with complete cases before deciding on the preprocessing approach used to test AMS data with ML models.
en
dc.format.extent
16 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
oversampling
en
dc.subject
undersampling
en
dc.subject
missing-value imputation
en
dc.subject
performance metrics
en
dc.subject.ddc
600 Technology, medicine, applied sciences::630 Agriculture::637 Dairy processing and related products
dc.title
Influence of Preprocessing Methods of Automated Milking Systems Data on Prediction of Mastitis with Machine Learning Models
dc.type
Scientific article
dcterms.bibliographicCitation.doi
10.3390/agriengineering6030195
dcterms.bibliographicCitation.journaltitle
AgriEngineering
dcterms.bibliographicCitation.number
3
dcterms.bibliographicCitation.originalpublishername
MDPI
dcterms.bibliographicCitation.pagestart
3427
dcterms.bibliographicCitation.pageend
3442
dcterms.bibliographicCitation.volume
6
dcterms.bibliographicCitation.url
https://doi.org/10.3390/agriengineering6030195
refubium.affiliation
Veterinärmedizin
refubium.affiliation.other
Institut für Veterinär-Epidemiologie und Biometrie
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
2624-7402