dc.contributor.author
Hafermann, Lorena
dc.contributor.author
Becher, Heiko
dc.contributor.author
Herrmann, Carolin
dc.contributor.author
Klein, Nadja
dc.contributor.author
Heinze, Georg
dc.contributor.author
Rauch, Geraldine
dc.date.accessioned
2023-03-15T15:08:57Z
dc.date.available
2023-03-15T15:08:57Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/38406
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-38124
dc.description.abstract
Background: Statistical model building requires selecting the variables for a model according to the model's aim. For descriptive and explanatory models, a common recommendation in the literature is to include all variables that are assumed or known to be associated with the outcome, regardless of whether data-driven selection procedures identify them. An open question is how reliable this assumed "background knowledge" truly is. In fact, "known" predictors might be findings from preceding studies, which may themselves have employed inappropriate model-building strategies.
Methods: We conducted a simulation study assessing the influence of treating variables as "known predictors" in model building when the knowledge from preceding studies might in fact be insufficient. Model building with variable selection was conducted within randomly generated data sets representing preceding studies. A variable was subsequently considered a "known" predictor if a predefined number of preceding studies identified it as relevant.
Results: Even if several preceding studies identified a variable as a "true" predictor, this classification is often a false positive. Moreover, variables not identified might still be truly predictive. This holds especially if the preceding studies employed inappropriate selection methods such as univariable selection.
Conclusions: The source of "background knowledge" should be evaluated with care. Knowledge generated from preceding studies can cause model misspecification.
en
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
Background knowledge
en
dc.subject
Univariable selection
en
dc.subject
Backward elimination
en
dc.subject
Variable selection
en
dc.subject
Regression model
en
dc.subject
Simulation study
en
dc.subject
Need for more data sharing
en
dc.subject.ddc
600 Technology, medicine, applied sciences::610 Medicine and health::610 Medicine and health
dc.title
Statistical model building: Background “knowledge” based on inappropriate preselection causes misspecification
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
196
dcterms.bibliographicCitation.doi
10.1186/s12874-021-01373-z
dcterms.bibliographicCitation.journaltitle
BMC Medical Research Methodology
dcterms.bibliographicCitation.originalpublishername
Springer Nature
dcterms.bibliographicCitation.volume
21
refubium.affiliation
Charité - Universitätsmedizin Berlin
refubium.funding
Springer Nature DEAL
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.bibliographicCitation.pmid
34587892
dcterms.isPartOf.eissn
1471-2288