Background: The objective of this study was to assess the performance of the first publicly available automated 3D segmentation model for spontaneous intracerebral hemorrhage (ICH), which is based on a 3D neural network, before and after retraining. Methods: We performed an independent validation of this model using a multicenter retrospective cohort. Performance was evaluated using the Dice similarity coefficient (DSC), sensitivity, and positive predictive value (PPV). We retrained the original model (OM) and assessed its performance using an external validation design. A multivariate linear regression model was used to identify independent variables associated with the model's performance. Agreement in volumetric measurements and in segmentations was evaluated using Pearson's correlation coefficient (r) and the intraclass correlation coefficient (ICC), respectively. Results: In 1040 patients, the OM achieved a median DSC, sensitivity, and PPV of 0.84, 0.79, and 0.93, respectively, compared with 0.83, 0.80, and 0.91 for the retrained model (RM). However, the median DSC for infratentorial ICH was relatively low and improved significantly after retraining (p < 0.001). ICH volume and location were significantly associated with the DSC (p < 0.05). Agreement was excellent both for volumetric measurements (r > 0.90, p > 0.05) and for segmentations (ICC ≥ 0.9, p < 0.001). Conclusion: The model demonstrated good generalization in an external validation cohort. Location-specific performance, notably for infratentorial ICH, improved significantly after retraining. External validation and retraining are important steps to consider before applying deep learning models in new clinical settings.
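The three overlap metrics named above have standard voxel-wise definitions in terms of true positives (TP), false positives (FP), and false negatives (FN): DSC = 2TP / (2TP + FP + FN), sensitivity = TP / (TP + FN), and PPV = TP / (TP + FP). The sketch below is a minimal illustration of those definitions, assuming the predicted and reference ICH masks are binary and have already been flattened into equal-length sequences of 0/1 voxels. The function name `overlap_metrics` is hypothetical and is not taken from the study's code or from the segmentation model's software.

```python
from typing import Dict, Sequence


def overlap_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: voxel-wise DSC, sensitivity and PPV for two binary masks.

    Assumes both masks are flattened to equal-length sequences of 0/1 values.
    Returns NaN for a metric whose denominator is zero (e.g. both masks empty).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")

    # Count voxel-level agreement between prediction and reference.
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "dsc": ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "ppv": ratio(tp, tp + fp),
    }


# Toy example: the prediction covers all 3 reference voxels plus 1 extra voxel.
metrics = overlap_metrics([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0])
print(metrics)  # DSC = 6/7, sensitivity = 1.0, PPV = 0.75
```

The toy example shows why the metrics can diverge: over-segmentation leaves sensitivity untouched but lowers PPV, and the DSC balances both error types.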