Background The mutS-rpoS intergenic region in E. coli has a mosaic structure that has been shown to display pathotype-specific patterns. To assess the value of this region as a surrogate marker for identifying highly virulent extraintestinal pathogenic E. coli (ExPEC) strains, we aimed to: (i) characterize the genetic diversity of the mutS gene and the o454-nlpD genomic region among 510 E. coli strains from animals and humans; (ii) delineate associations between the polymorphism of this region and features such as the phylogenetic background of E. coli, pathotype, host species, clinical condition, serogroup and virulence-associated genes (VAGs); and (iii) identify the VAGs most important for classifying the o454-nlpD region.
Methods Size variation in the o454-nlpD region was investigated by PCR amplification and sequencing. Phylogenetic relationships were assessed by ECOR and multilocus sequence typing (MLST), and the mutS gene phylogenetic tree obtained with RAxML was compared with the MLST grouping. Correlations between o454-nlpD patterns and the features described above were analysed. In addition, the importance of 47 PCR-amplified ExPEC-related VAGs for classifying o454-nlpD patterns was investigated using the Random Forest (RF) algorithm.
Results Four main structures (patterns I-IV) of the o454-nlpD region were identified among ExPEC and commensal E. coli strains. Statistical analysis showed a positive and exclusive association between pattern III and the ExPEC strains. A strong association was also found between pattern III and either ECOR group B2 or the sequence type complexes (STCs) known to represent the phylogenetic background of highly virulent ExPEC strains, such as STC95, STC73 and STC131. RF analyses identified five genes (csgA, malX, chuA, sit and vat) as suitable predictors of pattern III strains.
Conclusion The significant association between pattern III and group B2 strains suggests that the o454-nlpD region is of great value for identifying highly virulent strains within mixed E. coli populations and could form the basis of a future typing tool for ExPEC and their gut reservoir. Furthermore, the top-ranked VAGs for classifying and predicting pattern III were identified. These data will be valuable for defining the ExPEC pathotype in future in vivo assays.