We aimed to develop and test a novel metric for the relative performance of blood pressure estimation systems (B-Score). The B-Score sets the absolute performance of a blood pressure estimation model in relation to the dataset on which the model is tested. We calculate the B-Score from the inter- and intrapersonal variabilities within the dataset. To test whether the B-Score yields reliable results and has the desired properties, we designed generic datasets with differing inter- and intrapersonal blood pressure variability. We then tested the B-Score's real-world functionality with a small, published dataset and the largest available blood pressure dataset (MIMIC IV). The B-Score yielded reliable results and showed the desired properties. The real-world test allowed the direct comparison of different datasets and revealed insights hidden from absolute performance measures. The B-Score is a functional, novel, and easy-to-interpret measure of relative blood pressure estimation system performance. It is easily calculated for any dataset and enables the direct comparison of various systems tested on different datasets. We created a metric for the direct comparison of blood pressure estimation system performance. The B-Score allows researchers to detect promising trends in the scientific literature quickly and reliably. It further allows researchers and engineers to quickly assess and compare the performance of various systems and algorithms, even when they are tested on different datasets.