BiomedBERT Reranker

This is a cross-encoder model fine-tuned from microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext using the sentence-transformers library. It computes a score for each pair of texts, and those scores can be used for text reranking and semantic search.

The training dataset was generated using a random sample of PubMed title-abstract pairs along with similar title pairs.

Usage (txtai)

This model scores a query against a list of texts, which makes it useful as a reranking step after an initial semantic search.

from txtai.pipeline import Similarity

ranker = Similarity(path="neuml/biomedbert-base-reranker", crossencode=True)
ranker("query", ["document1", "document2"])
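The pipeline call returns a list of (index, score) tuples sorted by descending score, where index is the position of the text in the input list. The following is a minimal sketch of mapping that output back to the candidate texts. The `rerank` helper and the `overlap_ranker` stand-in are hypothetical names used only for illustration; the stand-in mimics the return format so the sketch runs without downloading a model, and in practice the `ranker` object created above would be passed instead.

```python
def rerank(query, candidates, ranker, top_k=3):
    """Return the top_k (text, score) pairs according to ranker.

    ranker is assumed to return (index, score) tuples sorted by
    descending score, as the txtai Similarity pipeline does.
    """
    results = ranker(query, candidates)
    return [(candidates[index], score) for index, score in results[:top_k]]


def overlap_ranker(query, texts):
    """Hypothetical stand-in scorer: fraction of query words found in each text."""
    words = set(query.lower().split())
    scores = [len(words & set(t.lower().split())) / len(words) for t in texts]
    order = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


candidates = [
    "aspirin reduces fever",
    "statins lower cholesterol",
    "aspirin dose for fever in adults",
]
top = rerank("aspirin fever", candidates, overlap_ranker, top_k=2)
```

Here `top` holds the two best candidates as (text, score) pairs, and the unrelated statin text is dropped.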

Usage (Sentence-Transformers)

Alternatively, the model can be loaded with sentence-transformers.

from sentence_transformers import CrossEncoder

model = CrossEncoder("neuml/biomedbert-base-reranker")
model.predict([["query", "document1"], ["query", "document2"]])

Evaluation Results

Performance of this model is compared to previously released models trained on medical literature.

The following datasets were used to evaluate model performance.

- PubMed QA
  - Subset: pqa_labeled, Split: train, Pair: (question, long_answer)
- PubMed Subset
  - Split: test, Pair: (title, text)
- PubMed Summary
  - Subset: pubmed, Split: validation, Pair: (article, abstract)

Evaluation results are shown below. The Pearson correlation coefficient is used as the evaluation metric.
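For reference, the Pearson correlation coefficient measures how closely two sequences follow a linear relationship, ranging from -1 to 1. The following is a minimal sketch of that calculation in plain Python. The function name `pearson` and the sample values are illustrative assumptions and are not taken from the evaluation code; the table values are assumed here to be the coefficient scaled by 100.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of deviations (unnormalized covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (unnormalized variances)
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Illustrative values only: model scores against binary relevance labels
scores = [0.95, 0.10, 0.80, 0.05]
labels = [1.0, 0.0, 1.0, 0.0]
r = pearson(scores, labels)
```

A value of `r` close to 1 means the scores rise and fall with the labels.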

Model	PubMed QA	PubMed Subset	PubMed Summary	Average
all-MiniLM-L6-v2	90.40	95.92	94.07	93.46
bioclinical-modernbert-base-embeddings	92.49	97.10	97.04	95.54
biomedbert-base-colbert	94.59	97.18	96.21	95.99
biomedbert-base-reranker	97.66	99.76	98.81	98.74
pubmedbert-base-embeddings	93.27	97.00	96.58	95.62
pubmedbert-base-embeddings-8M	90.05	94.29	94.15	92.83

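As expected, this cross-encoder model scores higher than the bi-encoder and late-interaction models, averaging 98.74 against 95.99 for the next best model. The tradeoff is cost: a cross-encoder has to run a full forward pass for every query-document pair, so it is practical only for a small set of candidates rather than an entire corpus. That makes it a strong choice for reranking medical literature after an initial retrieval step.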
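One common way to work within that cost is to score the full corpus with a cheaper method and pass only a shortlist to the cross-encoder. The following is a minimal sketch of that two-stage pattern. The names `retrieve_then_rerank`, `cheap_score`, and `expensive_score` are hypothetical; the two scoring functions are simple word-overlap stand-ins for a bi-encoder similarity and for this model, so the sketch runs without downloading anything.

```python
def retrieve_then_rerank(query, corpus, cheap_score, expensive_score,
                         shortlist=10, top_k=3):
    """Score the whole corpus cheaply, then rerank only the shortlist."""
    # Stage 1: inexpensive scoring over every document
    first_pass = sorted(corpus, key=lambda doc: cheap_score(query, doc), reverse=True)
    candidates = first_pass[:shortlist]

    # Stage 2: expensive scoring restricted to the shortlist
    scored = [(doc, expensive_score(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def cheap_score(query, doc):
    """Hypothetical stand-in for a bi-encoder: count of shared words."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


calls = []


def expensive_score(query, doc):
    """Hypothetical stand-in for the cross-encoder: overlap per document word."""
    calls.append(doc)  # track how many pairs reach the expensive stage
    return cheap_score(query, doc) / len(doc.split())


corpus = [f"unrelated note {i}" for i in range(100)] + [
    "metformin therapy for type 2 diabetes",
    "metformin",
]
results = retrieve_then_rerank(
    "metformin diabetes", corpus, cheap_score, expensive_score, shortlist=5
)
```

With 102 documents in the corpus, only the 5 shortlisted pairs reach the expensive scorer, so the second-stage cost depends on the `shortlist` size rather than the corpus size.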