Rerank Sentences
Langformers also supports reranking, which reorders a list of documents (or sentences/texts) by their relevance to a given query.
Vector search does not always put the most relevant results first. Reranking the retrieved documents against the query can improve the quality of the final results.
[Figure: Vector search and reranking]
For instance, suppose a user searches for “Where is Mount Everest?”, and our database contains the following documents:
“Mount Everest is the highest mountain in the world.”
“Mount Everest is in Nepal.”
“Where is Mount Everest?”
A basic vector search might rank the third document highest because it closely matches the wording of the query, which is exactly what semantic similarity (such as cosine similarity between embeddings) picks up on. The problem is that this document just repeats the question instead of answering it.
This is where reranking comes in. It reorders the documents based on their relevance to the query so that the most appropriate document (i.e., the second one) appears at the top of the list.
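To make the failure mode concrete, the toy sketch below scores the three documents against the query with cosine similarity. It assumes a bag-of-words count vector as a crude stand-in for a real embedding model, and the helper names `bow_vector` and `cosine` are illustrative rather than part of Langformers. Real embeddings would give different numbers; the point is only that a document repeating the query's wording ends up on top.

```python
import math
import re
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


query = "Where is Mount Everest?"
documents = [
    "Mount Everest is the highest mountain in the world.",
    "Mount Everest is in Nepal.",
    "Where is Mount Everest?",
]

# Rank documents by similarity to the query, most similar first.
query_vec = bow_vector(query)
similarity_ranking = sorted(
    ((doc, cosine(query_vec, bow_vector(doc))) for doc in documents),
    key=lambda pair: pair[1],
    reverse=True,
)

for doc, score in similarity_ranking:
    print(f"{score:.3f}  {doc}")
```

In this toy setup the third document, which merely repeats the question, is listed first, ahead of the document that actually answers it.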
Here’s a sample code snippet to get you started:
# Import langformers
from langformers import tasks
# Create a reranker
reranker = tasks.create_reranker(model_type="cross_encoder", model_name="cross-encoder/ms-marco-MiniLM-L-6-v2")
# Define `query` and `documents`
query = "Where is Mount Everest?"
documents = [
    "Mount Everest is the highest mountain in the world.",
    "Mount Everest is in Nepal.",
    "Where is Mount Everest?",
]
# Get your reranked documents
reranked_docs = reranker.rank(query=query, documents=documents)
print(reranked_docs)
- langformers.tasks.create_reranker(model_type: str, model_name: str, **kwargs)
Factory method for creating a reranker.
- Parameters:
model_type (str, required) – The type of the reranker model. Currently supported model types:
cross_encoder
model_name (str, required) – The model name from Hugging Face (e.g., “cross-encoder/ms-marco-MiniLM-L-6-v2”). Refer to this link for more models: https://huggingface.co/models?library=sentence-transformers&pipeline_tag=text-ranking
**kwargs (dict, required) – Model-specific keyword arguments. These are kept in the signature so that more model types can be added later.
- Returns:
An instance of the appropriate reranker class, based on the selected model type.
If model_type is “cross_encoder”, an instance of CrossEncoder is returned.
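The dispatch on model_type can be pictured with a small registry-based factory. This is only a sketch of the general pattern, not the Langformers implementation: `ToyCrossEncoder`, `RERANKER_REGISTRY`, and `create_reranker_sketch` are hypothetical names, no model is loaded, and raising `ValueError` for an unknown type is an assumption.

```python
from typing import Any, Callable, Dict


class ToyCrossEncoder:
    """Hypothetical stand-in for a reranker class; it loads no model."""

    def __init__(self, model_name: str, **kwargs: Any) -> None:
        self.model_name = model_name
        self.options = kwargs


# Registry mapping each supported model_type to the class that handles it.
RERANKER_REGISTRY: Dict[str, Callable[..., Any]] = {
    "cross_encoder": ToyCrossEncoder,
}


def create_reranker_sketch(model_type: str, model_name: str, **kwargs: Any) -> Any:
    """Return a reranker instance for model_type (illustrative only)."""
    try:
        reranker_cls = RERANKER_REGISTRY[model_type]
    except KeyError:
        supported = ", ".join(sorted(RERANKER_REGISTRY))
        # Raising ValueError here is an assumption about sensible behavior.
        raise ValueError(
            f"Unsupported model_type {model_type!r}; supported: {supported}"
        ) from None
    return reranker_cls(model_name, **kwargs)


sketch_reranker = create_reranker_sketch(
    model_type="cross_encoder",
    model_name="cross-encoder/ms-marco-MiniLM-L-6-v2",
)
print(type(sketch_reranker).__name__, sketch_reranker.model_name)
```

With a registry like this, supporting another model type would only require one more entry, which is one plausible reason for keeping `**kwargs` in the factory signature.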
kwargs for Cross Encoder model type:
- class langformers.rerankers.CrossEncoder(model_name: str)
Bases: object
Ranks text pairs using a cross-encoder model from Hugging Face.
- __init__(model_name: str)
Loads the cross-encoder model and its tokenizer.
- Parameters:
model_name (str, required) – The model name from Hugging Face (e.g., “cross-encoder/ms-marco-MiniLM-L-6-v2”)
- langformers.rerankers.CrossEncoder.rank(self, query: str, documents: List[str])
Predicts a relevance score for each (query, document) pair. The method expects a query and a list of documents, and returns the documents sorted by their scores.
- Parameters:
query (str, required) – Query. E.g., “What is the capital of France?”
documents (List[str], required) – List of documents to be reranked. E.g., [“Paris is the capital of France.”, “Berlin is the capital of Germany.”]
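The score-then-sort behavior described for rank can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the library's code: `rank_with_scorer`, `toy_score_pairs`, and `TOY_SCORES` are hypothetical, the scores are made up rather than produced by a model, and both the highest-first ordering and the (document, score) return shape are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

# A pair scorer takes (query, document) pairs and returns one score per pair.
PairScorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rank_with_scorer(
    query: str, documents: List[str], score_pairs: PairScorer
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair, then sort highest score first."""
    pairs = [(query, doc) for doc in documents]
    scores = score_pairs(pairs)
    return sorted(zip(documents, scores), key=lambda item: item[1], reverse=True)


# Made-up relevance scores standing in for a real cross-encoder's output.
TOY_SCORES = {
    "Paris is the capital of France.": 0.98,
    "Berlin is the capital of Germany.": 0.03,
}


def toy_score_pairs(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Look up the made-up score for each document, ignoring the query."""
    return [TOY_SCORES[doc] for _query, doc in pairs]


ranked = rank_with_scorer(
    query="What is the capital of France?",
    documents=[
        "Berlin is the capital of Germany.",
        "Paris is the capital of France.",
    ],
    score_pairs=toy_score_pairs,
)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```

In practice the scorer would be backed by an actual cross-encoder, for example the `predict` method of the `CrossEncoder` class in sentence-transformers, which accepts a list of text pairs. Unlike embedding similarity, a cross-encoder reads the query and the document together, which is what lets it prefer a document that answers the query over one that only resembles it.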