Embed Sentences
Using state-of-the-art embedding models for vectorizing your sentences takes just two steps with Langformers.
First, create an embedder with create_embedder(), and then call embed() on it.
Here’s a sample code snippet to get you started:
# Import langformers
from langformers import tasks
# Create an embedder
embedder = tasks.create_embedder(provider="huggingface", model_name="sentence-transformers/all-MiniLM-L6-v2")
# Get your sentence embeddings
embeddings = embedder.embed(["I am hungry.", "I want to eat something."])
Tip
Which models to use? Refer to this Hugging Face page for the list of supported embedding models: https://huggingface.co/models?library=sentence-transformers.
- langformers.tasks.create_embedder(provider: str, model_name: str, **kwargs)
Factory method for creating a sentence embedder.
- Parameters:
provider (str, required) – The model provider (e.g., “huggingface”). Currently supported providers: huggingface.
model_name (str, required) – The model name from the provider’s hub (e.g., “sentence-transformers/all-mpnet-base-v2”).
**kwargs (dict, optional) – Provider-specific keyword arguments. Retained so that options for future providers can be passed through.
- Returns:
An instance of the appropriate embedder class, based on the selected provider.
If provider is “huggingface”, an instance of HuggingFaceEmbedder is returned.
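The provider-based dispatch described above can be illustrated with a small, self-contained sketch. This is not the Langformers implementation: DummyEmbedder, _PROVIDERS, and make_embedder are hypothetical names used only to show how a factory might map a provider string to an embedder class and forward extra keyword arguments.

```python
class DummyEmbedder:
    """Hypothetical stand-in for a provider-specific embedder class."""

    def __init__(self, model_name: str, **kwargs):
        self.model_name = model_name
        self.options = kwargs  # provider-specific options, kept as given


# Hypothetical registry mapping provider names to embedder classes.
_PROVIDERS = {"dummy": DummyEmbedder}


def make_embedder(provider: str, model_name: str, **kwargs):
    """Return an embedder for `provider`, or raise if it is unsupported."""
    try:
        embedder_cls = _PROVIDERS[provider]
    except KeyError:
        supported = ", ".join(sorted(_PROVIDERS))
        raise ValueError(
            f"Unsupported provider '{provider}'. Supported: {supported}"
        ) from None
    return embedder_cls(model_name, **kwargs)


# Extra keyword arguments are forwarded unchanged to the embedder.
embedder = make_embedder("dummy", "some-model", batch_size=8)
```

With a registry like this, supporting a new provider would only require adding one entry, which is one plausible reason for keeping **kwargs in the factory signature.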
- langformers.embedders.HuggingFaceEmbedder.embed(self, texts: list[str])
Generates normalized sentence embeddings for input texts.
- Parameters:
texts (list[str]) – A list of text sequences to be embedded.
Notes
Uses mean pooling over token embeddings to obtain sentence embeddings.
Applies L2 normalization to the embeddings.
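To make the two notes above concrete, here is a minimal pure-Python sketch of mean pooling followed by L2 normalization. It assumes that padding tokens are excluded through an attention mask, which is common practice but is not stated in this documentation; mean_pool and l2_normalize are illustrative helper names, not Langformers functions.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vector)]
            count += 1
    # Guard against an all-padding input to avoid division by zero.
    return [t / max(count, 1) for t in total]


def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


# Toy input: three 2-dimensional token vectors, the last one is padding.
tokens = [[1.0, 2.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # mean of the first two vectors
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
```

Because the result has unit length, the dot product of two such embeddings equals their cosine similarity.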
Textual Similarity
To compute textual similarity between two texts, use similarity().
# Get cosine similarity
embedder.similarity(["I am hungry.", "I am starving."])
- langformers.embedders.HuggingFaceEmbedder.similarity(self, texts: list[str])
Computes cosine similarity between two input texts.
- Parameters:
texts (list) – A list containing exactly two text sequences.
Notes
The similarity score ranges from -1 (embeddings point in opposite directions) to 1 (embeddings point in the same direction); scores near 0 indicate unrelated texts.
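For reference, cosine similarity between two vectors can be computed as in the short sketch below. It is a generic illustration on toy vectors, not the code behind similarity(); cosine_similarity is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors chosen to show the two extremes and the midpoint of the range.
same = cosine_similarity([0.6, 0.8], [0.6, 0.8])         # close to 1
opposite = cosine_similarity([0.6, 0.8], [-0.6, -0.8])   # close to -1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])   # exactly 0
```

This sketch does not handle zero-length vectors, which would cause a division by zero.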