Embed Sentences

Using state-of-the-art embedding models for vectorizing your sentences takes just two steps with Langformers. First, create an embedder with create_embedder(), and then call embed() on it.

Here’s a sample code snippet to get you started:

# Import langformers
from langformers import tasks

# Create an embedder
embedder = tasks.create_embedder(provider="huggingface", model_name="sentence-transformers/all-MiniLM-L6-v2")

# Get your sentence embeddings
embeddings = embedder.embed(["I am hungry.", "I want to eat something."])
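The returned embeddings are dense float vectors, one per input sentence. To illustrate how such vectors are typically consumed downstream, here is a minimal semantic-search sketch that runs without the library; the vector values and their 3-dimensional size are made up for demonstration (real all-MiniLM-L6-v2 embeddings have 384 dimensions):

```python
import math

# Placeholder vectors standing in for embedder.embed(...) output.
corpus = {
    "I am hungry.":             [0.9, 0.1, 0.2],
    "I want to eat something.": [0.8, 0.2, 0.3],
    "The sky is blue.":         [0.1, 0.9, 0.1],
}
query_vector = [0.85, 0.15, 0.25]  # hypothetical embedding of "I need food."

def cosine(a, b):
    # Cosine similarity: (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Rank corpus sentences by similarity to the query vector.
ranked = sorted(corpus, key=lambda s: cosine(query_vector, corpus[s]), reverse=True)
print(ranked)  # the food-related sentences rank above the unrelated one
```

This nearest-neighbor ranking is the basic building block behind semantic search over sentence embeddings.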

Tip

Which models to use? Refer to this Hugging Face page for the list of supported embedding models: https://huggingface.co/models?library=sentence-transformers.

langformers.tasks.create_embedder(provider: str, model_name: str, **kwargs)

Factory method for creating a sentence embedder.

Parameters:
  • provider (str, required) – The model provider (e.g., “huggingface”). Currently supported providers: huggingface.

  • model_name (str, required) – The model name from the provider’s hub (e.g., “sentence-transformers/all-mpnet-base-v2”).

  • **kwargs (dict, optional) – Provider-specific keyword arguments. Reserved for forward compatibility as more providers are added.


Returns:

An instance of the appropriate embedder class, based on the selected provider.

  • If provider is “huggingface”, an instance of HuggingFaceEmbedder is returned.
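To make the dispatch behavior concrete, here is a conceptual sketch of how a provider-keyed factory like this typically works. This is not Langformers' actual source; the registry dict and the stub class body are illustrative assumptions:

```python
# Conceptual sketch of a provider-keyed factory (not Langformers' real implementation).
class HuggingFaceEmbedder:
    """Stub standing in for the real embedder class."""
    def __init__(self, model_name: str, **kwargs):
        self.model_name = model_name

# Registry mapping provider names to embedder classes; only "huggingface"
# is supported at the moment, mirroring the documentation above.
_PROVIDERS = {"huggingface": HuggingFaceEmbedder}

def create_embedder(provider: str, model_name: str, **kwargs):
    try:
        cls = _PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"Unsupported provider: {provider!r}") from None
    return cls(model_name, **kwargs)

embedder = create_embedder("huggingface", "sentence-transformers/all-MiniLM-L6-v2")
print(type(embedder).__name__)  # HuggingFaceEmbedder
```

A registry-based factory like this keeps the public API stable while new providers are added: callers always go through create_embedder() and never construct provider classes directly.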

Textual Similarity

To compute textual similarity between two texts, use similarity().

# Get cosine similarity
embedder.similarity(["I am hungry.", "I am starving."])

langformers.embedders.HuggingFaceEmbedder.similarity(self, texts: list[str])

Computes cosine similarity between two input texts.

Parameters:

texts (list) – A list containing exactly two text sequences.

Notes

  • The similarity score ranges from -1 (opposite meanings) to 1 (identical meanings); values near 0 indicate unrelated texts.
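To make that range concrete, here is the cosine-similarity formula that scores like this are based on, computed in plain Python on toy 2-dimensional vectors (the vectors are illustrative, not real model output):

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0  -> identical direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0  -> orthogonal (unrelated)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 -> opposite direction
```

In practice, scores between real sentence embeddings rarely reach the extremes; semantically close sentences typically land well above 0.5, while unrelated ones sit near 0.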