Data Labeling using LLMs
Generative LLMs are useful for more than conversation; they are also highly effective at data labeling. Langformers offers a simple way to define labels and conditions for labeling texts with LLMs.
To label texts, first load an LLM as a data labeller with create_labeller(), then apply label().
Here’s some sample code to get you started:
# Import langformers
from langformers import tasks
# Load an LLM as a data labeller
labeller = tasks.create_labeller(provider="huggingface", model_name="meta-llama/Meta-Llama-3-8B-Instruct", multi_label=False)
# Provide labels and conditions
conditions = {
    "Positive": "The text expresses a positive sentiment.",
    "Negative": "The text expresses a negative sentiment.",
    "Neutral": "The text does not express any emotions."
}
# Label a text
text = "No doubt, The Shawshank Redemption is a cinematic masterpiece."
labeller.label(text, conditions)
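The same call can be repeated over a list of texts. The following is a minimal sketch, not part of Langformers: label_many and StubLabeller are hypothetical names introduced for illustration. The stub only mimics the documented call shape label(text, conditions) so the example runs without a model, and it returns a plain string as an assumption, since the return type of label() is not stated here.

```python
from typing import Any, Dict, List, Tuple


class StubLabeller:
    """Hypothetical stand-in for a real labeller; it is NOT an LLM.

    It only mimics the documented call shape: label(text, conditions).
    """

    def label(self, text: str, conditions: Dict[str, str]) -> str:
        # Toy keyword rules so the sketch is deterministic and runs offline.
        lowered = text.lower()
        if "masterpiece" in lowered and "Positive" in conditions:
            return "Positive"
        if "boring" in lowered and "Negative" in conditions:
            return "Negative"
        return "Neutral"


def label_many(labeller: Any, texts: List[str],
               conditions: Dict[str, str]) -> List[Tuple[str, Any]]:
    """Call labeller.label() once per text and pair each text with its result."""
    return [(text, labeller.label(text, conditions)) for text in texts]


conditions = {
    "Positive": "The text expresses a positive sentiment.",
    "Negative": "The text expresses a negative sentiment.",
    "Neutral": "The text does not express any emotions.",
}
texts = [
    "No doubt, The Shawshank Redemption is a cinematic masterpiece.",
    "The second half was boring.",
    "The film was released in 1994.",
]

# Swap StubLabeller() for a labeller returned by create_labeller() in real use.
results = label_many(StubLabeller(), texts, conditions)
for text, label in results:
    print(f"{label}: {text}")
```

Any object exposing a compatible label() method should work in place of the stub, so the loop itself does not depend on the provider.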
- langformers.tasks.create_labeller(provider: str, model_name: str, multi_label: bool = False)
Factory method for loading an LLM for data labelling tasks.
- Parameters:
provider (str, required) – The name of the LLM provider (e.g., “huggingface”). Currently supported providers: huggingface, ollama.
model_name (str, required) – The model name from the provider’s hub (e.g., “llama3.1:8b” if provider is “ollama”).
multi_label (bool, default=False) – If True, allows multiple labels to be selected.
- Returns:
An instance of the appropriate labeller class, based on the selected provider.
If provider is “ollama”, an instance of OllamaDataLabeler will be returned.
If provider is “huggingface”, an instance of HuggingFaceDataLabeler will be returned.
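Because only two providers are listed, it can help to check the arguments before loading a model. The sketch below is an illustration under stated assumptions: labeller_kwargs and SUPPORTED_PROVIDERS are hypothetical names, not part of Langformers, and the strict lowercase match is an assumption about how provider names are compared. How create_labeller() itself reacts to an unsupported provider is not described here.

```python
from typing import Any, Dict

# Providers listed in the parameter description of create_labeller().
SUPPORTED_PROVIDERS = ("huggingface", "ollama")


def labeller_kwargs(provider: str, model_name: str,
                    multi_label: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments and return kwargs for create_labeller()."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"Unsupported provider {provider!r}; expected one of {SUPPORTED_PROVIDERS}"
        )
    if not model_name:
        raise ValueError("model_name must be a non-empty string")
    return {"provider": provider, "model_name": model_name, "multi_label": multi_label}


kwargs = labeller_kwargs("ollama", "llama3.1:8b", multi_label=True)
print(kwargs)

# With langformers installed and the model available to the provider:
# from langformers import tasks
# labeller = tasks.create_labeller(**kwargs)
```

Keeping the validation separate from the factory call means a typo in the provider name fails immediately rather than after a model download has started.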
label() takes the following parameters:
text (str, required): The text to be labelled.
conditions (Dict[str, str], required): A dictionary mapping labels to their descriptions.
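Since conditions is a plain dictionary of label names to descriptions, a small pre-flight check can catch empty or non-string entries before they reach the model. This is a sketch only: check_conditions is a hypothetical helper, not a Langformers function, and the rules it enforces (non-empty strings on both sides) are assumptions about what makes a usable mapping.

```python
from typing import Dict


def check_conditions(conditions: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical pre-flight check for the `conditions` argument of label()."""
    if not isinstance(conditions, dict) or not conditions:
        raise ValueError("conditions must be a non-empty dict")
    for label, description in conditions.items():
        if not isinstance(label, str) or not label.strip():
            raise ValueError(f"Label {label!r} must be a non-empty string")
        if not isinstance(description, str) or not description.strip():
            raise ValueError(f"Description for {label!r} must be a non-empty string")
    # Return the mapping unchanged so the call can be used inline.
    return conditions


conditions = check_conditions({
    "Positive": "The text expresses a positive sentiment.",
    "Negative": "The text expresses a negative sentiment.",
    "Neutral": "The text does not express any emotions.",
})
print(sorted(conditions))

# In real use, pass the checked mapping on:
# labeller.label(text, conditions)
```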