Tagging
Tagging means labeling a document with classes such as:
- sentiment
- language
- style (formal, informal etc.)
- covered topics
- political tendency
Overview
Tagging has a few components:
function
: Like extraction, tagging uses functions to specify how the model should tag a documentschema
: defines how we want to tag the document
Quickstart
Let's see a very straightforward example of how we can use OpenAI functions for tagging in LangChain.
%pip install --upgrade --quiet gigachain langchain-openai
# Set env var OPENAI_API_KEY or load from a .env file:
# import dotenv
# dotenv.load_dotenv()
from langchain.chains import create_tagging_chain, create_tagging_chain_pydantic
from langchain_openai import ChatOpenAI
We specify a few properties with their expected type in our schema.
# Schema
schema = {
"properties": {
"sentiment": {"type": "string"},
"aggressiveness": {"type": "integer"},
"language": {"type": "string"},
}
}
# LLM
llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo-0613")
chain = create_tagging_chain(schema, llm)
inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
chain.run(inp)
{'sentiment': 'positive', 'language': 'Spanish'}
inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
chain.run(inp)
{'sentiment': 'enojado', 'aggressiveness': 1, 'language': 'es'}
As we can see in the examples, it correctly interprets what we want.
The results vary so that we get, for example, sentiments in different languages ('positive', 'enojado' etc.).
We will see how to control these results in the next section.
Finer control
Careful schema definition gives us more control over the model's output.
Specifically, we can define:
- possible values for each property
- description to make sure that the model understands the property
- required properties to be returned
Here is an example of how we can use
_enum_
,_description_
, and_required_
to control for each of the previously mentioned aspects:
schema = {
"properties": {
"aggressiveness": {
"type": "integer",
"enum": [1, 2, 3, 4, 5],
"description": "describes how aggressive the statement is, the higher the number the more aggressive",
},
"language": {
"type": "string",
"enum": ["spanish", "english", "french", "german", "italian"],
},
},
"required": ["language", "sentiment", "aggressiveness"],
}
chain = create_tagging_chain(schema, llm)
Now the answers are much better!
inp = "Estoy increiblemente contento de haberte conocido! Creo que seremos muy buenos amigos!"
chain.run(inp)
{'aggressiveness': 0, 'language': 'spanish'}
inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
chain.run(inp)
{'aggressiveness': 5, 'language': 'spanish'}
inp = "Weather is ok here, I can go outside without much more than a coat"
chain.run(inp)
{'aggressiveness': 0, 'language': 'english'}
The LangSmith trace lets us peek under the hood:
- As with extraction, we call the
information_extraction
function here on the input string. - This OpenAI function extraction information based upon the provided schema.
Pydantic
We can also use a Pydantic schema to specify the required properties and types.
We can also send other arguments, such as enum
or description
, to each field.
This lets us specify our schema in the same manner that we would a new class or function in Python with purely Pythonic types.
from langchain_core.pydantic_v1 import BaseModel, Field
class Tags(BaseModel):
sentiment: str = Field(..., enum=["happy", "neutral", "sad"])
aggressiveness: int = Field(
...,
description="describes how aggressive the statement is, the higher the number the more aggressive",
enum=[1, 2, 3, 4, 5],
)
language: str = Field(
..., enum=["spanish", "english", "french", "german", "italian"]
)
chain = create_tagging_chain_pydantic(Tags, llm)
inp = "Estoy muy enojado con vos! Te voy a dar tu merecido!"
res = chain.run(inp)
res
Tags(sentiment='sad', aggressiveness=5, language='spanish')
Going deeper
- You can use the metadata tagger document transformer to extract metadata from a LangChain
Document
. - This covers the same basic functionality as the tagging chain, only applied to a LangChain
Document
.