Returning Structured Output

Обновлено 26 августа 2024

This notebook covers how to have an agent return a structured output. By default, most of the agents return a single string. It can often be useful to have an agent return something with more structure.

A good example of this is an agent tasked with doing question-answering over some sources. Let's say we want the agent to respond not only with the answer, but also a list of the sources used. We then want our output to roughly follow the schema below:

class Response(BaseModel):
    """Final response to the question being asked"""
    answer: str = Field(description = "The final answer to respond to the user")
    sources: List[int] = Field(description="List of page chunks that contain answer to the question. Only include a page chunk if it contains relevant information")

In this notebook we will go over an agent that has a retriever tool and responds in the correct format.

Create the Retriever

In this section we will do some setup work to create our retriever over some mock data containing the "State of the Union" address. Importantly, we will add a "page_chunk" tag to the metadata of each document. This is just some fake data intended to simulate a source field. In practice, this would more likely be the URL or path of a document.

%pip install -qU chromadb langchain langchain-community langchain-openai

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Load in document to retrieve over
loader = TextLoader("../../state_of_the_union.txt")
documents = loader.load()

# Split document into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

# Here is where we add in the fake source information
for i, doc in enumerate(texts):
    doc.metadata["page_chunk"] = i

# Create our retriever
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(texts, embeddings, collection_name="state-of-union")
retriever = vectorstore.as_retriever()

Create the tools

We will now create the tools we want to give to the agent. In this case, it is just one - a tool that wraps our retriever.

from langchain.tools.retriever import create_retriever_tool

retriever_tool = create_retriever_tool(
    retriever,
    "state-of-union-retriever",
    "Query a retriever to get information about state of the union address",
)

Create response schema

Here is where we will define the response schema. In this case, we want the final answer to have two fields: one for the answer, and then another that is a list of sources

from typing import List

from langchain_core.pydantic_v1 import BaseModel, Field


class Response(BaseModel):
    """Final response to the question being asked"""

    answer: str = Field(description="The final answer to respond to the user")
    sources: List[int] = Field(
        description="List of page chunks that contain answer to the question. Only include a page chunk if it contains relevant information"
    )

Create the custom parsing logic

We now create some custom parsing logic. How this works is that we will pass the Response schema to the OpenAI LLM via their functions parameter. This is similar to how we pass tools for the agent to use.

When the Response function is called by OpenAI, we want to use that as a signal to return to the user. When any other function is called by OpenAI, we treat that as a tool invocation.

Therefore, our parsing logic has the following blocks:

If no function is called, assume that we should use the response to respond to the user, and therefore return AgentFinish
If the Response function is called, respond to the user with the inputs to that function (our structured output), and therefore return AgentFinish
If any other function is called, treat that as a tool invocation, and therefore return AgentActionMessageLog

Note that we are using AgentActionMessageLog rather than AgentAction because it lets us attach a log of messages that we can use in the future to pass back into the agent prompt.

import json

from langchain_core.agents import AgentActionMessageLog, AgentFinish

def parse(output):
    # If no function was invoked, return to user
    if "function_call" not in output.additional_kwargs:
        return AgentFinish(return_values={"output": output.content}, log=output.content)

    # Parse out the function call
    function_call = output.additional_kwargs["function_call"]
    name = function_call["name"]
    inputs = json.loads(function_call["arguments"])

    # If the Response function was invoked, return to the user with the function inputs
    if name == "Response":
        return AgentFinish(return_values=inputs, log=str(function_call))
    # Otherwise, return an agent action
    else:
        return AgentActionMessageLog(
            tool=name, tool_input=inputs, log="", message_log=[output]
        )

Create the Agent

We can now put this all together! The components of this agent are:

prompt: a simple prompt with placeholders for the user's question and then the agent_scratchpad (any intermediate steps)
tools: we can attach the tools and Response format to the LLM as functions
format scratchpad: in order to format the agent_scratchpad from intermediate steps, we will use the standard format_to_openai_function_messages. This takes intermediate steps and formats them as AIMessages and FunctionMessages.
output parser: we will use our custom parser above to parse the response of the LLM
AgentExecutor: we will use the standard AgentExecutor to run the loop of agent-tool-agent-tool...

from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_openai import ChatOpenAI

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant"),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

llm = ChatOpenAI(temperature=0)

llm_with_tools = llm.bind_functions([retriever_tool, Response])

agent = (
    {
        "input": lambda x: x["input"],
        # Format agent scratchpad from intermediate steps
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | parse
)

agent_executor = AgentExecutor(tools=[retriever_tool], agent=agent, verbose=True)

Run the agent

We can now run the agent! Notice how it responds with a dictionary with two keys: answer and sources

agent_executor.invoke(
    {"input": "what did the president say about ketanji brown jackson"},
    return_only_outputs=True,
)

Returning Structured Output

Create the Retriever﻿

Create the tools﻿

Create response schema﻿

Create the custom parsing logic﻿

Create the Agent﻿