Semantic Chunking

Updated August 26, 2024

Splits the text based on semantic similarity.

Taken from Greg Kamradt's wonderful notebook: https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/5_Levels_Of_Text_Splitting.ipynb

All credit to him.

At a high level, this splits the text into sentences, groups them into groups of three sentences, and then merges the groups that are similar in the embedding space.
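The three steps above can be sketched in plain Python. This is only an illustration of the idea, not the SemanticChunker implementation: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and the fixed `threshold` and the `window` parameter are assumptions chosen to keep the example small.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_chunks(text, threshold=0.5, window=1):
    # 1. Split the text into sentences on whitespace after ., ? or !.
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]
    if not sentences:
        return []
    # 2. Group each sentence with its neighbours
    #    (window=1 gives groups of up to three sentences).
    groups = [
        " ".join(sentences[max(0, i - window): i + window + 1])
        for i in range(len(sentences))
    ]
    embeddings = [toy_embed(g) for g in groups]
    # 3. Merge a sentence into the current chunk when its group is similar
    #    to the previous group; otherwise start a new chunk.
    chunks = [[sentences[0]]]
    for i in range(1, len(sentences)):
        if cosine_similarity(embeddings[i - 1], embeddings[i]) < threshold:
            chunks.append([sentences[i]])
        else:
            chunks[-1].append(sentences[i])
    return [" ".join(c) for c in chunks]
```

Calling `semantic_chunks(text, window=0)` compares individual sentences rather than overlapping groups, which makes topic changes easier to see on very short inputs. With a real embedding model the threshold would normally be derived from the distribution of similarities in the document rather than fixed by hand.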

Install Dependencies

!pip install --quiet langchain_experimental langchain_openai

Load Example Data

# This is a long document we can split up.
with open("../../state_of_the_union.txt") as f:
    state_of_the_union = f.read()

Create Text Splitter

from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai.embeddings import OpenAIEmbeddings

# OpenAIEmbeddings reads the API key from the OPENAI_API_KEY environment variable.
text_splitter = SemanticChunker(OpenAIEmbeddings())

Split Text

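# create_documents takes a list of texts and returns a list of Document objects.
docs = text_splitter.create_documents([state_of_the_union])
# Print the text of the first chunk.
print(docs[0].page_content)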