Text Splitters
After loading documents into your system, the next step is often to make them easier for Large Language Models (LLMs) to handle. This usually means breaking long texts into smaller pieces, a process called chunking or text splitting. It's like cutting a big cake into smaller slices so everyone can have a piece. This matters because an LLM can only read a limited amount of text at once.
That limit is the model's context window: the maximum amount of text it can take in as input in a single request. Text splitting prepares large texts for LLMs by making each piece small enough to fit within that limit.
GenAI Stack has tools to help split texts into smaller chunks. These tools are designed to work in different ways, depending on what you need. Here are a couple of them:
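Recursive Character Text Splitter: This tool breaks the text into smaller pieces while trying to make sure that each piece makes sense on its own. Splitters of this kind typically cut at larger natural boundaries first (such as paragraphs) and fall back to smaller ones (lines, then words) only when a piece is still too long. It's like slicing the cake along its layers instead of straight through the decorations.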
Character Text Splitter: This simpler tool cuts the text based on a set number of characters. It's quick and easy but might cut in the middle of a sentence.
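To make the difference between the two strategies concrete, here is a minimal plain-Python sketch. It does not use GenAI Stack's API: the function names `split_by_characters` and `split_recursively`, the `chunk_size` parameter, and the separator order (paragraphs, then lines, then words) are assumptions made for illustration, and the actual components may behave differently.

```python
from typing import List, Sequence


def split_by_characters(text: str, chunk_size: int) -> List[str]:
    """Cut text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def split_recursively(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", " "),  # assumed order: coarse to fine
) -> List[str]:
    """Split on the coarsest separator first, and use finer separators
    only for pieces that are still longer than chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No natural boundary left to try: fall back to a fixed-size cut.
        return split_by_characters(text, chunk_size)

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        if not piece:
            continue  # skip empty pieces produced by repeated separators
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: retry with finer separators.
            chunks.extend(split_recursively(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = (
    "First paragraph is short.\n\n"
    "Second paragraph is a little bit longer than the first one."
)

for chunk in split_by_characters(sample, 40):
    print("fixed    :", repr(chunk))
for chunk in split_recursively(sample, 40):
    print("recursive:", repr(chunk))
```

With this sample, the fixed-size version cuts inside the word "paragraph", while the recursive version keeps the first paragraph whole and breaks the second one between words. Both keep every chunk at or below the chosen size.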
Using text splitters helps LLMs handle and understand the text, because each request contains only as much text as the model can process. It's like reading a book one chapter at a time instead of trying to read the whole book at once. By choosing the right tool, you can make sure the text is easy for the model to work with, whether you're summarizing a document or analyzing it in detail. This way, your application can handle texts that are much longer than the model could read in one go.
An example implementation:
GenAI Stack supports several types of text splitters. To learn more about them and their parameters, refer to the component documentation here.