
Text Splitters


Last updated 1 year ago


After loading documents into your system, the next step is often to make them easier for Large Language Models (LLMs) to handle. This usually means breaking long texts down into smaller pieces, a process called chunking or text splitting. It's like cutting a big cake into smaller slices so everyone can have a piece. This matters because an LLM has a limited context window: it can only read and process a certain amount of text at once.

What is Text Splitting?

Text splitting is a way to prepare large texts for LLMs. Since these models can't deal with too much text all at once, splitting the text into smaller parts helps. Think of it as making the text fit into the model's reading limit.

GenAI Stack Text Splitters

GenAI Stack has tools to help split texts into smaller chunks. These tools are designed to work in different ways, depending on what you need. Here are a couple of them:

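  • Recursive Character Text Splitter: This tool carefully breaks down the text into smaller pieces, making sure that the pieces make sense on their own. Splitters of this kind typically try larger natural boundaries (such as paragraphs) first and fall back to smaller ones (such as sentences or words) only when a piece is still too long.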

  • Character Text Splitter: This simpler tool cuts the text based on a set number of characters. It's quick and easy but might cut in the middle of a sentence.
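
The difference between the two approaches can be sketched in plain Python. This is a minimal illustration, not GenAI Stack's implementation: the function names, the `chunk_size` parameter, and the separator list are all assumptions made for the example.

```python
from typing import List, Sequence


def split_by_characters(text: str, chunk_size: int) -> List[str]:
    """Cut the text every `chunk_size` characters, ignoring sentence boundaries."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def split_recursively(
    text: str,
    chunk_size: int,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),  # assumed order: coarse to fine
) -> List[str]:
    """Split on the coarsest separator first; recurse with finer ones for oversized pieces."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No natural boundary left: fall back to a hard cut.
        return split_by_characters(text, chunk_size)

    head, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(head):
        # Try to grow the current chunk by re-joining neighbouring pieces.
        candidate = piece if not current else current + head + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with finer separators.
            chunks.extend(split_recursively(piece, chunk_size, rest))
    if current:
        chunks.append(current)
    return chunks
```

In this sketch the separator that ends a chunk is dropped, so a chunk that ends at a sentence boundary loses its trailing ". ". A production splitter would usually offer an option to keep it.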

Why Use Text Splitters?

Using text splitters makes sure that LLMs can handle and understand the text better. It's like reading a book one chapter at a time instead of trying to read the whole book at once. By choosing the right tool, you can make sure the text is easy for the model to work with, whether you're summarizing a document or analyzing it in detail. This way, your application can handle large texts without any trouble.

An example implementation:

Loading a document and chunking it
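
As a rough stand-in for that flow outside the GenAI Stack interface, the sketch below reads a text file and cuts it into overlapping chunks. It is a sketch under stated assumptions: `load_text`, `chunk_with_overlap`, `load_and_chunk`, and the `chunk_size` / `chunk_overlap` parameters are hypothetical names chosen for the example, not GenAI Stack APIs.

```python
import tempfile
from pathlib import Path
from typing import List


def load_text(path: Path) -> str:
    """Hypothetical loader: read a UTF-8 text file into one string."""
    return path.read_text(encoding="utf-8")


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into fixed-size chunks; consecutive chunks share `chunk_overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def load_and_chunk(path: Path, chunk_size: int = 200, chunk_overlap: int = 20) -> List[str]:
    """Load a document, then split it."""
    return chunk_with_overlap(load_text(path), chunk_size, chunk_overlap)


if __name__ == "__main__":
    # Write a throwaway document so the example is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "example.txt"
        doc.write_text("Text splitting prepares long documents for an LLM. " * 10,
                       encoding="utf-8")
        for i, chunk in enumerate(load_and_chunk(doc, chunk_size=80, chunk_overlap=10)):
            print(i, repr(chunk))
```

The overlap repeats a few characters between neighbouring chunks, which can help when a sentence is cut at a chunk boundary.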

Conclusion:

GenAI Stack supports several types of text splitters. To learn more about them and their parameters, refer to the Text Splitters component documentation.