Writing Custom Components in GenAI Stack
Introduction
This documentation provides guidelines and examples for writing custom components for the GenAI Stack. Custom components let you extend your application by integrating external services and custom logic. This guide covers the following topics:
How to write a custom component.
Understanding the typing system.
Available types.
Example of a real custom component.
Writing a Custom Component
A custom component in the GenAI Stack is a class that inherits from CustomComponent and defines its configuration and behavior through specific methods. The key methods are build_config and build.
Structure of a Custom Component
Here is a basic structure of a custom component:
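The sketch below shows that structure under stated assumptions. Because the real base class ships with the stack, a minimal stand-in CustomComponent is defined so the snippet runs on its own. The component name MyComponent, its text parameter, and the keys inside each build_config entry (display_name, required) are hypothetical. They are modeled on Langflow-style components, not confirmed for the GenAI Stack.

```python
from typing import Any, Dict


class CustomComponent:
    """Hypothetical stand-in for the stack's CustomComponent base class,
    defined here only so the sketch runs without the stack installed."""


class MyComponent(CustomComponent):
    # Metadata shown in the UI.
    display_name: str = "My Component"
    description: str = "Converts the input text to upper case."
    version: str = "0.1.0"

    def build_config(self) -> Dict[str, Dict[str, Any]]:
        # One entry per configuration parameter; the keys inside each
        # entry (display_name, required) are assumptions.
        return {"text": {"display_name": "Text", "required": True}}

    def build(self, text: str) -> str:
        # Custom logic: the annotations declare a str input and a str output.
        return text.upper()
```

Calling MyComponent().build("hello") returns "HELLO". In a real project you would import the stack's CustomComponent instead of defining the stand-in.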
Explanation
display_name: The name of the component as it will appear in the UI.
description: A brief description of the component's functionality.
version: The version of the component.
build_config: This method defines the configuration parameters for the component.
build: This method contains the custom logic that processes the inputs and produces the output.
Understanding the Typing System
The GenAI Stack uses specific types to keep component development type-safe and clear. Input and output types MUST follow this typing system for components to be connectable: if one component's output type does not match the input type it is meant to feed, the two components cannot be connected. Commonly used types include:
str: String data type for textual information.
Document: Represents a document object containing content (e.g., text) and optional metadata (e.g., source URL). This serves as the primary data transfer medium between components.
VectorStore: Refers to a storage mechanism for vectors, which are numerical representations of text data used in retrieval tasks.
BaseRetriever: Represents a retriever component that retrieves relevant documents from a vector store based on a query.
Embeddings: Represents an embedding model that converts text into vectors, enabling vector-based retrieval and analysis.
PromptTemplate: A pre-defined template for prompting large language models (LLMs) with specific instructions or formats.
Chain: Represents a sequence of connected components that process data in a pipeline fashion.
BaseChatMemory: Represents a base class for components that manage chat state and context within a conversational setting.
BaseLLM: Represents a base class for large language models (LLMs) used for tasks like text generation and translation.
BaseLoader: Represents a base class for components that load data from various sources (e.g., files, databases).
BaseMemory: Represents a base class for components that store and retrieve information relevant to a particular task or session.
BaseOutputParser: Represents a base class for components that process and structure the output generated by other components.
TextSplitter: Represents a component that splits text into smaller chunks for further processing.
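To make the connectability rule concrete, the toy check below compares the return annotation of one component's build method with the parameter annotations of another's. It illustrates the idea only and is not the GenAI Stack's actual connection logic, which may also account for subclasses or unions. The classes FileLoader, Splitter, and Summarizer, the helper connectable_params, and the stand-in Document dataclass are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, get_type_hints


@dataclass
class Document:
    """Stand-in with the same field names as LangChain's Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class FileLoader:
    def build(self, url: str) -> List[Document]:
        return [Document(page_content="...", metadata={"source": url})]


class Splitter:
    def build(self, documents: List[Document], chunk_size: int) -> List[Document]:
        return documents  # splitting omitted; only the signature matters here


class Summarizer:
    def build(self, text: str) -> str:
        return text[:100]


def connectable_params(upstream: type, downstream: type) -> List[str]:
    """Hypothetical helper: names of downstream.build parameters whose
    annotation exactly equals upstream.build's return annotation."""
    output_type = get_type_hints(upstream.build)["return"]
    inputs = get_type_hints(downstream.build)
    return [name for name, hint in inputs.items()
            if name != "return" and hint == output_type]
```

Here connectable_params(FileLoader, Splitter) returns ["documents"], while connectable_params(FileLoader, Summarizer) returns [] because List[Document] does not match str.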
Maintaining Type Consistency
When developing custom components, it's essential to:
Declare Expected Input Types: In your build method, annotate each configuration parameter with its expected type.
Define Return Types: Annotate the return type of your component's build method so that other components can correctly handle the data it produces.
Adhere to Established Types: Use the built-in types provided by the GenAI Stack to maintain consistency and avoid compatibility issues.
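The following sketch applies those three rules to a hypothetical FixedSizeChunker component. Its build signature annotates each input, declares List[Document] as the return type, and reuses the Document type so its output can feed any component that accepts documents. The stand-in CustomComponent and Document classes, the naive fixed-width chunking, the start_index metadata key, and the build_config field keys are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


class CustomComponent:
    """Hypothetical stand-in for the stack's CustomComponent base class."""


@dataclass
class Document:
    """Stand-in with the same field names as LangChain's Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class FixedSizeChunker(CustomComponent):
    display_name = "Fixed-Size Chunker"
    description = "Splits documents into chunks of at most chunk_size characters."
    version = "0.1.0"

    def build_config(self) -> Dict[str, Dict[str, Any]]:
        # Field-spec keys are assumptions; one entry per build() parameter.
        return {
            "documents": {"display_name": "Documents", "required": True},
            "chunk_size": {"display_name": "Chunk Size", "required": False},
        }

    # Input types: List[Document] and int. Return type: List[Document].
    def build(self, documents: List[Document], chunk_size: int = 500) -> List[Document]:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be a positive integer")
        chunks: List[Document] = []
        for doc in documents:
            text = doc.page_content
            for start in range(0, len(text), chunk_size):
                # Copy the metadata so chunks never share one dict, and record the offset.
                metadata = {**doc.metadata, "start_index": start}
                chunks.append(Document(page_content=text[start:start + chunk_size],
                                       metadata=metadata))
        return chunks
```

For example, one document with page_content "abcdefgh" and chunk_size=3 yields three documents, "abc", "def", and "gh", each carrying the original metadata plus its start offset.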
Example: GitHub File Loader Component
This component loads a single file from a GitHub repository.
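A sketch of such a component follows. It assumes GitHub's standard file URL form (https://github.com/<owner>/<repo>/blob/<branch>/<path>) and a UTF-8 text file. It uses urllib.request from the standard library so it has no third-party dependencies, and it defines stand-in CustomComponent and Document classes so it runs on its own. The helpers parse_github_url and to_raw_url, the metadata keys, and the field-spec keys in build_config are hypothetical.

```python
import re
import urllib.request
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


class CustomComponent:
    """Hypothetical stand-in for the stack's CustomComponent base class."""


@dataclass
class Document:
    """Stand-in with the same field names as LangChain's Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# Matches https://github.com/<owner>/<repo>/blob/<branch>/<path>.
# Assumes the branch name contains no "/"; otherwise branch and path are ambiguous.
GITHUB_FILE_RE = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/blob/([^/]+)/(.+)$")


def parse_github_url(url: str) -> Tuple[str, str, str, str]:
    """Hypothetical helper: split a GitHub file URL into (owner, repo, branch, path)."""
    match = GITHUB_FILE_RE.match(url.strip())
    if match is None:
        raise ValueError(f"Not a GitHub file URL: {url!r}")
    owner, repo, branch, path = match.groups()
    return owner, repo, branch, path


def to_raw_url(owner: str, repo: str, branch: str, path: str) -> str:
    """Hypothetical helper: URL that serves the file's raw content."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


class GithubFileLoaderComponent(CustomComponent):
    display_name = "GitHub File Loader"
    description = "Loads a single file from a GitHub repository."
    version = "0.1.0"

    def build_config(self) -> Dict[str, Dict[str, Any]]:
        # Field-spec keys (display_name, required) are assumptions.
        return {"github_file_url": {"display_name": "GitHub File URL", "required": True}}

    def build(self, github_file_url: str) -> List[Document]:
        owner, repo, branch, path = parse_github_url(github_file_url)
        raw_url = to_raw_url(owner, repo, branch, path)
        # GET request; urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
        with urllib.request.urlopen(raw_url, timeout=10) as response:
            if response.status != 200:
                raise RuntimeError(f"Unexpected HTTP status {response.status} for {raw_url}")
            content = response.read().decode("utf-8")  # assumes a UTF-8 text file
        metadata = {"source": github_file_url, "owner": owner, "repo": repo,
                    "branch": branch, "path": path}
        return [Document(page_content=content, metadata=metadata)]
```

Given https://github.com/octocat/Hello-World/blob/master/README, the raw URL becomes https://raw.githubusercontent.com/octocat/Hello-World/master/README. Branch names containing a slash cannot be told apart from the file path in this URL form, so the regular expression treats the first segment after blob as the branch.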
Explanation
The GithubFileLoaderComponent class inherits from CustomComponent.
The build_config method defines the configuration parameter github_file_url.
github_file_url: The URL of the file in the GitHub repository. It is required and accepts input as a string.
build method:
Extracts the repository information (owner, repository name, branch, and file path) from the provided GitHub URL using regular expressions.
Constructs the raw URL to fetch the file content directly from GitHub.
Sends a GET request to the raw URL and checks whether the request succeeded.
If it succeeded, retrieves the file content and creates a Document object with the content and metadata.
Returns a list of Document objects containing the file content.