Writing Custom Components in GenAI Stack

Introduction

This guide provides guidelines and examples for writing custom components for the GenAI Stack. Custom components let you extend your application by integrating external services and custom logic. The guide covers the following topics:

  1. How to write a custom component.

  2. Understanding the typing system.

  3. Available types.

  4. Example of a real custom component.

Writing a Custom Component

A custom component in the GenAI Stack is a class that inherits from CustomComponent and defines its configuration and behavior through two key methods: build_config and build.

Structure of a Custom Component

Here is a basic structure of a custom component:

from genflow.interface.custom.custom_component import CustomComponent
from langchain.schema import Document
from typing import List

class ExampleComponent(CustomComponent):
    display_name: str = "ExampleComponent"
    description: str = "This is an example custom component."
    version: str = "1.0"

    def build_config(self):
        return {
            "param1": {
                "display_name": "Parameter 1",
                "required": True,
                "input_types": ["Input"],
            },
            "param2": {
                "display_name": "Parameter 2",
                "required": False,
                "input_types": ["Input"],
            },
            "code": {"show": False},
        }

    def build(self, param1: str, param2: str = "") -> List[Document]:
        # Custom logic to build and return documents
        documents = [Document(page_content=f"Param1: {param1}, Param2: {param2}")]
        return documents

Explanation

  • display_name: The name of the component as it will appear in the UI.

  • description: A brief description of the component's functionality.

  • version: The version of the component.

  • build_config: This method returns a dictionary, keyed by parameter name, that defines the configuration parameters for the component.

  • build: This method contains the custom logic to process the inputs and produce the output.
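In the example above, every key in build_config other than code matches a parameter name of build. Assuming that correspondence is intended (this guide does not state it as a rule), a quick consistency check can be sketched with the standard library. StandInComponent and config_mismatches are hypothetical names used only for illustration, and StandInComponent is a placeholder so the snippet runs without genflow installed; it is not the real CustomComponent.

```python
import inspect


class StandInComponent:
    """Hypothetical placeholder for CustomComponent (not the real base class)."""

    def build_config(self):
        return {
            "param1": {"display_name": "Parameter 1", "required": True},
            "param2": {"display_name": "Parameter 2", "required": False},
            "code": {"show": False},
        }

    def build(self, param1: str, param2: str = "") -> list:
        return [f"Param1: {param1}, Param2: {param2}"]


def config_mismatches(component):
    """Return (config keys with no build parameter, build parameters with no config key)."""
    # The "code" entry is skipped on the assumption that it is not passed to build.
    config_keys = set(component.build_config()) - {"code"}
    # inspect.signature on a bound method already leaves out "self".
    build_params = set(inspect.signature(component.build).parameters)
    return sorted(config_keys - build_params), sorted(build_params - config_keys)


extra_keys, missing_keys = config_mismatches(StandInComponent())
print("config keys without a build parameter:", extra_keys)
print("build parameters without a config key:", missing_keys)
```

Both lists come back empty for the stand-in class; a non-empty list points at a name that appears in one method but not the other.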

Understanding the Typing System

The GenAI Stack uses specific types to ensure type safety and clarity in component development. Input and output types must follow this typing system for components to be connectable: if the output type of one component does not match the input type of another, the two cannot be connected. Commonly used types include:

  • str: String data type for textual information.

  • Document: Represents a document object containing content (e.g., text) and optional metadata (e.g., source URL). This serves as the primary data transfer medium between components.

  • VectorStore: Refers to a storage mechanism for vectors, which are numerical representations of text data used in retrieval tasks.

  • BaseRetriever: Represents a retriever component that retrieves relevant documents from a vector store based on a query.

  • Embeddings: Represents an embedding model that converts text into vectors, enabling vector-based retrieval and analysis.

  • PromptTemplate: A pre-defined template for prompting large language models (LLMs) with specific instructions or formats.

  • Chain: Represents a sequence of connected components that process data in a pipeline fashion.

  • BaseChatMemory: Represents a base class for components that manage chat state and context within a conversational setting.

  • BaseLLM: Represents a base class for large language models (LLMs) used for tasks like text generation and translation.

  • BaseLoader: Represents a base class for components that load data from various sources (e.g., files, databases).

  • BaseMemory: Represents a base class for components that store and retrieve information relevant to a particular task or session.

  • BaseOutputParser: Represents a base class for components that process and structure the output generated by other components.

  • TextSplitter: Represents a component that splits text into smaller chunks for further processing.
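To make the connection rule concrete, here is a minimal sketch of how an output type could be compared with an input type by reading the annotations on build. This guide does not document how the GenAI Stack performs the check internally, so the exact-equality rule, the can_connect helper, and the three stand-in classes are all assumptions for illustration. Document is a local stand-in for langchain.schema.Document so the snippet runs on its own.

```python
from typing import List, get_type_hints


class Document:
    """Hypothetical stand-in for langchain.schema.Document."""

    def __init__(self, page_content: str):
        self.page_content = page_content


class LoaderStandIn:
    def build(self, url: str) -> List[Document]:
        return [Document(page_content=url)]


class SplitterStandIn:
    def build(self, documents: List[Document]) -> List[Document]:
        return documents


class PromptStandIn:
    def build(self, text: str) -> str:
        return text


def can_connect(source_cls, target_cls, param: str) -> bool:
    """True when source build's return annotation equals target build's annotation for param."""
    output_type = get_type_hints(source_cls.build).get("return")
    input_type = get_type_hints(target_cls.build).get(param)
    # Assumption: types must be exactly equal; a real check might also accept subclasses.
    return output_type is not None and output_type == input_type


print(can_connect(LoaderStandIn, SplitterStandIn, "documents"))
print(can_connect(LoaderStandIn, PromptStandIn, "text"))
```

The first call reports a match because both sides are annotated with List[Document]; the second does not, because a List[Document] output is being offered to a str input.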

Maintaining Type Consistency

When developing custom components, it's essential to:

  • Declare Expected Input Types: Annotate each parameter of your build method with the type it expects, one annotation per configuration parameter.

  • Define Return Types: Clearly indicate the output type(s) returned by your component's build method. This ensures other components can correctly handle the generated data.

  • Adhere to Established Types: Utilize the built-in types provided by the GenAI Stack to maintain consistency and avoid potential compatibility issues.
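The first two points can be checked mechanically. The sketch below uses only the standard library to list which parameters of a build method lack an annotation and whether the return type is missing. The helper name missing_annotations and the two example classes are hypothetical and are not part of the GenAI Stack.

```python
import inspect


def missing_annotations(build_method):
    """List the parameter names (plus "return") that have no type annotation."""
    signature = inspect.signature(build_method)
    missing = [
        name
        for name, param in signature.parameters.items()
        if name != "self" and param.annotation is inspect.Parameter.empty
    ]
    if signature.return_annotation is inspect.Signature.empty:
        missing.append("return")
    return missing


class FullyAnnotated:
    def build(self, query: str, top_k: int = 4) -> list:
        return [query] * top_k


class PartlyAnnotated:
    def build(self, query, top_k: int = 4):
        return [query] * top_k


print(missing_annotations(FullyAnnotated.build))
print(missing_annotations(PartlyAnnotated.build))
```

An empty list means every input and the return value are annotated; any name in the list is a spot where a type still needs to be declared.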

Example: GitHub File Loader Component

This component loads a single file from a GitHub repository.

from typing import List
from genflow.interface.custom.custom_component import CustomComponent
from langchain.schema import Document
import requests
import re
from genflow.utils.util import build_loader_repr_from_documents

class GithubFileLoaderComponent(CustomComponent):
    display_name: str = "GitHub File Loader"
    description: str = "Loads a single file from a GitHub repository."
    documentation: str = "https://docs.aiplanet.com/components/document-loaders#gitloader"
    version: str = "1.0"
    beta = False

    def build_config(self):
        return {
            "github_file_url": {
                "display_name": "GitHub File URL",
                "info": "The URL of the file in the GitHub repository",
                "required": True,
                "input_types": ["Input"]
            },
            "code": {"show": True},
        }

    def build(self, github_file_url: str) -> List[Document]:
        # Extract repository information from the URL
        match = re.match(r"https://github\.com/([^/]+)/([^/]+)/blob/([^/]+)/(.+)", github_file_url)
        if match:
            repo_owner, repo_name, branch, file_path = match.groups()
            raw_url = f"https://raw.githubusercontent.com/{repo_owner}/{repo_name}/{branch}/{file_path}"

            # Send a GET request to the raw URL
            response = requests.get(raw_url, timeout=30)

            # Check if the request was successful
            if response.status_code == 200:
                # Get the file content
                file_content = response.text
                docs = [Document(page_content=file_content, metadata={"file_path": github_file_url})]
                self.repr_value = build_loader_repr_from_documents(docs)
                return docs
            else:
                raise Exception(f"Failed to fetch file: {response.status_code}")
        else:
            raise ValueError("Invalid GitHub file URL format")

Explanation

  • The GithubFileLoaderComponent class inherits from CustomComponent.

  • The build_config method defines the configuration parameter github_file_url.

    • github_file_url: The URL of the file in the GitHub repository. It is required and accepts input as a string.

  • The build method:

    • Extracts the repository information (owner, repository name, branch, and file path) from the provided GitHub URL using a regular expression, and raises a ValueError if the URL does not match the expected format.

    • Constructs the raw URL to fetch the file content directly from GitHub.

    • Sends a GET request to the raw URL and checks whether the request succeeded; if it did not, raises an exception that includes the HTTP status code.

    • If successful, retrieves the file content and creates a Document object with that content, storing the original GitHub URL in the metadata under file_path.

    • Returns a list containing that single Document object.
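The URL-handling step can be tried in isolation, without any network access. The sketch below pulls the conversion into a standalone function; github_blob_to_raw is a hypothetical helper name, and the example URL is made up. The pattern assumes that the owner, repository, and branch segments contain no slash.

```python
import re

# Assumption: owner, repository, and branch contain no "/" characters.
# A branch name with a slash would be split at its first slash, although the
# rebuilt raw URL still comes out the same because the parts are rejoined with "/".
GITHUB_BLOB_PATTERN = re.compile(
    r"https://github\.com/([^/]+)/([^/]+)/blob/([^/]+)/(.+)"
)


def github_blob_to_raw(github_file_url: str) -> str:
    """Convert a github.com "blob" URL into its raw.githubusercontent.com equivalent."""
    match = GITHUB_BLOB_PATTERN.fullmatch(github_file_url)
    if match is None:
        raise ValueError("Invalid GitHub file URL format")
    owner, repo, branch, file_path = match.groups()
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{file_path}"


# Hypothetical URL used only to show the transformation.
print(github_blob_to_raw(
    "https://github.com/example-owner/example-repo/blob/main/docs/README.md"
))
```

Keeping the parsing in a pure function like this makes the URL rules easy to unit test separately from the HTTP request.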
