Multi Modals


Last updated 12 months ago


A Multi-Modal is a specialized custom component that uses generative models, such as diffusion models or speech generators, to create image or voice output from a simple text prompt. Two Multi-Modals are currently available: OpenAITextToImage and OpenAITextToSpeech.

OpenAITextToImage

This component uses OpenAI's DALL·E 3 (dall-e-3) model under the hood. Users provide a few parameters to generate the desired output.

Parameters

  • OpenAI API Key: Key used to authenticate and access the OpenAI API.

  • Prompt: A PromptTemplate or ChatPromptTemplate containing the instruction for the image to generate.

  • Quality of the Image: The visual quality of the generated image, either standard or High Definition (HD).
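
The parameters above map onto OpenAI's Images API. As a rough sketch (not the component's actual source; the helper name and defaults here are illustrative assumptions), an equivalent direct call with the official openai Python SDK could look like this:

```python
def text_to_image(api_key: str, prompt: str, quality: str = "standard") -> str:
    """Illustrative sketch of what OpenAITextToImage does under the hood:
    send a text prompt to OpenAI's dall-e-3 model and return the image URL."""
    if quality not in ("standard", "hd"):  # "Quality of the Image" parameter
        raise ValueError("quality must be 'standard' or 'hd'")
    from openai import OpenAI  # requires `pip install openai`

    client = OpenAI(api_key=api_key)  # "OpenAI API Key" parameter
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,    # the rendered Prompt template
        quality=quality,
        n=1,
        size="1024x1024",
    )
    return response.data[0].url  # URL of the generated image
```

With a valid key, `text_to_image(key, "a watercolor fox", quality="hd")` returns a URL pointing to the generated image.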

Example Usage

The OpenAITextToImage component requires an OpenAI API Key, which you can get from https://platform.openai.com/. The Prompt can be specified using a simple PromptTemplate. The Quality of the Image can be set to either HD or standard. This component returns an Output.

[Screenshot: using the OpenAITextToImage component to generate an image]

OpenAITextToSpeech

This component uses the tts-1 model from OpenAI under the hood. Users can generate speech with a specified vocal tone using this component.

Parameters

  • OpenAI API Key: Key used to authenticate and access the OpenAI API.

  • Text Input: The text prompt that will be converted to speech.

  • Choose a Voice: Selects the vocal tone used for the generated speech.

Example Usage

The OpenAITextToSpeech component requires an OpenAI API Key, which you can get from https://platform.openai.com/. The Text Input field contains the text that gets converted to speech. The Choose a Voice option lets you adjust the vocal tone of the output speech. This component returns an Output.

[Screenshot: using the OpenAITextToSpeech component to generate speech]
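
For reference, the OpenAITextToSpeech component's behaviour can be approximated with a direct call to OpenAI's Audio API. This is an illustrative sketch under stated assumptions (the function name, voice list handling, and defaults are ours, not the component's source):

```python
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")  # tts-1 voices

def text_to_speech(api_key: str, text: str, voice: str = "alloy") -> bytes:
    """Illustrative sketch of what OpenAITextToSpeech does under the hood:
    convert a text prompt to speech with OpenAI's tts-1 model."""
    if voice not in VOICES:  # "Choose a Voice" parameter
        raise ValueError(f"voice must be one of {VOICES}")
    from openai import OpenAI  # requires `pip install openai`

    client = OpenAI(api_key=api_key)  # "OpenAI API Key" parameter
    response = client.audio.speech.create(
        model="tts-1",
        voice=voice,
        input=text,  # "Text Input" parameter
    )
    return response.content  # raw audio bytes (MP3 by default)

# Usage (requires a valid key):
# audio = text_to_speech("sk-...", "Welcome to GenAI Stack!", voice="nova")
# with open("speech.mp3", "wb") as f:
#     f.write(audio)
```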