Multi-Modals
A Multi-Modal is a specialized custom component that uses advanced models, such as diffusion models or speech generators, to create image or voice outputs from a simple text prompt. Currently, two Multi-Modals are available: OpenAITextToImage and OpenAITextToSpeech.
OpenAITextToImage
This component uses the dall-e-3 model from OpenAI under the hood. Users need to input specific parameters to generate the desired output.
Parameters
OpenAI API Key: Key used to authenticate and access the OpenAI API.
Prompt: A PromptTemplate or ChatPromptTemplate that contains the instructions for the component.
Quality of the Image: The visual quality of the generated image, either standard or High Definition (HD).
Example Usage
The OpenAITextToImage component requires an OpenAI API Key, which you can get from https://platform.openai.com/. The Prompt can be specified using a simple PromptTemplate. The Quality of the Image can be set to either standard or HD. This component returns an Output.
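For reference, the snippet below is a minimal sketch of the equivalent direct call to the OpenAI API using the official openai Python SDK. The component's internals may differ, and the API key, prompt text, and image size shown here are placeholder assumptions.

```python
from openai import OpenAI

# Sketch of a standalone equivalent of the OpenAITextToImage component.
client = OpenAI(api_key="sk-...")  # maps to the OpenAI API Key parameter

response = client.images.generate(
    model="dall-e-3",                            # model the component uses
    prompt="A watercolor lighthouse at sunset",  # maps to the Prompt parameter
    quality="hd",                                # maps to Quality of the Image: "standard" or "hd"
    size="1024x1024",                            # assumed size; not a parameter listed above
    n=1,
)

print(response.data[0].url)  # URL of the generated image
```

Choosing hd produces finer detail at a higher cost per image, while standard is faster and cheaper.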
OpenAITextToSpeech
This component uses the tts-1 model from OpenAI in the background. Users can generate speech with a specified vocal tone using this component.
Parameters
OpenAI API Key: Key used to authenticate and access the OpenAI API.
Text Input: The text prompt that will be converted to speech.
Choose a Voice: This option lets you choose the vocal tone used to generate the speech.
Example Usage
The OpenAITextToSpeech component requires an OpenAI API Key, which you can get from https://platform.openai.com/. The Text Input field contains the text that gets converted to speech. The Choose a Voice option allows you to adjust the vocal tone of the output speech. This component returns an Output.
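As a rough illustration, here is a minimal sketch of the equivalent direct call using the official openai Python SDK. Again, the component's internals may differ, and the voice name, input text, and output file name are placeholder assumptions.

```python
from openai import OpenAI

# Sketch of a standalone equivalent of the OpenAITextToSpeech component.
client = OpenAI(api_key="sk-...")  # maps to the OpenAI API Key parameter

response = client.audio.speech.create(
    model="tts-1",                                       # model the component uses
    voice="alloy",                                       # maps to Choose a Voice
    input="Hello! This sentence will be spoken aloud.",  # maps to Text Input
)

# Write the returned audio bytes to an MP3 file.
with open("speech.mp3", "wb") as f:
    f.write(response.content)
```

The tts-1 model ships with a set of preset voices (for example alloy, echo, fable, onyx, nova, and shimmer), which is the kind of list the Choose a Voice option exposes.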