# Multi Modals

Multi-modal components are specialized custom components that use generative models, such as diffusion models or speech generators, to create images or voice output from a simple text prompt. Two multi-modal components are currently available: OpenAITextToImage and OpenAITextToSpeech.

#### OpenAITextToImage

This component uses OpenAI's `dall-e-3` model under the hood. It requires the following parameters to generate an image.

**Parameters**

* **OpenAI API Key:** Key used to authenticate and access the OpenAI API.
* **Prompt:** A prompt template or chat prompt containing the instructions that describe the image to generate.
* **Quality of the Image:** The visual quality of the generated image, either standard or HD (high definition).

**Example Usage**

<figure><img src="/files/JHni3OYJ6l5Znjo1VA37" alt=""><figcaption><p>Using OpenAITextToImage to generate an image</p></figcaption></figure>

The OpenAITextToImage component requires an `OpenAI API Key`, which you can get from <https://platform.openai.com/>. The `Prompt` can be supplied with a simple `PromptTemplate`, and `Quality of the Image` can be set to either HD or standard. This component returns an `Output`.
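To see roughly what the component does behind the scenes, the sketch below calls OpenAI's Images API directly with the official `openai` Python SDK (v1.x). It assumes the component forwards `Prompt` and `Quality of the Image` to the API's `prompt` and `quality` fields; the helper names `build_image_request` and `generate_image` are hypothetical and not part of GenAI Stack.

```python
# Sketch only: assumes OpenAITextToImage maps its inputs onto OpenAI's Images API.
# The helper names below are hypothetical, not part of GenAI Stack.

# Quality values accepted by the Images API for dall-e-3.
IMAGE_QUALITIES = ("standard", "hd")


def build_image_request(prompt: str, quality: str = "standard") -> dict:
    """Validate inputs and build keyword arguments for client.images.generate."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    if quality not in IMAGE_QUALITIES:
        raise ValueError(f"quality must be one of {IMAGE_QUALITIES}, got {quality!r}")
    return {
        "model": "dall-e-3",  # the model this component uses
        "prompt": prompt,
        "quality": quality,
        "n": 1,  # dall-e-3 produces one image per request
    }


def generate_image(api_key: str, prompt: str, quality: str = "standard") -> str:
    """Call the Images API and return the URL of the generated image."""
    # Imported lazily so build_image_request works without the SDK installed.
    from openai import OpenAI

    client = OpenAI(api_key=api_key)
    response = client.images.generate(**build_image_request(prompt, quality))
    return response.data[0].url


# Example usage (needs a valid key and network access):
# prompt = "A watercolor painting of {subject}".format(subject="a lighthouse at dusk")
# print(generate_image("YOUR_OPENAI_API_KEY", prompt, quality="hd"))
```

Keeping request construction separate from the network call makes the parameter handling easy to check without an API key; the `str.format` call in the usage comment is only a stand-in for the filled-in `PromptTemplate`.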

#### OpenAITextToSpeech

This component uses OpenAI's `tts-1` model under the hood. It converts text into speech spoken in a voice you select.

**Parameters**

* **OpenAI API Key:** Key used to authenticate and access the OpenAI API.
* **Text Input:** The text to convert to speech.
* **Choose a Voice:** The voice used to speak the generated audio.

**Example Usage**

<figure><img src="/files/1V7S2le9NBoiIHFiIT4i" alt=""><figcaption><p>Using OpenAITextToSpeech to generate speech</p></figcaption></figure>

The OpenAITextToSpeech component requires an `OpenAI API Key`, which you can get from <https://platform.openai.com/>. The `Text Input` field holds the text that is converted to speech, and the `Choose a Voice` option sets the voice used in the output audio. This component returns an `Output`.
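A comparable direct call to OpenAI's speech endpoint is sketched below, again using the official `openai` Python SDK (v1.x). It assumes the component passes `Text Input` and `Choose a Voice` through as the API's `input` and `voice` fields. The voice list reflects the voices documented for `tts-1` and may not match the options the component's dropdown offers. `build_speech_request` and `synthesize_speech` are hypothetical helper names.

```python
# Sketch only: assumes OpenAITextToSpeech maps its inputs onto OpenAI's speech API.
# The helper names below are hypothetical, not part of GenAI Stack.

# Voices documented for tts-1; the component's "Choose a Voice" options may differ.
TTS_VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")
MAX_INPUT_CHARS = 4096  # documented maximum length of the `input` text


def build_speech_request(text: str, voice: str = "alloy") -> dict:
    """Validate inputs and build keyword arguments for client.audio.speech.create."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"text exceeds {MAX_INPUT_CHARS} characters")
    if voice not in TTS_VOICES:
        raise ValueError(f"voice must be one of {TTS_VOICES}, got {voice!r}")
    return {"model": "tts-1", "voice": voice, "input": text}


def synthesize_speech(api_key: str, text: str, voice: str = "alloy",
                      out_path: str = "speech.mp3") -> str:
    """Generate speech and write the audio bytes (MP3 by default) to out_path."""
    # Imported lazily so build_speech_request works without the SDK installed.
    from openai import OpenAI

    client = OpenAI(api_key=api_key)
    response = client.audio.speech.create(**build_speech_request(text, voice))
    with open(out_path, "wb") as audio_file:
        audio_file.write(response.content)  # raw audio bytes from the API
    return out_path


# Example usage (needs a valid key and network access):
# synthesize_speech("YOUR_OPENAI_API_KEY", "Welcome to GenAI Stack!", voice="nova")
```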

