Embeddings

Embedding is a machine learning technique in which words, phrases, or entire documents are represented as numerical vectors in a high-dimensional space. Texts with similar meanings are mapped to nearby vectors, so this representation captures semantic relationships and lets algorithms work with the underlying meaning of the text rather than its exact wording.
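The semantic relationships mentioned above are usually measured by comparing vectors, and cosine similarity is a common choice. The minimal sketch below ranks words by cosine similarity to "cat". The four-dimensional vectors are made-up values chosen for illustration (real embedding models output hundreds or thousands of dimensions), and `cosine_similarity` is our own helper, not part of GenAI Stack.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 means orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings, for illustration only.
embeddings = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "kitten": [0.85, 0.75, 0.2, 0.05],
    "invoice": [0.0, 0.1, 0.9, 0.8],
}

query = embeddings["cat"]
# Rank every other word by how closely its vector points in the same direction as "cat".
ranked = sorted(
    (word for word in embeddings if word != "cat"),
    key=lambda w: cosine_similarity(query, embeddings[w]),
    reverse=True,
)
print(ranked)  # ['kitten', 'invoice']
```

"kitten" scores about 0.99 against "cat" while "invoice" scores about 0.12, which is the kind of gap a Vector Store relies on when it retrieves related text.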

HuggingFace Inference API Embeddings

HuggingFace provides a range of open-source embedding models. In GenAI Stack, you can generate embeddings through the HuggingFace Inference API, which does not require installing sentence_transformers or downloading models locally.

Parameters

Inference API key: your HuggingFace Access Token, used to run the embedding model on the Inference API.
Model name: the embedding model to use. You can choose one from the Massive Text Embedding Benchmark (MTEB) Leaderboard.

Example Usage

To use HuggingFace Embeddings, first register an account at huggingface.co and obtain your Access Token. HuggingFace Inference API Embeddings takes no input from other components: after you enter your Access Token and model name, you can connect this component directly to the Vector Store.
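For readers curious about what happens behind the component, here is a minimal sketch of a raw HTTP request to the Inference API using only the Python standard library. It assumes the feature-extraction endpoint URL shown below (hosted Hugging Face endpoints have changed over time, so confirm it in the current docs) and uses sentence-transformers/all-MiniLM-L6-v2 only as an example model. The helper `build_hf_embedding_request` is hypothetical and is not GenAI Stack's implementation; it builds the request without sending it.

```python
import json
import urllib.request

# Assumption: feature-extraction endpoint of the hosted Hugging Face Inference API.
# Hosted endpoints have changed over time; confirm the URL in the current Hugging Face docs.
HF_FEATURE_EXTRACTION_URL = (
    "https://api-inference.huggingface.co/pipeline/feature-extraction/{model}"
)


def build_hf_embedding_request(
    access_token: str, model_name: str, texts: list[str]
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST request that asks
    the hosted model to embed each string in `texts`."""
    body = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        HF_FEATURE_EXTRACTION_URL.format(model=model_name),
        data=body,
        headers={
            # The Access Token obtained from huggingface.co, sent as a bearer token.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_hf_embedding_request(
    "YOUR_HF_ACCESS_TOKEN",  # placeholder, not a real token
    "sentence-transformers/all-MiniLM-L6-v2",  # example model name
    ["GenAI Stack embeddings", "vector store"],
)
print(req.full_url)
# Sending the request requires network access and a valid token:
# with urllib.request.urlopen(req) as resp:
#     vectors = json.load(resp)  # for sentence-embedding models: one list of floats per text
```

Only the Access Token and the model name vary between calls, which matches the two parameters the component asks for.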

OpenAI Embedding

OpenAI Embedding is a closed-source embedding model available through the OpenAI API.

Parameters

api_key: your OpenAI API key, which OpenAI Embeddings requires.
tiktoken model name: the OpenAI embedding model to use. OpenAI provides several embedding models; the default is text-embedding-ada-002.

Example Usage

The OpenAI Embeddings component only needs an OpenAI API key, which you can get from https://platform.openai.com/. The tiktoken model name is used to count tokens and can be left as None. This component's output goes to the Vector Store.
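As a sketch of the underlying call, assuming the public OpenAI REST endpoint `https://api.openai.com/v1/embeddings`: the request carries the model name and the input texts, and the response lists one embedding per input, each tagged with an `index`. The helper names below are hypothetical, and the sample response is a shortened, made-up payload in that shape, not real model output.

```python
import json
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_openai_embedding_request(
    api_key: str, texts: list[str], model: str = "text-embedding-ada-002"
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) an embeddings request."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # key from platform.openai.com
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_openai_embeddings(response_json: dict) -> list[list[float]]:
    """Return one vector per input, ordered by each item's `index` field."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


req = build_openai_embedding_request("YOUR_OPENAI_API_KEY", ["hello", "world"])

# Made-up, shortened response in the documented shape
# (real text-embedding-ada-002 vectors have 1536 floats each).
sample_response = {
    "object": "list",
    "model": "text-embedding-ada-002",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
}
vectors = parse_openai_embeddings(sample_response)
print(vectors)  # [[0.1, 0.2], [0.3, 0.4]]
```

Sorting by `index` rather than trusting list order keeps each vector paired with the text it came from before the vectors are handed to a Vector Store.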

Azure OpenAI Embeddings

Azure provides Azure AI Studio services that support OpenAI models such as GPT-3, GPT-4, and embedding models. A vector field's dimension attribute can range from a minimum of 2 to a maximum of 2048 dimensions.
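Before storing vectors, it can help to check them against the 2 to 2048 dimension range stated above. The helper below is a hypothetical, minimal sketch; text-embedding-ada-002 produces 1536-dimensional vectors, which fall inside that range.

```python
MIN_DIMENSIONS = 2
MAX_DIMENSIONS = 2048


def check_vector_dimensions(vector: list[float]) -> int:
    """Hypothetical helper: return the vector's dimension, or raise ValueError
    if it falls outside the 2 to 2048 range allowed per vector field."""
    dims = len(vector)
    if not MIN_DIMENSIONS <= dims <= MAX_DIMENSIONS:
        raise ValueError(
            f"vector has {dims} dimensions; expected {MIN_DIMENSIONS} to {MAX_DIMENSIONS}"
        )
    return dims


print(check_vector_dimensions([0.0] * 1536))  # 1536, the size of a text-embedding-ada-002 vector
```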

Parameters

Azure OpenAI API key: the API key created in the Azure OpenAI Studio service.
Azure Deployment Name: the name of a model deployment you create on Azure using the text-embedding-ada-002 (version 2) embedding model.
Azure Endpoint: the endpoint URL created in the Azure OpenAI Studio service.
OpenAI Version: the API version to call. The valid value depends on the API operation you are calling and is usually a date string such as 2023-05-15.
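The four parameters above map onto an Azure OpenAI REST call roughly as follows. This is a sketch assuming the `{endpoint}/openai/deployments/{deployment}/embeddings?api-version=...` URL pattern and the `api-key` header used by the Azure OpenAI REST API. The resource and deployment names are hypothetical placeholders, 2023-05-15 is just one example API version, and `build_azure_embedding_request` is our own helper, not GenAI Stack's code.

```python
import json
import urllib.parse
import urllib.request


def build_azure_embedding_request(
    api_key: str,
    endpoint: str,
    deployment_name: str,
    api_version: str,
    texts: list[str],
) -> urllib.request.Request:
    """Hypothetical helper: map the component's parameters onto an Azure OpenAI
    embeddings request (built, not sent)."""
    url = (
        f"{endpoint.rstrip('/')}/openai/deployments/"
        f"{urllib.parse.quote(deployment_name)}/embeddings?"
        + urllib.parse.urlencode({"api-version": api_version})
    )
    body = json.dumps({"input": texts}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Azure OpenAI authenticates with an `api-key` header rather than a bearer token.
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_azure_embedding_request(
    api_key="YOUR_AZURE_OPENAI_KEY",  # placeholder
    endpoint="https://my-resource.openai.azure.com/",  # hypothetical resource name
    deployment_name="my-ada-deployment",  # hypothetical text-embedding-ada-002 deployment
    api_version="2023-05-15",
    texts=["hello"],
)
print(req.full_url)
```

Unlike the plain OpenAI call, the model is selected by the deployment name in the URL rather than by a `model` field in the request body, which is why the component asks for a deployment name instead of a model name.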

Example Usage

To use Azure OpenAI, start by creating a resource group on Azure AI Studio. Once the resource is set up, retrieve your Endpoint URL and API key and paste them into the component. In the same dashboard, go to model deployments and select the Text-Ada embedding deployment to obtain your deployment name. The component is then ready to connect to the Vector Store.
