Embeddings
Embedding is a technique in machine learning where words, phrases, or entire documents are represented as vectors in a high-dimensional space. This numerical representation captures semantic relationships: texts with similar meanings map to nearby vectors, which lets algorithms better understand and process the underlying meaning of textual data.
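To make "semantic relationships" concrete: similarity between two embeddings is commonly measured with cosine similarity, and a vector store answers a query by returning the stored vectors closest to the query vector. The sketch below uses made-up three-dimensional vectors (real embedding models output hundreds or thousands of dimensions). The function names and vectors are hypothetical illustrations, not part of GenAI Stack.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vector, documents):
    """Return document names ordered from most to least similar to the query."""
    scored = [(cosine_similarity(query_vector, vec), name) for name, vec in documents.items()]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical toy embeddings; a real model would produce these vectors.
documents = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.1, 0.2, 0.95],
}
query = [0.88, 0.12, 0.02]  # hypothetical embedding of the word "feline"

print(rank_by_similarity(query, documents))  # ['cat', 'kitten', 'car']
```

The two animal words score close to 1.0 against the query, while "car" points in a different direction and ranks last.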
HuggingFace Inference API Embeddings
Hugging Face provides a range of open-source embedding models. In GenAI Stack, you can generate embeddings through the Hugging Face Inference API, so there is no need to install sentence_transformers or download models locally.
Parameters
Inference API key: your Hugging Face access token, used to run the embedding model on the Inference API.
Model name: the embedding model to use. You can pick one from the Massive Text Embedding Benchmark (MTEB) Leaderboard.
Example Usage
To use HuggingFace Embeddings, first register an account at huggingface.co and obtain an access token. The HuggingFace Inference API Embeddings component takes no input from other components: once you enter your access token and model name, you can connect it directly to the Vector Store for further use.
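For readers curious about what happens behind the component, here is a minimal sketch of a raw call to the Inference API using only the Python standard library. Assumptions: the URL is the legacy feature-extraction endpoint and may have changed, so check the current Hugging Face documentation; the helper names are hypothetical, not GenAI Stack code. The request builder is separated from the network call so it can be inspected offline.

```python
import json
import urllib.request

# Assumption: legacy Hugging Face Inference API feature-extraction endpoint.
HF_FEATURE_EXTRACTION_URL = "https://api-inference.huggingface.co/pipeline/feature-extraction/{model}"


def build_hf_embedding_request(texts, model_name, access_token):
    """Build (but do not send) a request asking the Inference API to embed `texts`."""
    # wait_for_model asks the API to wait while a cold model loads instead of erroring.
    payload = {"inputs": texts, "options": {"wait_for_model": True}}
    return urllib.request.Request(
        HF_FEATURE_EXTRACTION_URL.format(model=model_name),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed_with_hf(texts, model_name, access_token):
    """Send the request and return the decoded JSON (one vector per input text)."""
    request = build_hf_embedding_request(texts, model_name, access_token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a real token, `embed_with_hf(["hello world"], "sentence-transformers/all-MiniLM-L6-v2", "hf_...")` would return a list of float vectors. The `huggingface_hub` package's `InferenceClient.feature_extraction` is a higher-level alternative to building the request by hand.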
OpenAI Embeddings
OpenAI Embeddings uses OpenAI's closed-source embedding models.
Parameters
api_key: your OpenAI API key, which OpenAI Embeddings requires.
tiktoken model name: the model whose tiktoken tokenizer is used to count tokens. OpenAI provides several embedding models; by default, text-embedding-ada-002 is used.
Example Usage
The OpenAI Embeddings component only needs an OpenAI API key, which you can get from https://platform.openai.com/. The tiktoken model name is only used to count tokens and can be left as None. The output of this component connects to the Vector Store.
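As an illustration of the request this component makes on your behalf, the sketch below builds a call to OpenAI's embeddings endpoint with the Python standard library. The helper names are hypothetical, not GenAI Stack code; the official `openai` package exposes the same operation as `client.embeddings.create(...)`.

```python
import json
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_openai_embedding_request(texts, api_key, model="text-embedding-ada-002"):
    """Build (but do not send) a POST request to OpenAI's embeddings endpoint."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed_with_openai(texts, api_key, model="text-embedding-ada-002"):
    """Send the request and return the embedding vectors in input order."""
    request = build_openai_embedding_request(texts, api_key, model)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # Each item in "data" records its position in the input list as "index".
    return [item["embedding"] for item in sorted(body["data"], key=lambda d: d["index"])]
```

With a valid key, `embed_with_openai(["hello world"], api_key="sk-...")` returns one list of floats per input; text-embedding-ada-002 vectors have 1536 dimensions.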
Azure OpenAI Embeddings
Azure provides Azure AI Studio services that support OpenAI models such as GPT-3, GPT-4, and embedding models. A vector field must have at least 2 and at most 2048 dimensions.
Parameters
Azure OpenAI API key: the API key created in the Azure OpenAI Studio service.
Azure Deployment Name: the name of a model deployment you create on Azure using the text-embedding-ada-002 (version 2) embedding model.
Azure Endpoint: the endpoint URL created in the Azure OpenAI Studio service.
OpenAI Version: the API version to call. Which version you need depends on the API operation you call; it is written as a date, for example 2023-05-15.
Example Usage
To use Azure OpenAI, start by creating a resource group on Azure AI Studio. Once the resource is set up, retrieve your endpoint URL and API key and paste them into the component. In the same dashboard, go to model deployments and select the Text-Ada embedding model to obtain your deployment name. The component is then ready to connect to the Vector Store.
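The four parameters above combine into a single REST call. The sketch below shows one way that call could be built with the Python standard library, assuming the standard Azure OpenAI embeddings route (`/openai/deployments/{deployment}/embeddings`) with the key sent in an `api-key` header. The endpoint, deployment name, and helper names are hypothetical placeholders, not GenAI Stack code.

```python
import json
import urllib.parse
import urllib.request


def build_azure_embedding_request(texts, endpoint, deployment_name, api_key, api_version):
    """Build (but do not send) a POST request to an Azure OpenAI embeddings deployment."""
    # On Azure, the deployment name (not the model name) selects the model.
    url = (
        f"{endpoint.rstrip('/')}/openai/deployments/"
        f"{urllib.parse.quote(deployment_name)}/embeddings?"
        f"{urllib.parse.urlencode({'api-version': api_version})}"
    )
    payload = {"input": texts}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Azure OpenAI accepts the resource key in an "api-key" header.
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def embed_with_azure(texts, endpoint, deployment_name, api_key, api_version):
    """Send the request and return the embedding vectors in input order."""
    request = build_azure_embedding_request(texts, endpoint, deployment_name, api_key, api_version)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return [item["embedding"] for item in sorted(body["data"], key=lambda d: d["index"])]
```

For example, with a hypothetical endpoint `https://my-resource.openai.azure.com/` and deployment `my-ada-deployment`, the request goes to `https://my-resource.openai.azure.com/openai/deployments/my-ada-deployment/embeddings?api-version=2023-05-15`.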