Embeddings


Last updated 12 months ago


Embedding is a machine learning technique in which words, phrases, or entire documents are represented as vectors in a high-dimensional space. This numerical representation captures semantic relationships: texts with similar meanings are mapped to nearby vectors, which lets algorithms compare and process text by its meaning rather than by its exact wording.
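"Nearby" is usually measured with cosine similarity. The minimal sketch below is a generic illustration, not GenAI Stack code: the three 4-dimensional vectors are made-up toy values standing in for real model output, which typically has hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Toy "embeddings": made-up numbers chosen only to illustrate the idea.
embeddings = {
    "cat": [0.9, 0.8, 0.1, 0.0],
    "kitten": [0.85, 0.75, 0.2, 0.05],
    "invoice": [0.0, 0.1, 0.9, 0.8],
}

print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))   # ≈ 0.99
print(cosine_similarity(embeddings["cat"], embeddings["invoice"]))  # ≈ 0.12
```

A vector store performs essentially this comparison, at scale, when it retrieves the chunks closest to a query embedding.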

HuggingFace Inference API Embeddings

HuggingFace provides a range of open-source embedding models. In GenAI Stack, embeddings can be computed through the Hugging Face Inference API, so you do not need to install sentence_transformers or download models locally.

Parameters

  • Inference API key: your HuggingFace Access Token, used to run the embedding model on the Inference API.

  • Model name: the embedding model to use. You can choose one from the Massive Text Embedding Benchmark (MTEB) Leaderboard.

Example Usage
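For orientation, this is roughly what an Inference API embedding call looks like outside GenAI Stack, as a minimal standard-library sketch. It assumes the api-inference.huggingface.co feature-extraction pipeline endpoint (Hugging Face may have changed its endpoints since; check its current docs). The helper names, the HF_TOKEN environment variable, and the example model sentence-transformers/all-MiniLM-L6-v2 are illustrative choices, not part of GenAI Stack.

```python
import json
import os
import urllib.request

# Assumed endpoint base; verify against Hugging Face's current Inference API docs.
HF_API_BASE = "https://api-inference.huggingface.co"


def build_hf_embedding_request(texts, model_name, access_token):
    """Build (but do not send) a POST request asking the Inference API to embed `texts`."""
    url = f"{HF_API_BASE}/pipeline/feature-extraction/{model_name}"
    # wait_for_model asks the API to wait for a cold model to load instead of failing.
    body = {"inputs": texts, "options": {"wait_for_model": True}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed_with_hf(texts, model_name, access_token):
    """Send the request; for sentence-embedding models the reply holds one vector per text."""
    request = build_hf_embedding_request(texts, model_name, access_token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    token = os.environ.get("HF_TOKEN")  # hypothetical variable holding your Access Token
    if token:
        vectors = embed_with_hf(
            ["What is GenAI Stack?"],
            "sentence-transformers/all-MiniLM-L6-v2",
            token,
        )
        print(len(vectors), len(vectors[0]))
```

Separating request construction from sending keeps the Access Token and model name handling easy to inspect without making a network call.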


OpenAI Embedding

OpenAI Embedding uses OpenAI's closed-source embedding models, accessed through the OpenAI API.

Parameters

  • api_key: your OpenAI API key, which OpenAI Embeddings requires.

  • tiktoken model name: the model name tiktoken uses to count tokens. OpenAI provides several embedding models; the default is text-embedding-ada-002.

Example Usage
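For reference, a direct call to OpenAI's /v1/embeddings endpoint is sketched below with the Python standard library. This is not GenAI Stack's implementation; the helper names and the use of the OPENAI_API_KEY environment variable are assumptions for illustration. Token counting with tiktoken is not needed for the raw call.

```python
import json
import os
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_openai_embedding_request(texts, api_key, model="text-embedding-ada-002"):
    """Build (but do not send) an embeddings request for a batch of texts."""
    body = {"model": model, "input": texts}
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def parse_openai_embeddings(payload):
    """Return the vectors from an embeddings response, ordered by input index."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        request = build_openai_embedding_request(["What is GenAI Stack?"], key)
        with urllib.request.urlopen(request) as response:
            vectors = parse_openai_embeddings(json.loads(response.read()))
        # text-embedding-ada-002 produces 1536-dimensional vectors.
        print(len(vectors[0]))
```

Sorting by index keeps each vector aligned with its input text when several texts are embedded in one request.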


Azure OpenAI Embeddings

Azure provides Azure AI Studio, whose services support OpenAI models such as GPT-3, GPT-4, and the embedding models. A vector field's dimensions attribute accepts a minimum of 2 and a maximum of 2048 dimensions.

Parameters

  • Azure OpenAI API key: the API key created for your Azure OpenAI resource in Azure AI Studio.

  • Azure Deployment Name: the name of a model deployment you create on Azure using the text-embedding-ada-002 (version 2) embedding model.

  • Azure Endpoint: the endpoint URL of your Azure OpenAI resource in Azure AI Studio.

  • OpenAI Version: the API version to use, which depends on the API operation you call. It is a date string, for example 2023-05-15.

Example Usage

To use Azure OpenAI, start by creating a resource group and an Azure OpenAI resource in Azure AI Studio. Once the resource is set up, copy its Endpoint URL and API key into the component. In the same dashboard, open the model deployments page and select your text-embedding-ada-002 deployment to get its deployment name. The component is then ready to connect to the Vector Store.
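The four values collected above map onto the Azure OpenAI embeddings REST call. The minimal standard-library sketch below assumes key-based authentication; the api-version default 2023-05-15, the environment variable names, and the helper names are illustrative assumptions, not GenAI Stack's code. It also checks the returned vector against the 2 to 2048 dimension range given for vector fields in this section.

```python
import json
import os
import urllib.parse
import urllib.request

# Dimension limits for a vector field, as stated in this section.
MIN_DIMS, MAX_DIMS = 2, 2048


def build_azure_embedding_request(texts, endpoint, deployment_name, api_key, api_version="2023-05-15"):
    """Build (but do not send) an embeddings request against one Azure OpenAI deployment."""
    base = endpoint.rstrip("/")
    deployment = urllib.parse.quote(deployment_name, safe="")
    query = urllib.parse.urlencode({"api-version": api_version})
    url = f"{base}/openai/deployments/{deployment}/embeddings?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps({"input": texts}).encode("utf-8"),
        # Key-based Azure OpenAI calls authenticate with an "api-key" header.
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def fits_vector_field(vector):
    """True if the vector's length lies within the MIN_DIMS..MAX_DIMS range."""
    return MIN_DIMS <= len(vector) <= MAX_DIMS


if __name__ == "__main__":
    # Hypothetical environment variable names; use whatever your setup provides.
    endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT")
    api_key = os.environ.get("AZURE_OPENAI_API_KEY")
    deployment = os.environ.get("AZURE_EMBEDDING_DEPLOYMENT")
    if endpoint and api_key and deployment:
        request = build_azure_embedding_request(["What is GenAI Stack?"], endpoint, deployment, api_key)
        with urllib.request.urlopen(request) as response:
            payload = json.loads(response.read())
        vector = payload["data"][0]["embedding"]
        print(len(vector), fits_vector_field(vector))
```

Unlike the OpenAI API, Azure selects the model through the deployment name in the URL, so the request body carries only the input texts.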


To use HuggingFace Embeddings, first register an account at huggingface.co and obtain your Access Token. HuggingFace Inference API Embeddings needs no input connection: after you enter your Access Token and model name, you can connect this component directly to the Vector Store.

The OpenAI embedding component only needs an OpenAI API key, which you can get from https://platform.openai.com/. The tiktoken model name is used to count tokens and can be left as None. This component's output connects to the Vector Store.