Large Language Models
An LLM, or Large Language Model, is a fundamental element of GenAI Stack, serving as the generator in the RAG pipeline. GenAI Stack offers a standardized interface to seamlessly engage with various LLMs from providers like OpenAI, Anthropic, Cohere, and HuggingFace.
Note: GenAI Stack does not host its own LLMs but rather provides a universal interface, enabling interaction with diverse LLMs across the platform, particularly within chains and agents. The LLM class in GenAI Stack serves as a standardized interface for multiple LLM providers, ensuring consistency in handling input strings and generating corresponding text outputs.
ChatOpenAI is a chat model provided by OpenAI, instruction-tuned on a large corpus.
Parameters
OpenAI API Key: Key used to authenticate and access the OpenAI API.
Max Tokens: The maximum number of tokens in the model's response.
Model Name: The OpenAI chat model to use, e.g., a model from the GPT-3.5 or GPT-4 series.
OpenAI API Base: The base URL for the OpenAI API. It is typically set to the API endpoint provided by the OpenAI service.
Temperature: Controls the randomness or creativity of responses.
Example Usage
ChatOpenAI supports both GPT-3.5 and GPT-4 models. You can access them by purchasing credits at platform.openai.com.
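Below is a minimal sketch using LangChain's ChatOpenAI class, which the GenAI Stack interface appears to mirror (an assumption based on the standardized interface described above); the API key and model name are placeholders.

```python
from langchain.chat_models import ChatOpenAI

# Placeholder key and model; substitute your own values.
llm = ChatOpenAI(
    openai_api_key="sk-...",     # OpenAI API Key
    model_name="gpt-3.5-turbo",  # Model Name (GPT-3.5 or GPT-4 series)
    max_tokens=256,              # Max Tokens: cap on response length
    temperature=0.7,             # Temperature: randomness of responses
)
print(llm.predict("Summarize retrieval-augmented generation in one sentence."))
```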
Azure OpenAI Service provides REST API access to OpenAI's powerful language models, including the GPT-4, GPT-3.5-Turbo, and Embeddings model series.
Parameters
AzureChatOpenAI API Base: The Azure cloud endpoint base URL.
AzureChatOpenAI API Key: The Azure API key for the AzureChatOpenAI service.
AzureChatOpenAI API Type: Set to azure by default.
Deployment Name: The deployment name created under Model deployments on Azure.
Model Name: The model backing the deployment; AzureChatOpenAI enables access to GPT-4 models.
API Version: Depends on the method you are calling in the API; this is typically a date string (e.g., 2023-05-15).
Max Tokens: The maximum number of tokens in the model's response.
Temperature: Controls the randomness or creativity of responses.
Example Usage
You can obtain your Azure OpenAI credentials from Azure AI Studio. The instructions are identical to those described for Azure Embeddings.
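A minimal sketch using LangChain's AzureChatOpenAI class, which this component likely wraps (an assumption); the endpoint, key, deployment name, and API version below are placeholders.

```python
from langchain.chat_models import AzureChatOpenAI

llm = AzureChatOpenAI(
    openai_api_base="https://<resource>.openai.azure.com/",  # API Base (placeholder)
    openai_api_key="...",                  # API Key (placeholder)
    openai_api_type="azure",               # API Type: azure by default
    deployment_name="my-gpt4-deployment",  # hypothetical Deployment Name
    openai_api_version="2023-05-15",       # API Version: a date string
    max_tokens=256,
    temperature=0.7,
)
print(llm.predict("Hello from Azure OpenAI."))
```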
The Hugging Face Hub is a platform with over 350k models, 75k datasets, and 150k demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together.
Parameters
Repo Id: Model name from the HuggingFace Hub; defaults to gpt2.
HuggingFacehub API Token: HuggingFace Access Token used to run the model on the Inference API.
Model Kwargs: Optional parameters to configure for better results from the model, such as max_new_tokens.
Example Usage
To utilize HuggingFace Hub, you must first register an account at huggingface.co and obtain your Access Token. With HuggingFace Hub you do not have to load the model yourself. Utilizing 7B-parameter models with HuggingFace Hub is free of charge.
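A minimal sketch using LangChain's HuggingFaceHub wrapper, which this component appears to correspond to (an assumption); the token is a placeholder.

```python
from langchain.llms import HuggingFaceHub

llm = HuggingFaceHub(
    repo_id="gpt2",                        # Repo Id: the default Hub model
    huggingfacehub_api_token="hf_...",     # API Token (placeholder)
    model_kwargs={"max_new_tokens": 100},  # Model Kwargs for better results
)
print(llm("Once upon a time"))
```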
The HuggingFace Inference API relies on the Inference Endpoint provided by HuggingFace.
Parameters
Endpoint URL: The cloud URL of the GPU instance deployed via a HuggingFace Inference Endpoint.
API Token: HuggingFace Access Token used to run the inference.
Model Keyword Arguments: The model_kwargs dictionary, containing max_new_tokens, top_p, top_k, temperature, and other arguments used in the Transformers pipeline.
Task: Currently the inference endpoint supports "text2text-generation", "text-generation", and "summarization". Source: https://github.com/langchain-ai/langchain/blob/370becdfc2dea35eab6b56244872001116d24f0b/langchain/llms/huggingface_endpoint.py
Example Usage
It's important to note that while the large language models deployed on HuggingFace GPU instances are based on open-source models, they are not free to use; access requires payment. HuggingFace Endpoint supports the GCP, Azure, and AWS cloud services.
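Since the component is based on LangChain's HuggingFaceEndpoint class (per the source link above), a minimal sketch might look like this; the endpoint URL and token are placeholders.

```python
from langchain.llms import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url="https://<your-endpoint>.endpoints.huggingface.cloud",  # placeholder
    huggingfacehub_api_token="hf_...",  # API Token (placeholder)
    task="text-generation",             # one of the supported tasks
    model_kwargs={"max_new_tokens": 200, "temperature": 0.7, "top_p": 0.95},
)
print(llm("Explain embeddings in two sentences:"))
```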
Cohere, a Canadian startup, offers natural language processing models aimed at enhancing human-machine interactions for businesses. It is renowned for its expertise in semantic search and embedding models.
Parameters
Cohere API Key: Create a new API key from the Cohere dashboard.
Max Tokens: The maximum number of tokens in the model's response.
Temperature: Controls the randomness or creativity of responses.
Example Usage
Cohere is a closed-source large language model and requires an API key for inference. It provides free usage up to certain token limits.
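A minimal sketch using LangChain's Cohere wrapper, which this component likely corresponds to (an assumption); the key is a placeholder.

```python
from langchain.llms import Cohere

llm = Cohere(
    cohere_api_key="...",  # API key from the Cohere dashboard (placeholder)
    max_tokens=256,
    temperature=0.7,
)
print(llm("Write a tagline for a semantic search product."))
```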
OpenAI is a completion-style Large Language Model component based on the GPT-3.5 series. ChatOpenAI is instruction-based, whereas the OpenAI model is a generic LLM. OpenAI only supports GPT-3-based models and doesn't support GPT-4 as of now.
Parameters
Max Tokens: The maximum number of tokens in the model's response.
OpenAI API Key: Key used to authenticate and access the OpenAI API.
Model Name: The OpenAI completion model to use, e.g., a model from the GPT-3 series.
OpenAI API Base: The base URL for the OpenAI API. It is typically set to the API endpoint provided by the OpenAI service.
Temperature: Controls the randomness or creativity of responses.
Example Usage
The arguments are the same as for ChatOpenAI, but the OpenAI LLM doesn't support GPT-4.
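A minimal sketch using LangChain's OpenAI completion wrapper (an assumption about the underlying class); the key and model name are placeholders.

```python
from langchain.llms import OpenAI

llm = OpenAI(
    openai_api_key="sk-...",              # placeholder
    model_name="gpt-3.5-turbo-instruct",  # a completion-style GPT-3.5 model
    max_tokens=256,
    temperature=0.7,
)
print(llm("Retrieval-augmented generation is"))
```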
Vertex AI empowers ML practitioners to take their projects from experimentation to deployment. In this context, Vertex AI provides Model Garden and Generative AI Studio, which support base foundation models such as the PaLM LLM.
Parameters
Credentials: Once you create a Google Cloud Platform (GCP) account, download the credentials JSON file and upload it here.
Model Name: Vertex AI offers PaLM as its LLM. You need to pick a base foundation model provided under PaLM, such as text-bison (the default) or text-unicorn.
Location: Region in which the API calls take place; us-central1 by default.
Max Retries: The number of times a failed API call is retried against the server.
Metadata: Optional metadata identifying the source of the request; can be None.
Project: Your GCP project ID, used to make the API calls.
Temperature: Controls the randomness or creativity of responses.
Streaming (bool): If enabled, Vertex AI streams the output response.
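A minimal sketch covering the parameters above, using LangChain's VertexAI wrapper (an assumption about the underlying class); the project ID is a placeholder, and credentials are assumed to come from the downloaded service-account JSON file.

```python
from langchain.llms import VertexAI

# Credentials: point GOOGLE_APPLICATION_CREDENTIALS at your downloaded JSON
# file before running, e.g. export GOOGLE_APPLICATION_CREDENTIALS=creds.json
llm = VertexAI(
    model_name="text-bison",   # default PaLM foundation model
    project="my-gcp-project",  # hypothetical GCP project ID
    location="us-central1",    # default region
    max_retries=3,
    temperature=0.2,
    streaming=False,           # set True to stream the output response
)
print(llm("List three uses of Vertex AI."))
```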
Amazon Bedrock is a comprehensive managed service providing access to a variety of high-performance foundation models (FMs) sourced from prominent AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon. These models are accessible through a unified API, accompanied by a wide range of functionalities essential for constructing generative AI applications, all while ensuring security, privacy, and responsible AI practices.
Parameters
Credentials Profile Name: Your AWS credentials profile name, e.g., bedrock-admin.
Model Id: Amazon Bedrock supports models from AI21 Labs, Anthropic, Cohere, and Meta; using the dropdown option, you can pick a relevant model of your choice.
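A minimal sketch using LangChain's Bedrock wrapper (an assumption about the underlying class); the profile name and model ID below are illustrative.

```python
from langchain.llms import Bedrock

llm = Bedrock(
    credentials_profile_name="bedrock-admin",  # profile from ~/.aws/credentials
    model_id="anthropic.claude-v2",            # example Bedrock model ID
)
print(llm("What is Amazon Bedrock?"))
```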