With all the attention that LLMs and ChatGPT have been receiving, a familiar principle re-emerges: “Garbage in, garbage out.” When using GPT-4, GPT-3.5, or any other large language model, this concept applies in two places: prompt engineering and fine-tuning. We should also consider when these models were trained, to gauge how current the data they provide is.
Most business data today sits in corporate data sources, inside or outside a firewall, rather than on the public internet. If we can leverage LLMs on this data, new possibilities emerge.
This article discusses building a chatbot with a custom knowledge base in 30 minutes, so that your AI can answer questions about your data.
To create a custom knowledge base, we must first consider how much data we have. In this blog, we will create an assistant using PubNub documentation. There is a lot of documentation, so we should consider using a production service such as Vectara, Pinecone, or Weaviate to manage our vector embeddings. For smaller amounts of data, you can use LangChain to create a local Vector Database. LangChain also supports mapping your custom embeddings to your LLM through a semantic or similarity search.
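To make "similarity search" concrete, here is a minimal sketch of what a vector store does at query time: it compares the embedding of the question against the stored embeddings and returns the closest chunks. The three-dimensional vectors and the `cosineSimilarity` and `topK` helpers are illustrative assumptions, not part of LangChain or Vectara; real embeddings come from an embedding model and have hundreds of dimensions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored chunks whose embeddings are closest to the query embedding.
function topK(queryEmbedding, store, k) {
  return store
    .map((item) => ({
      text: item.text,
      score: cosineSimilarity(queryEmbedding, item.embedding),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy "embeddings" for illustration only.
const store = [
  { text: 'How to publish a message', embedding: [1, 0, 0] },
  { text: 'How to subscribe to a channel', embedding: [0, 1, 0] },
  { text: 'Pricing and billing', embedding: [0, 0, 1] },
];

const results = topK([0.9, 0.1, 0], store, 2);
console.log(results.map((r) => r.text));
```

A hosted service does the same kind of ranking at scale, typically with an approximate nearest-neighbor index rather than a full scan.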
For this blog, we will go through setting up a Vector Database on Vectara, as it provides a simple drag-and-drop solution for all of your company data. The best way to interact with Vectara is through PubNub Functions, which provide a serverless JavaScript container that runs whenever a pre-defined event occurs. You control when the function fires, which lets you adjust how your AI Knowledge Bot operates.
Vectara will index your dataset into multiple embeddings. When you pass in text or user input, Vectara runs a semantic search on your data and summarizes the results it finds, answering your question with the relevant information from your custom data. Vectara supports multiple file types such as TXT, HTML, PDF, and Word files. Gather all the documents you want to upload and add them to a single folder.
Setting up Vectara is as follows:
Sign up or log in to Vectara
Once you are on the dashboard, click Create corpus
Give the corpus a name and a description, then, under the Data Ingestion header, drag and drop the files you want your LLM to know about into the Upload Files section. After your files are uploaded, you can check your corpus ID at the top of the webpage, as you will need it for the request we are about to write.
Click on your email in the top right corner and save your Customer ID for later
Select API Keys and create an API key for your corpus by selecting your corpus in the drop-down menu. Save your API key for later.
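The three values saved above (corpus ID, Customer ID, and API key) are what a query request needs. Here is a minimal sketch of how they might be assembled, assuming Vectara's v1 REST query endpoint and its `customer-id` and `x-api-key` headers. The helper name `buildVectaraRequest` and the default of five results are hypothetical, so check the field names against the current Vectara API reference before relying on them.

```javascript
// Build the URL and fetch-style options for a Vectara query.
// Assumption: the v1 REST endpoint and JSON body shape shown here.
function buildVectaraRequest(question, { customerId, corpusId, apiKey }, numResults = 5) {
  return {
    url: 'https://api.vectara.io/v1/query',
    options: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'customer-id': String(customerId),
        'x-api-key': apiKey,
      },
      body: JSON.stringify({
        query: [
          {
            query: question,
            numResults,
            // Identifies which corpus to search.
            corpusKey: [
              { customerId: Number(customerId), corpusId: Number(corpusId) },
            ],
          },
        ],
      }),
    },
  };
}

// Example with placeholder credentials (not real values).
const example = buildVectaraRequest('How do I publish a message?', {
  customerId: '1234567890',
  corpusId: '1',
  apiKey: 'YOUR_API_KEY',
});
console.log(example.url, example.options.body);
```

Keeping the request construction in one small function makes it easy to reuse from a serverless function or a local script.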
The architecture will be structured as follows:
The chat application will use PubNub to send and receive messages
A PubNub Function will listen to these messages on a specific channel
A PubNub signal will be fired to let the user know when the AI is thinking and when it is done.
The message will then be forwarded to Vectara using the Vectara REST API
The PubNub Function will then parse one of many results out of the response from Vectara
The response will then be published on a channel associated with your chatbot
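The parsing step in the list above (picking one of many results) can be sketched as follows. This assumes the Vectara v1 response shape, where matches arrive under `responseSet[0].response` and each has a `text` and a `score`. The helper name `pickTopResult` is hypothetical.

```javascript
// Return the text of the highest-scoring result, or null when there is none.
// Accepts either the raw JSON string or an already parsed object.
function pickTopResult(body) {
  const data = typeof body === 'string' ? JSON.parse(body) : body;
  const results =
    (data && data.responseSet && data.responseSet[0] && data.responseSet[0].response) || [];
  if (results.length === 0) return null;
  const best = results.reduce((top, r) => (r.score > top.score ? r : top));
  return best.text;
}

// Example with a hand-written response (not real Vectara output).
const sample = {
  responseSet: [
    {
      response: [
        { text: 'Channels are created on first publish.', score: 0.71 },
        { text: 'Use publish() to send a message.', score: 0.88 },
      ],
    },
  ],
};
console.log(pickTopResult(sample));
```

Returning `null` for an empty result set lets the caller decide what fallback message the bot should send.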
Navigate to the admin dashboard
Select Functions on the left-hand menu and click on the key set you would like to use
Select + Create Module and enter a module name and description
Select the module you just created and click + Create a Function
Give the Function a name, such as Vectara Query, and select After Publish or Fire in the drop-down menu. This function will fire after a message has been published to the relevant channel, in this case, docs-pubnub-ai
Set the channel name to docs-pubnub-ai
Click on My Secrets and create two secrets called VECTARA_API_KEY and CUSTOMER_ID
The code snippet for the PubNub Function is defined as follows:
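A minimal sketch of such a Function, under stated assumptions: the `xhr`, `vault`, and `pubnub` modules of the PubNub Functions runtime, an async handler, and Vectara's v1 query endpoint. The `CORPUS_ID` placeholder, the `{ text, from }` message shape, and the `typing_on` / `typing_off` signal payloads are hypothetical. The modules are passed in as arguments so the handler can also be exercised outside the Functions runtime.

```javascript
const CHANNEL = 'docs-pubnub-ai';
const CORPUS_ID = 1; // Placeholder: replace with your own corpus ID.

// Builds the event handler from the modules the Functions runtime provides.
function createHandler({ xhr, vault, pubnub }) {
  return async function onMessage(request) {
    // Skip the bot's own answers so the Function does not re-trigger on them.
    if (request.message.from === 'bot') return request.ok();

    // Let the UI know the bot is working.
    await pubnub.signal({ channel: CHANNEL, message: 'typing_on' });
    try {
      const [apiKey, customerId] = await Promise.all([
        vault.get('VECTARA_API_KEY'),
        vault.get('CUSTOMER_ID'),
      ]);
      const response = await xhr.fetch('https://api.vectara.io/v1/query', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'customer-id': customerId,
          'x-api-key': apiKey,
        },
        body: JSON.stringify({
          query: [{
            query: request.message.text,
            numResults: 5,
            corpusKey: [{ customerId: Number(customerId), corpusId: CORPUS_ID }],
          }],
        }),
      });
      // Take the first of the returned results as the answer.
      const results = JSON.parse(response.body).responseSet[0].response;
      const answer = results.length > 0 ? results[0].text : 'No answer found.';
      await pubnub.publish({ channel: CHANNEL, message: { text: answer, from: 'bot' } });
    } catch (err) {
      console.log('Vectara query failed:', err);
    } finally {
      // Always clear the typing indicator, even after a failure.
      await pubnub.signal({ channel: CHANNEL, message: 'typing_off' });
    }
    return request.ok();
  };
}

// Inside PubNub Functions this might be wired up roughly as:
// export default createHandler({
//   xhr: require('xhr'), vault: require('vault'), pubnub: require('pubnub'),
// });
```

The guard on `from` matters when the answer is published to the same channel the Function listens on; publishing answers to a separate channel would be an alternative design.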
To connect the PubNub Function defined above to a UI, use one of the many SDKs that PubNub provides. Publish/subscribe to the channel docs-pubnub-ai and wait for the Vectara query in the PubNub Function to finish running. Connecting a typing indicator that listens for PubNub signals on the channel docs-pubnub-ai will allow the user to see when the PubNub Knowledge Bot is thinking, giving a smoother end-user experience.
The code for connecting the PubNub function in React:
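A sketch of the client wiring, assuming the PubNub JavaScript SDK's `addListener`, `subscribe`, `publish`, `removeListener`, and `unsubscribe` calls. The `connectChat` helper, the `{ text }` message shape, and the `typing_on` signal payload are hypothetical. The channel name is passed in, and the PubNub client is an argument so the logic stays independent of React itself.

```javascript
// Subscribe to a channel, route messages and typing signals to callbacks,
// and return helpers for asking a question and cleaning up.
function connectChat(pubnub, { channel, onMessage, onTyping }) {
  const listener = {
    // Fired for every published message on subscribed channels.
    message: (event) => {
      if (event.channel === channel) onMessage(event.message);
    },
    // Fired for signals; used here as a typing indicator.
    signal: (event) => {
      if (event.channel === channel) onTyping(event.message === 'typing_on');
    },
  };
  pubnub.addListener(listener);
  pubnub.subscribe({ channels: [channel] });

  return {
    ask: (text) => pubnub.publish({ channel, message: { text } }),
    disconnect: () => {
      pubnub.removeListener(listener);
      pubnub.unsubscribe({ channels: [channel] });
    },
  };
}

// In a React component, this might be used roughly as:
// useEffect(() => {
//   const chat = connectChat(pubnub, {
//     channel,
//     onMessage: (m) => setMessages((prev) => [...prev, m]),
//     onTyping: setIsTyping,
//   });
//   chatRef.current = chat;
//   return chat.disconnect;
// }, [pubnub, channel]);
```

Returning `disconnect` as the effect cleanup removes the listener when the component unmounts, which avoids duplicate handlers on re-render.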
Using PubNub Functions with any Vector Database or Vector Store is a quick, production-ready way to create your own AI Knowledge Bot. You could use not only Vectara this way but also Pinecone, Weaviate, or any other production Vector Database. With PubNub Functions, it is easy to host and control how and when messages are sent, enhancing your Vector Database's functionality.
Sign up for our admin dashboard to start configuring your PubNub keyset. Also, check out the tutorials and blogs we have for your specific use case.