With all the attention that LLMs such as ChatGPT have been receiving, a familiar principle takes on new importance: “Garbage in, garbage out.” When using GPT-4, GPT-3.5, or any other large language model, this concept applies in two places: prompt engineering and fine-tuning. We should also consider when these models were trained, because the training cutoff determines how current the information they give us is.
Most business data today sits in corporate data sources, inside or outside a firewall, rather than on the public internet. If we can leverage LLMs on this data, then new possibilities emerge.
This article walks through building a chatbot with a custom knowledge base in 30 minutes, so that your AI can answer questions about your data.
To create a custom knowledge base, we must first consider how much data we have. In this blog, we will create an assistant using the PubNub documentation. There is a lot of documentation, so we should consider a production service such as Vectara, Pinecone, or Weaviate to manage our vector embeddings. For smaller amounts of data, you can use LangChain to create a local vector database. LangChain also supports connecting your custom embeddings to your LLM through a semantic or similarity search.
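To make the idea of a similarity search concrete, here is a toy sketch of what a vector store does at query time: it scores every stored embedding against the query embedding and returns the closest matches. This is not LangChain's or Vectara's API. The function names, the sample documents, and the three-dimensional vectors are all made up for illustration; real embeddings come from a model and have hundreds of dimensions.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored documents most similar to the query vector.
function similaritySearch(store, queryVector, k) {
  return store
    .map((doc) => ({ text: doc.text, score: cosineSimilarity(doc.vector, queryVector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical documents with hand-written vectors, for illustration only.
const toyStore = [
  { text: "How to publish a message", vector: [0.9, 0.1, 0.0] },
  { text: "How to subscribe to a channel", vector: [0.1, 0.9, 0.0] },
  { text: "Billing and pricing", vector: [0.0, 0.1, 0.9] },
];

const topMatches = similaritySearch(toyStore, [0.8, 0.2, 0.0], 2);
console.log(topMatches.map((m) => m.text));
```

A production vector database does the same job with approximate nearest-neighbor indexes, so it does not have to compare the query against every stored vector.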
Vectara indexes your dataset into multiple embeddings. When you pass in text or user input, Vectara runs a semantic search over your data and summarizes the results it finds, answering your question with the relevant information from your custom data. Vectara supports multiple file types, such as TXT, HTML, PDF, and Word files. Gather all the documents you want to upload and add them to a single folder.
Set up Vectara as follows:
Sign up or log in to Vectara
Once you are on the dashboard, click
Once you give the Corpus a name and a description, drag and drop the files you want your LLM to know about into the Upload Files section under the Data Ingestion header. After your files are uploaded, note your corpus ID at the top of the webpage, as you will need it for the request we are about to write.
Click on your email in the top right corner and save your Customer ID for later
Go to API Keys and create an API key for your corpus by selecting your corpus in the drop-down menu. Save your API key for later.
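The three values saved above (customer ID, corpus ID, and API key) are what a query request needs. The sketch below shows one way to assemble such a request. It assumes the endpoint, header names, and body shape of Vectara's v1 REST query API, so check the current Vectara documentation before relying on them. `buildVectaraRequest` is a hypothetical helper, and the IDs and key are placeholders.

```javascript
// Build the pieces of a Vectara query request from the saved credentials.
// Assumption: URL, headers, and body follow Vectara's v1 REST query API.
function buildVectaraRequest({ customerId, corpusId, apiKey, question, numResults = 10 }) {
  return {
    url: "https://api.vectara.io/v1/query",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "customer-id": String(customerId),
      "x-api-key": apiKey,
    },
    body: JSON.stringify({
      query: [
        {
          query: question,
          numResults,
          // The corpus to search, identified by the IDs from the dashboard.
          corpusKey: [{ customerId: Number(customerId), corpusId: Number(corpusId) }],
        },
      ],
    }),
  };
}

// Placeholder values: substitute your own IDs and key.
const vectaraRequest = buildVectaraRequest({
  customerId: "1234567890",
  corpusId: "1",
  apiKey: "YOUR_VECTARA_API_KEY",
  question: "How do I publish a message?",
});
console.log(vectaraRequest.url, vectaraRequest.headers["customer-id"]);
```

Keeping the request construction in a pure function makes it easy to inspect before any network call is made, and keeps the API key out of the rest of the code.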
The architecture will be structured as follows:
The chat application will use PubNub to send and receive messages
A PubNub Function will listen to these messages on a specific channel
A PubNub signal will be fired to let the user know when the AI is thinking and when it is done.
The message will then be forwarded to Vectara using the Vectara REST API
The PubNub Function will then parse one of the many results out of the response from Vectara
The response will then be published on a channel associated with your chatbot
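The parsing step in the list above can be sketched as a small pure function. It assumes the response has the shape `{ responseSet: [{ response: [{ text, score }] }] }`, which is believed to match Vectara's v1 query API but should be verified against a real response. `pickTopResult` and the sample data are hypothetical.

```javascript
// Pull one answer out of a Vectara-style query response.
// Assumption: results live at responseSet[0].response, each with text and score.
function pickTopResult(vectaraResponse, fallback = "Sorry, I could not find an answer.") {
  const results = vectaraResponse?.responseSet?.[0]?.response ?? [];
  if (results.length === 0) return fallback;
  // Choose the highest-scoring result rather than trusting array order.
  const best = results.reduce((a, b) => (b.score > a.score ? b : a));
  return best.text;
}

// Made-up response used only to exercise the function.
const sampleResponse = {
  responseSet: [
    {
      response: [
        { text: "Channels are created on first use.", score: 0.61 },
        { text: "Call publish with a channel and a message.", score: 0.87 },
      ],
    },
  ],
};

console.log(pickTopResult(sampleResponse));
console.log(pickTopResult({})); // falls back when nothing is returned
```

Returning a fallback string instead of throwing means the chatbot always has something to publish, even when the search comes back empty.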
Navigate to the admin dashboard
Open Functions on the left-hand menu and click on the key set you would like to use
Click + Create Module and enter a module name and description
Select the module you just created and click + Create a Function
Give the Function a name, such as Vectara Query, and select After Publish or Fire in the drop-down menu. This function will fire after the message has been published to the relevant channel, in this case,
Set the channel name to
My Secrets and create a secret called
The code snippet for the PubNub Function is defined as follows:
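A minimal sketch of the Function's control flow is shown here with its dependencies injected, so the logic can run and be tested outside PubNub. Inside a real PubNub Function, `signal` and `publish` would wrap the built-in `pubnub` module and `askVectara` would wrap an HTTP call to Vectara using the stored secret; that wiring is left out. The `sender` field and the `typing_on` / `typing_off` payloads are assumptions made for this example.

```javascript
// Factory that returns an "After Publish or Fire"-style handler.
// All four dependencies are supplied by the caller (hypothetical wiring).
function createVectaraHandler({ signal, publish, askVectara, channel }) {
  return async function handle(request) {
    const { text, sender } = request.message;
    // Ignore the bot's own replies so the Function does not trigger itself.
    if (sender === "bot") return request.ok();

    // Tell the UI the bot is thinking.
    await signal({ channel, message: "typing_on" });
    try {
      const answer = await askVectara(text);
      await publish({ channel, message: { text: answer, sender: "bot" } });
    } finally {
      // Always clear the indicator, even if the query failed.
      await signal({ channel, message: "typing_off" });
    }
    return request.ok();
  };
}

// Usage with in-memory fakes standing in for PubNub and Vectara.
const events = [];
const handle = createVectaraHandler({
  channel: "pubnub-docs-ai",
  signal: async ({ message }) => events.push(`signal:${message}`),
  publish: async ({ message }) => events.push(`publish:${message.text}`),
  askVectara: async (question) => `Answer to: ${question}`,
});

const done = handle({
  message: { text: "What is a channel?", sender: "user" },
  ok: () => "ok",
});
done.then(() => console.log(events));
```

The `sender` check matters when the reply is published on the same channel the Function listens to. Without it, each reply would trigger the Function again.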
To connect the PubNub Function to a UI, use one of the many SDKs that PubNub provides. Publish and subscribe to the channel pubnub-docs-ai, then wait for the Function above to finish running the Vectara query. Connecting a typing indicator that listens for PubNub signals on the channel pubnub-docs-ai lets the user see when the PubNub Knowledge bot is thinking, which makes for a smoother end-user experience.
The code for connecting the PubNub function in React:
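One way to organize the client side is a pure reducer that tracks the message list and the typing indicator, which can then be plugged into React's `useReducer`. The sketch below shows only that state logic, not a full component. The event shapes and the `typing_on` / `typing_off` signal payloads are assumptions made for this example, not a PubNub convention.

```javascript
// Chat UI state: the messages shown so far and whether the bot is "thinking".
const initialChatState = { messages: [], botIsTyping: false };

// Pure reducer: takes the current state and an event, returns the next state.
function chatReducer(state, event) {
  switch (event.type) {
    case "signal":
      // Signals only toggle the typing indicator.
      return { ...state, botIsTyping: event.payload === "typing_on" };
    case "message":
      // Messages are appended to the conversation.
      return { ...state, messages: [...state.messages, event.payload] };
    default:
      return state;
  }
}

// Replay a hypothetical sequence of events through the reducer.
const replayed = [
  { type: "message", payload: { text: "What is a channel?", sender: "user" } },
  { type: "signal", payload: "typing_on" },
  { type: "message", payload: { text: "Here is what the docs say.", sender: "bot" } },
  { type: "signal", payload: "typing_off" },
].reduce(chatReducer, initialChatState);

console.log(replayed.messages.length, replayed.botIsTyping);
```

In a component, the events would be dispatched from the PubNub SDK's listener, with its `message` and `signal` callbacks each forwarding the received payload to `dispatch` after subscribing to the pubnub-docs-ai channel.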
Using PubNub Functions along with any vector database or vector store is a quick, production-ready way to create your own AI Knowledge Bot. You could use not only Vectara this way but also Pinecone, Weaviate, or any other production vector database. With PubNub Functions, it is easy to host and control how and when each message is sent, enhancing your vector database's functionality.