Tutorial: Creating a Q&A chat bot with Semantic Search

This tutorial demonstrates how to build an application that lets GPT-3 answer questions that may fall outside its knowledge area (such as questions about user documentation). The application uses OpenAI's Semantic Search API to find the relevant sections of a supplied document, then uses the completion API to generate a response from them.

This is an interactive tutorial. You can add your own questions, documents, and prompts to see how they function.

Step 1: The user asks a question

HTML example

<!-- Textarea question input -->
<textarea id="inputQuestion">How do I simplify text?</textarea>

Step 2: Use Semantic Search to give a score for how well each section of a document relates to the question

By adding "###" between sections of a document we can use the split function in Javascript to create an array of text blocks for Semantic Search to evaluate and score.
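As a minimal sketch of that split step, the snippet below breaks a string on "###" and trims each block. The helper name splitSections and the sample text are made up for illustration. Unlike the tutorial code, it also drops empty blocks, which is an assumption about what you would want when a document ends with a separator.

```javascript
// Hypothetical helper (not part of the tutorial code): split a document
// into trimmed text blocks and discard any that are empty.
function splitSections(text, separator = "###") {
    return text
        .split(separator)
        .map((section) => section.trim())
        .filter((section) => section.length > 0);
}

// Illustrative sample text with a trailing separator
const sample = "Intro to the editor\n###\nHow to simplify text\n###\n";
console.log(splitSections(sample));
```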

Semantic Search generates a score for each text block that reflects how closely GPT-3 thinks the block relates to the query. The higher the score, the closer the match.

Because each score is only relative to the query and the text block, you can do multiple searches over several API calls and combine the results to find the highest score.
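One way to combine results from several calls is sketched below. It assumes each call returns a data array shaped like the one used in this tutorial (a document index and a score), and that you keep track of where each batch's first block sits in the full document list. The name mergeBatchScores and the offset field are hypothetical.

```javascript
// Hypothetical helper: merge the results of several search calls.
// `batches` is an array of { offset, data } objects, where `offset` is the
// position of the batch's first text block in the full documents array and
// `data` is the list of { document, score } entries returned for that batch.
function mergeBatchScores(batches) {
    const merged = [];
    batches.forEach(({ offset, data }) => {
        data.forEach((entry) => {
            // Convert the batch-local index into an index in the full list
            merged.push({ document: entry.document + offset, score: entry.score });
        });
    });
    // Highest score first
    merged.sort((x, y) => y.score - x.score);
    return merged;
}

// Illustrative data only
const combined = mergeBatchScores([
    { offset: 0, data: [{ document: 0, score: 10 }, { document: 1, score: 50 }] },
    { offset: 2, data: [{ document: 0, score: 70 }] }
]);
console.log(combined[0]); // the best match across both calls
```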

Semantic Search can accept up to 200 text blocks per API call. Each text block plus the query can be up to 2,000 tokens in length.
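To stay under the 200-block limit, a longer list of text blocks can be cut into batches before calling the API. The sketch below shows one possible approach; chunkDocuments and its offset field are illustrative names, not part of the API. It does not check the 2,000-token limit, since that would need a tokenizer.

```javascript
// Hypothetical helper: cut a list of text blocks into batches that each
// fit in one Semantic Search call. `offset` records where the batch starts
// in the original list so scores can be mapped back later.
function chunkDocuments(documents, batchSize = 200) {
    const batches = [];
    for (let start = 0; start < documents.length; start += batchSize) {
        batches.push({
            offset: start,
            documents: documents.slice(start, start + batchSize)
        });
    }
    return batches;
}

// Illustrative usage with 450 placeholder blocks
const blocks = Array.from({ length: 450 }, (_, i) => "Section " + i);
chunkDocuments(blocks).forEach((batch) => {
    console.log(batch.offset, batch.documents.length);
});
```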

Javascript and jQuery


// This is the API request for Semantic Search


// This variable will hold the sections of the documentation.
// In this example the text in the textarea above will be imported and 
// divided into an array using the "###" tag as the split indicator.
var documents;
 
// This variable will hold the scores for each section returned by the Semantic Search API
var scores;


function apiScore(){

    // This splits the text in the textarea into a list of text blocks and removes surrounding white space
    documents = inputDocuments.value.split("###").map(section => section.trim());
    
    // Clear the textarea
    inputDocuments.value = "";
    
    // The engine to use for Semantic Search
    var selectedEngine = "babbage";

    // The information we're sending to the API: An array of documents and the question as the query
    var _data = {
                    "documents": documents,
                    "query": inputQuestion.value
                };

    // The header for the request
    var _headers = {
        'Content-Type': 'application/json',
        'Accept': 'application/json',
        'Authorization': `Bearer ${apiKeyInput.value}`
    };
    
    // Our jQuery request
    $.ajax({
        type:'POST',
        url: `https://api.openai.com/v1/engines/${selectedEngine}/search`,
        dataType:'JSON',
        headers: _headers,
        data: JSON.stringify(_data),
        success:function(result){

            // Assigning the result data to the scores variable to be used later
            scores = result.data;
    
            // Displaying the results in the document textarea
            result.data.forEach(element => {
                var _doc = documents[element.document];
                var _score = "Document: " + element.document + "\nScore: " + element.score + "\n----------------\n" + _doc + "\n\n";
                inputDocuments.value += _score;
            });

        }
    });

}

Step 3: Order the text blocks by score

The Semantic Search API returns a list of scores, each paired with the index of the text block it belongs to. With a simple sorting function we can use that list to reorder our text blocks from highest to lowest score.

The higher the score, the more closely Semantic Search thinks it matches the query.

Javascript


// This function will sort through the returned list of scores and
// rank them from highest to lowest and display them in a div.

// This variable will hold our highest scoring document.
// You can also use an array to hold more than one high scoring document 
var highestScoredDocument = "";

function sortDocs(){

    // Clear the div where we want to display the ranked sections
    rankedDocuments.innerHTML = "";

    // Reset the top result so that a repeated call does not keep a stale value
    highestScoredDocument = "";

    // Sort the scores array from highest to lowest based on each entry's "score" attribute
    scores.sort(function (x, y) {
        return y.score - x.score;
    });

    // This iterates through the scores and displays them in the rankedDocuments div
    scores.forEach((score)=>{

        // The scores array only stores the score and the index of each document.
        // To retrieve the actual document text we use the score's document
        // index (score.document) to find the matching document.
        var _doc = documents[score.document];

        var _score = "Score: " + score.score + "\n----------------\n" + _doc + "\n";
        rankedDocuments.innerHTML += _score;

        // Store the first (highest-scoring) document in the highestScoredDocument variable
        if(highestScoredDocument === ""){
            highestScoredDocument = _doc;
        }
        
    });

        
}
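
The comment above notes that you can keep more than one high-scoring document. A minimal sketch of that idea follows; topDocuments is a hypothetical helper, and it assumes the same scores shape used in this tutorial (a document index and a score). It copies the scores array before sorting so the original order is left alone.

```javascript
// Hypothetical helper: return the `count` best-scoring text blocks.
function topDocuments(documents, scores, count = 3) {
    return scores
        .slice() // copy so the caller's array is not reordered
        .sort((x, y) => y.score - x.score)
        .slice(0, count)
        .map((entry) => ({ score: entry.score, text: documents[entry.document] }));
}

// Illustrative data only
const exampleDocs = ["Installing", "Simplifying text", "Exporting"];
const exampleScores = [
    { document: 0, score: 5 },
    { document: 1, score: 90 },
    { document: 2, score: 40 }
];
console.log(topDocuments(exampleDocs, exampleScores, 2));
```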

Step 4: Insert the highest-scoring text block into a Q&A prompt along with the question

We insert the top result into a prompt that also has the user's question.

Javascript

             
// This function takes the text in the prompt textarea and inserts the highest
// scoring section from the documentation (highestScoredDocument) and
// the question (inputQuestion) into it.
//
// This creates a new custom prompt designed to answer the question based on
// the context of the top result from the documentation.

function insertIntoPrompt(){

    var prompt = promptText.value;
    prompt = prompt.replace("[Document]", highestScoredDocument);
    prompt = prompt.replace("[Question]", inputQuestion.value);
    // Write the completed prompt back to the textarea
    promptText.value = prompt;
    
}
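
The prompt text itself is not shown in this tutorial, so the template below is purely illustrative: only the [Document] and [Question] placeholders come from the code above. The sketch uses replacer functions because a plain string passed as the replacement to replace() treats sequences such as $& specially, which could garble documentation that contains dollar signs. fillPrompt is a hypothetical name.

```javascript
// Illustrative template (an assumption, not the tutorial's actual prompt)
const exampleTemplate = [
    "Answer the question using the documentation below.",
    "",
    "Documentation:",
    '"""',
    "[Document]",
    '"""',
    "",
    "Q: [Question]",
    "A:"
].join("\n");

// Hypothetical helper: fill both placeholders without touching the DOM.
function fillPrompt(template, documentText, question) {
    // Replacer functions insert the text literally, even if it contains "$"
    return template
        .replace("[Document]", () => documentText)
        .replace("[Question]", () => question);
}

console.log(fillPrompt(exampleTemplate, "Press Ctrl+S to simplify.", "How do I simplify text?"));
```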

Step 5: Send the prompt to the API

Send the prompt with the question and the highest scoring text block to the API to generate an answer.

Javascript and jQuery

             
// This function sends the prompt with the embedded question and documentation section
// to the OpenAI API endpoint to generate an answer.
//
// The engine, temperature and other settings can all be customized to get the 
// best response. 

function apiCaller(){

    // Tell the user that the app is working.
    resultSection.innerHTML = "Working...";

    // Assign the promptText value to the context
    var contextString = promptText.value;

    // The engine we've selected to use for this task
    var engine = "davinci";

    // The data we're sending to the API 
    var _data = {
        "context": contextString,
        "length": 120,
        "temperature": 0.3,
        "best_of": 1,
        "completions": 1,
        "stream": false,
        "logprobs": 0,
        "stop": "#####"
    };

    // The header for the jQuery call 
    var _headers = {
        'Content-Type': 'application/json',
        'Accept': 'application/json',
        'Authorization': `Bearer ${apiKeyInput.value}`
    };

    // The AJAX request
    $.ajax({
        type:'POST',
        url: `https://api.openai.com/v1/engines/${engine}/generate`,
        dataType:'JSON',
        headers: _headers,
        data: JSON.stringify(_data),
        success:function(result){

            // Get the response from the result data
            var responseData = result.data[0].text.join("").replace(contextString, "").trim();
             
            // Clean up quotes from the prompt
            var response = responseData.replace(/"""/g, "");

            // Display the question and the generated answer
            resultSection.innerHTML = "Q: " + inputQuestion.value.trim() + "<br><br>A: " + response;
        
        }
    });

}

Closing

This is a very simple example of how to use Semantic Search to answer a question. Feel free to add your own data and experiment.

The prompt example in this demonstration is extremely basic and could be made more advanced with additional examples.
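
As one possible way to add examples, the sketch below builds a prompt from a few worked question-and-answer pairs followed by the new question. The layout, the buildFewShotPrompt name, and the example data are all assumptions; the "#####" separator is chosen only because it matches the stop sequence used in Step 5.

```javascript
// Hypothetical helper: build a prompt that shows the model a few solved
// examples before the real question. Each example is an object with
// `document`, `question`, and `answer` strings.
function buildFewShotPrompt(examples, documentText, question, separator = "#####") {
    const blocks = examples.map((example) =>
        `Documentation: ${example.document}\nQ: ${example.question}\nA: ${example.answer}`
    );
    // The final block is left open after "A:" for the model to complete
    blocks.push(`Documentation: ${documentText}\nQ: ${question}\nA:`);
    return blocks.join(`\n${separator}\n`);
}

// Illustrative data only
const fewShotPrompt = buildFewShotPrompt(
    [{
        document: "Use File > Export to save a PDF.",
        question: "How do I save a PDF?",
        answer: "Choose File > Export."
    }],
    "Select the text and click Simplify.",
    "How do I simplify text?"
);
console.log(fewShotPrompt);
```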

Instead of documentation you can use information from Wikipedia, an interview with someone, a podcast transcript or any other large text document.

You're also free to modify and improve the code as you see fit.