Jun 10, 2024 | By Duncan Gichimu | Chatbot | Web Development

Building a Chatbot - Formatting the Response

When developing a chatbot with OpenAI, you will face two primary decisions that significantly impact its functionality and user experience:

  • To Stream or Not to Stream
  • Choosing the Response Format

Let's break down each decision to help you create an effective and user-friendly chatbot.

    To Stream or Not to Stream

    Streaming responses from OpenAI can enhance the user experience significantly. Here’s why:

  • Faster Perceived Responses: Streaming allows users to see parts of the response as they are generated, rather than waiting for the entire response to complete. This can cut the wait before the first visible text from around 10 seconds to about 1 second.
  • User Engagement: Users can start reading the response immediately, enhancing engagement and satisfaction. The rate at which OpenAI delivers the stream is often faster than a typical reading speed, ensuring users always have something to read.
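As a sketch of what consuming a stream looks like, here is a minimal example. The OpenAI call is mocked with an async generator; with the real OpenAI Node SDK you would pass `stream: true` to `chat.completions.create` and iterate the returned stream in the same way, reading each chunk's `choices[0]?.delta?.content`. The helper names are my own.

```typescript
// Mock of a streamed completion; stands in for the chunks the OpenAI SDK
// yields when stream: true is set.
async function* mockStream(): AsyncGenerator<string> {
  for (const delta of ['Hello', ', ', 'world', '!']) {
    yield delta;
  }
}

// Append each delta as it arrives so the user sees text immediately.
async function consumeStream(
  stream: AsyncIterable<string>,
  onUpdate: (text: string) => void
): Promise<string> {
  let text = '';
  for await (const delta of stream) {
    text += delta;
    onUpdate(text); // e.g. update the chat bubble in the UI
  }
  return text;
}
```

The `onUpdate` callback is where the incremental rendering discussed below happens.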

    Choosing the Response Format

    You can configure a system prompt to tell the OpenAI LLM what text format to use when answering a question.

    plaintext
    We have provided context information below.
    ---------------------
    Format the answer in markdown
    ---------------------
    Given this information, answer the question: {query_str}
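A sketch of how the `{query_str}` placeholder in a template like the one above might be filled before sending the prompt; `buildPrompt` is my own name, not a library API.

```typescript
// System-prompt template with a {query_str} placeholder, as in the article.
const PROMPT_TEMPLATE = `We have provided context information below.
---------------------
Format the answer in markdown
---------------------
Given this information, answer the question: {query_str}`;

// Substitute the user's question into the template.
function buildPrompt(query: string): string {
  return PROMPT_TEMPLATE.replace('{query_str}', query);
}
```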

    When configuring the response format, you have several options. Each format has its advantages depending on your needs:

    Plaintext

  • Simplicity: Plaintext is the easiest format to use. It requires no additional processing and renders exactly as received from the LLM.
  • Limitations: The main drawback is the lack of styling options. It’s plain text without any formatting.

    JSON or YAML

    JSON and YAML are ideal when you need structured data, i.e. when responses must conform to a specific model or schema. If you do not actually need structured data, a structured format is not advisable, as you will need to parse the response and render it without knowing its structure and types beforehand.

    System Prompt Example:

    plaintext
    We have provided context information below.
    ---------------------
    Format the answer in JSON using the following schema
    {
      "type": "object",
      "properties": {
        "title": {
          "type": "string",
          "description": "The title"
        },
        "summary": {
          "type": "string",
          "description": "A summary of the response"
        },
        "text": {
          "type": "array",
          "items": {
            "type": "string"
          },
          "description": "The response"
        }
      }
    }
    Example:
    {
    	"title": "Title response",
    	"summary": "Summary of the response",
    	"text": [
    		"Paragraph1",
    		"Paragraph2..."
    	]
    }
    ---------------------
    Given this information, answer the question: {query_str}

    Limitations:

  • Malformed JSON or YAML: Missing or incorrect brackets, braces, commas, etc. can cause parsing errors, which you will need to handle when they occur.
  • Incorrect Schema: The schema may expect a certain data type (e.g., an integer), but the response contains a different type (e.g., a string).

    javascript
    {
      "user": {
        "id": "abc",  // Expecting an integer, received a string
        "name": "John Doe"
      }
    }
  • Finding a good streaming JSON or YAML parser: Streaming parsers need to handle partial data chunks efficiently and correctly.
  • Extra rendering logic: Determining when enough of the response has been received to start rendering can be complex, and continuous re-rendering as more data streams in must be managed smoothly.
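To make the malformed-output case concrete, here is a minimal defensive-parsing sketch (the names are my own): attempt to parse the model's output as JSON, and fall back to the raw text when parsing fails so the UI can still show something.

```typescript
// Result of attempting to parse a model response as JSON.
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; raw: string };

// Try JSON.parse; on failure, surface the raw text instead of crashing.
function parseModelJson(raw: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch {
    return { ok: false, raw };
  }
}
```

A real implementation would also validate the parsed value against the expected schema (e.g. with a validation library) before rendering it.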

    NDJSON

    Parsing JSON can be simplified by working with NDJSON (newline-delimited JSON), a format where each line is a valid JSON object. Example:

    javascript
    {"step": 1, "description": "Ask a question"}
    {"step": 2, "description": "Conduct background research"}
    {"step": 3, "description": "Formulate a hypothesis"}
    {"step": 4, "description": "Conduct an experiment"}
    {"step": 5, "description": "Analyze the data"}
    {"step": 6, "description": "Draw a conclusion"}

    HTML

    HTML is a highly flexible format that allows for advanced styling. It does not require the same level of parsing as JSON or YAML, as it can be rendered directly in the browser. However, HTML needs to be sanitized to prevent security risks such as cross-site scripting; DOMPurify (https://github.com/cure53/DOMPurify) is a good sanitization library.

    Limitations

  • Browser Compatibility: Ensure that any HTML produced is compatible with most browsers and that no JavaScript is executed.
  • Malformed HTML: Similar to JSON and YAML, HTML might be malformed. You will need to handle and correct malformed HTML to ensure it renders correctly.

    Markdown

    Markdown is great for simple formatting, such as headers, links, and lists. For my use case, I found Markdown to be the best. I have yet to encounter any malformed Markdown.

    Parsing & Rendering Markdown

    If you choose not to stream the response, parsing Markdown is straightforward. The library marked is extremely fast and efficient. It is popular and well-maintained, making it a reliable choice for converting Markdown to HTML. After parsing the Markdown, you can use client-side HTML purification libraries like DOMPurify to sanitize and render the HTML safely.

    Example: Parsing Markdown with marked

    Here is a basic example of how to use marked to parse Markdown and render it as HTML:

    typescript
    import { marked } from 'marked';
    import DOMPurify from 'dompurify';
    
    // Sample Markdown text
    const markdownText = `
    ## Hello World
    
    This is a **Markdown** example.
    `;
    
    // Parse Markdown to HTML
    const rawHTML = marked(markdownText);
    
    // Sanitize HTML
    const cleanHTML = DOMPurify.sanitize(rawHTML);
    
    // Render HTML (in a web context, you would insert this into the DOM)
    document.getElementById('content').innerHTML = cleanHTML;

    Streaming Markdown

    When streaming responses, you need to parse and render the stream as new data arrives. This requires a slightly different approach compared to handling the full response at once.

    Naive Solution: Reparsing and Re-rendering the Entire Markdown

    One simple but inefficient approach is to keep a running text of the streamed response. Every time new data is received, you reparse the entire Markdown and re-render it. Here’s how you can implement this:

    typescript
    import { marked } from 'marked';
    import DOMPurify from 'dompurify';
    
    let runningText = '';
    streamingResponse.on('data', data => {
        runningText += data;
        const rawHTML = marked(runningText);
        const cleanHTML = DOMPurify.sanitize(rawHTML);
        document.getElementById('content').innerHTML = cleanHTML;
    });

    This approach works well on powerful hardware but can be problematic on less capable devices due to the continuous reparsing and re-rendering.

    Efficient Parsing and Rendering: Reparsing Last “Paragraph”

    A more efficient method is to reparse and re-render only the last paragraph. This minimizes the workload compared to reparsing the entire text. We can accomplish this using the marked library. When marked parses Markdown, it generates tokens; a token represents a discrete unit of Markdown, for example heading, link, code, list, list_item, or strong.

    Let's take the following Markdown as an example:

    markdown
    1. Type in stuff on the left.
    2. See the live updates on the right.
    
    That's it.  Pretty simple.  There's also a drop-down option above to switch between various views:
    
    - **Preview:**  A live display of the generated HTML as it would render in a browser.
    - **HTML Source:**  The generated HTML before your browser makes it pretty.

    When parsed with marked, the output is a list of tokens:

    javascript
    [
      {
        type: "list",
        raw: "1. Type in stuff on the left.\n2. See the live updates on the right.",
        items: [
          {
            type: "list_item",
            raw: "1. Type in stuff on the left.\n",
          },
          {
            type: "list_item",
            raw: "2. See the live updates on the right.",
          }
        ]
      },
      {
        type: "space",
        raw: "\n\n"
      },
      {
        type: "paragraph",
        raw: "That's it.  Pretty simple.  There's also a drop-down option above to switch between various views:",
      },
      {
        type: "space",
        raw: "\n\n"
      },
      {
        type: "list",
        raw: "- **Preview:**  A live display of the generated HTML as it would render in a browser.\n- **HTML Source:**  The generated HTML before your browser makes it pretty.\n",
        items: [
          {
            type: "list_item",
            raw: "- **Preview:**  A live display of the generated HTML as it would render in a browser.\n",
          },
          {
            type: "list_item",
            raw: "- **HTML Source:**  The generated HTML before your browser makes it pretty.",
          }
        ]
      }
    ]

    Strategy for Efficient Streaming

    You can keep track of the last parsed token. When new data arrives, concatenate it with the raw text of the last token, then reparse. This will typically generate one or two tokens. If it generates one token, re-render that token. If it generates two tokens, re-render the first and update the last token to the second one.

    typescript
    import { marked } from 'marked';
    import DOMPurify from 'dompurify';
    
    // Example code
    function Parser() {
        this.tokens = [];
        this.resultHtml = [];
        this.write = function (mdChunk: string) {
            // Reparse the previous (possibly incomplete) token plus the next chunk
            const lastToken = this.tokens.pop();
            const text = (lastToken?.raw ?? '') + mdChunk;
            const nextTokens = marked.lexer(text);
            this.tokens.push(...nextTokens);
    
            // Re-render only the affected tokens
            const nextHtml = nextTokens.map(token => marked.parser([token]));
    
            // Replace the HTML of the token we reparsed; earlier HTML is untouched
            this.resultHtml.pop();
            this.resultHtml.push(...nextHtml.map(html => DOMPurify.sanitize(html)));
        };
    }
    
    const parser = new Parser();
    streamingResponse.on('data', data => {
        parser.write(data);
        // Render the HTML using whichever method you please, e.g.:
        render(parser.resultHtml);
    });

    The pitfall of this approach is extremely large tokens. A very large list or a very large paragraph means reparsing and re-rendering that entire token on every chunk. I have yet to encounter very large paragraphs from OpenAI. However, I regularly see large lists, or the entire response being a single list. It is worthwhile to improve the parsing algorithm to prevent reparsing of completed list items.

    An even better solution: never reparse

    This can be accomplished by fully implementing a streaming Markdown parser yourself, or by finding an existing one. Such parsers do exist on npm, but I have not found one that is widely used.

    Summary

    When building a chatbot with OpenAI, you need to decide on the response format and whether to stream the response. Streaming is recommended for a better user experience. Various formats can be used, such as plaintext, JSON, YAML, HTML, and Markdown; Markdown is the most recommended due to its structure and ease of parsing and rendering. When streaming, parsing and rendering can be optimized by reparsing and re-rendering only the last paragraph, or by implementing a streaming Markdown parser.