Jun 10, 2024 | By Duncan Gichimu | Chatbot | Web Development

Building a Chatbot - Formatting the Response

When developing a chatbot with OpenAI, you will face two primary decisions that significantly impact its functionality and user experience:

  • To Stream or Not to Stream
  • Choosing the Response Format

Let's break down each decision to help you create an effective and user-friendly chatbot.

    To Stream or Not to Stream

    Streaming responses from OpenAI can enhance the user experience significantly. Here’s why:

  • Faster Perceived Responses: Streaming allows users to see parts of the response as they are generated, rather than waiting for the entire response to complete. This can cut the wait before the first visible text from around 10 seconds to about 1 second.
  • User Engagement: Users can start reading the response immediately, enhancing engagement and satisfaction. The rate at which OpenAI delivers the stream is often faster than a typical reading speed, ensuring users always have something to read.
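As a sketch of what consuming a stream looks like, here is a minimal example. The OpenAI call is mocked with an async generator; with the real OpenAI Node SDK you would pass `stream: true` to `chat.completions.create` and iterate the returned stream in the same way, reading each chunk's `choices[0]?.delta?.content`. The helper names are my own.

```typescript
// Mock of a streamed completion; stands in for the chunks the OpenAI SDK
// yields when stream: true is set.
async function* mockStream(): AsyncGenerator<string> {
  for (const delta of ['Hello', ', ', 'world', '!']) {
    yield delta;
  }
}

// Append each delta as it arrives so the user sees text immediately.
async function consumeStream(
  stream: AsyncIterable<string>,
  onUpdate: (text: string) => void
): Promise<string> {
  let text = '';
  for await (const delta of stream) {
    text += delta;
    onUpdate(text); // e.g. update the chat bubble in the UI
  }
  return text;
}
```

The `onUpdate` callback is where the incremental rendering discussed below happens.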

    Choosing the Response Format

    You can configure a system prompt to tell the OpenAI LLM what text format to use when answering a question.

    plaintext
    We have provided context information below.
    ---------------------
    Format the answer in markdown
    ---------------------
    Given this information, answer the question: {query_str}
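A sketch of how the `{query_str}` placeholder in a template like the one above might be filled before sending the prompt; `buildPrompt` is my own name, not a library API.

```typescript
// System-prompt template with a {query_str} placeholder, as in the article.
const PROMPT_TEMPLATE = `We have provided context information below.
---------------------
Format the answer in markdown
---------------------
Given this information, answer the question: {query_str}`;

// Substitute the user's question into the template.
function buildPrompt(query: string): string {
  return PROMPT_TEMPLATE.replace('{query_str}', query);
}
```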

    When configuring the response format, you have several options. Each format has its advantages depending on your needs:

    Plaintext

  • Simplicity: Plaintext is the easiest format to use. It requires no additional processing and renders exactly as received from the LLM.
  • Limitations: The main drawback is the lack of styling options. It’s plain text without any formatting.

    JSON or YAML

    JSON and YAML are ideal when you need structured data, i.e. when responses must conform to a specific model or schema. If you do not actually need structured data, a structured format is not advisable, as you will need to parse the response and render it without knowing its structure and types beforehand.

    System Prompt Example:

    plaintext
    We have provided context information below.
    ---------------------
    Format the answer in JSON using the following schema
    {
      "type": "object",
      "properties": {
        "title": {
          "type": "string",
          "description": "The title"
        },
        "summary": {
          "type": "string",
          "description": "A summary of the response"
        },
        "text": {
          "type": "array",
          "items": {
            "type": "string"
          },
          "description": "The response"
        }
      }
    }
    Example:
    {
    	"title": "Title response",
    	"summary": "Summary of the response",
    	"text": [
    		"Paragraph1",
    		"Paragraph2..."
    	]
    }
    ---------------------
    Given this information, answer the question: {query_str}

    Limitations:

  • Malformed JSON or YAML: Missing or incorrect brackets, braces, commas, etc. can cause parsing errors, which you will need to handle when they occur.
  • Incorrect Schema: The schema may expect a certain data type (e.g., an integer), but the response contains a different type (e.g., a string).

    javascript
    {
      "user": {
        "id": "abc",  // Expecting an integer, received a string
        "name": "John Doe"
      }
    }
  • Finding a good streaming JSON or YAML parser: Streaming parsers need to handle partial data chunks efficiently and correctly.
  • Extra rendering logic: Determining when enough of the response has been received to start rendering can be complex, and continuous re-rendering as more data streams in must be managed smoothly.
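To make the malformed-output case concrete, here is a minimal defensive-parsing sketch (the names are my own): attempt to parse the model's output as JSON, and fall back to the raw text when parsing fails so the UI can still show something.

```typescript
// Result of attempting to parse a model response as JSON.
type ParseResult =
  | { ok: true; value: unknown }
  | { ok: false; raw: string };

// Try JSON.parse; on failure, surface the raw text instead of crashing.
function parseModelJson(raw: string): ParseResult {
  try {
    return { ok: true, value: JSON.parse(raw) };
  } catch {
    return { ok: false, raw };
  }
}
```

A real implementation would also validate the parsed value against the expected schema (e.g. with a validation library) before rendering it.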

    NDJSON

    Parsing JSON can be simplified by working with NDJSON (newline-delimited JSON), a format where each line is a valid JSON object. Example:

    javascript
    {"step": 1, "description": "Ask a question"}
    {"step": 2, "description": "Conduct background research"}
    {"step": 3, "description": "Formulate a hypothesis"}
    {"step": 4, "description": "Conduct an experiment"}
    {"step": 5, "description": "Analyze the data"}
    {"step": 6, "description": "Draw a conclusion"}

    HTML

    HTML is a highly flexible format that allows for advanced styling. It does not require the same level of parsing as JSON or YAML, as it can be rendered directly in the browser. However, HTML needs to be sanitized to prevent security risks such as cross-site scripting; DOMPurify (https://github.com/cure53/DOMPurify) is a good sanitization library.

    Limitations

  • Browser Compatibility: Ensure that any HTML produced is compatible with most browsers and that no JavaScript is executed.
  • Malformed HTML: Similar to JSON and YAML, HTML might be malformed. You will need to handle and correct malformed HTML to ensure it renders correctly.

    Markdown

    Markdown is great for simple formatting, such as headers, links, and lists. For my use case, I found Markdown to be the best. I have yet to encounter any malformed Markdown.

    Parsing & Rendering Markdown

    If you choose not to stream the response, parsing Markdown is straightforward. The library marked is extremely fast and efficient. It is popular and well-maintained, making it a reliable choice for converting Markdown to HTML. After parsing the Markdown, you can use client-side HTML purification libraries like DOMPurify to sanitize and render the HTML safely.

    Example: Parsing Markdown with marked

    Here is a basic example of how to use marked to parse Markdown and render it as HTML:

    typescript
    import { marked } from 'marked';
    import DOMPurify from 'dompurify';
    
    // Sample Markdown text
    const markdownText = `
    ## Hello World
    
    This is a **Markdown** example.
    `;
    
    // Parse Markdown to HTML
    const rawHTML = marked(markdownText);
    
    // Sanitize HTML
    const cleanHTML = DOMPurify.sanitize(rawHTML);
    
    // Render HTML (in a web context, you would insert this into the DOM)
    document.getElementById('content').innerHTML = cleanHTML;

    Streaming Markdown

    When streaming responses, you need to parse and render the stream as new data arrives. This requires a slightly different approach compared to handling the full response at once.

    Naive Solution: Reparsing and Re-rendering the Entire Markdown

    One simple but inefficient approach is to keep a running text of the streamed response. Every time new data is received, you reparse the entire Markdown and re-render it. Here’s how you can implement this:

    typescript
    import { marked } from 'marked';
    import DOMPurify from 'dompurify';
    
    let runningText = '';
    streamingResponse.on('data', data => {
        runningText += data;
        const rawHTML = marked(runningText);
        const cleanHTML = DOMPurify.sanitize(rawHTML);
        document.getElementById('content').innerHTML = cleanHTML;
    });

    This approach works well on powerful hardware but can be problematic on less capable devices due to the continuous reparsing and re-rendering.

    Efficient Parsing and Rendering: Reparsing Last “Paragraph”

    A more efficient method is to reparse and re-render only the last paragraph. This minimizes the workload compared to reparsing the entire text. We can accomplish this using the marked library. When marked parses Markdown, it generates tokens; a token represents a discrete unit of Markdown, for example heading, link, code, list, list_item, or strong.

    Let's take the following Markdown as an example:

    markdown
    1. Type in stuff on the left.
    2. See the live updates on the right.
    
    That's it.  Pretty simple.  There's also a drop-down option above to switch between various views:
    
    - **Preview:**  A live display of the generated HTML as it would render in a browser.
    - **HTML Source:**  The generated HTML before your browser makes it pretty.

    When parsed with marked, the output is a list of tokens:

    javascript
    [
      {
        type: "list",
        raw: "1. Type in stuff on the left.\n2. See the live updates on the right.",
        items: [
          {
            type: "list_item",
            raw: "1. Type in stuff on the left.\n",
          },
          {
            type: "list_item",
            raw: "2. See the live updates on the right.",
          }
        ]
      },
      {
        type: "space",
        raw: "\n\n"
      },
      {
        type: "paragraph",
        raw: "That's it.  Pretty simple.  There's also a drop-down option above to switch between various views:",
      },
      {
        type: "space",
        raw: "\n\n"
      },
      {
        type: "list",
        raw: "- **Preview:**  A live display of the generated HTML as it would render in a browser.\n- **HTML Source:**  The generated HTML before your browser makes it pretty.\n",
        items: [
          {
            type: "list_item",
            raw: "- **Preview:**  A live display of the generated HTML as it would render in a browser.\n",
          },
          {
            type: "list_item",
            raw: "- **HTML Source:**  The generated HTML before your browser makes it pretty.",
          }
        ]
      }
    ]

    Strategy for Efficient Streaming

    You can keep track of the last parsed token. When new data arrives, concatenate it with the raw text of the last token, then reparse. This will typically generate one or two tokens. If it generates one token, re-render that token. If it generates two tokens, re-render the first and update the last token to the second one.

    typescript
    import { marked } from 'marked';
    import DOMPurify from 'dompurify';
    
    // Example code
    function Parser() {
        this.tokens = [];
        this.resultHtml = [];
        this.write = function (mdChunk: string) {
            // Reparse the previous (possibly incomplete) token plus the next chunk
            const lastToken = this.tokens.pop();
            const text = (lastToken?.raw ?? '') + mdChunk;
            const nextTokens = marked.lexer(text);
            this.tokens.push(...nextTokens);
    
            // Re-render only the affected tokens
            const nextHtml = nextTokens.map(token => marked.parser([token]));
    
            // Replace the HTML of the token we reparsed; earlier HTML is untouched
            this.resultHtml.pop();
            this.resultHtml.push(...nextHtml.map(html => DOMPurify.sanitize(html)));
        };
    }
    
    const parser = new Parser();
    streamingResponse.on('data', data => {
        parser.write(data);
        // Render the HTML using whichever method you please, e.g.:
        render(parser.resultHtml);
    });

    The pitfall of this approach is extremely large tokens. A very large list or a very large paragraph means reparsing and re-rendering that entire token on every chunk. I have yet to encounter very large paragraphs from OpenAI. However, I regularly see large lists, or the entire response being a single list. It is worthwhile to improve the parsing algorithm to prevent reparsing of completed list items.

    An even better solution: never reparse

    This can be accomplished by fully implementing a streaming Markdown parser yourself, or by finding an existing one. Such parsers do exist on npm, but I have not found one that is widely used.

    Summary

    When building a chatbot with OpenAI, you need to decide on the response format and whether to stream the response. Streaming is recommended for a better user experience. Various formats can be used, such as plaintext, JSON, YAML, HTML, and Markdown; Markdown is the most recommended due to its structure and ease of parsing and rendering. When streaming, parsing and rendering can be optimized by reparsing and re-rendering only the last paragraph, or by implementing a streaming Markdown parser.