Building a Chatbot - Formatting the Response
When developing a chatbot with OpenAI, you will face two primary decisions that significantly impact its functionality and user experience: whether to stream the response, and which format the response should take.
Let's break down each decision to help you create an effective and user-friendly chatbot.
To Stream or Not to Stream
Streaming responses from OpenAI can significantly enhance the user experience: instead of waiting for the entire completion to finish, users see the answer appear as it is generated, which greatly reduces perceived latency, especially for long responses.
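As a minimal sketch of the consuming side (the async-iterable interface here mirrors the OpenAI Node SDK's streaming mode, but `collectStream` and `fakeStream` are stand-ins of my own, not real API calls):

```javascript
// Consume a token stream chunk by chunk, updating the UI incrementally
// instead of waiting for the full response.
async function collectStream(stream, onChunk) {
  let full = '';
  for await (const chunk of stream) {
    full += chunk;
    onChunk(chunk, full); // e.g. append the chunk to the chat window
  }
  return full;
}

// Stand-in async generator in place of a real streamed API response
async function* fakeStream() {
  yield 'Hello, ';
  yield 'world!';
}
```

With the real SDK, you would iterate the stream returned by a chat completion call made with `stream: true` in the same way.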
Choosing the Response Format
You can configure a system prompt to tell the OpenAI LLM what text format to use when answering a question.
We have provided context information below.
---------------------
Format the answer in markdown
---------------------
Given this information, answer the question: {query_str}
When configuring the response format, you have several options. Each format has its advantages depending on your needs:
Plaintext
The simplest option: there is nothing to parse or sanitize, but you also get no structure or styling.
JSON or YAML
JSON and YAML are ideal when you need structured data, i.e. when the responses must conform to a specific model or format. If you don't need structured data, a structured format is not advisable: you will have to parse the response and render it without knowing its structure and types beforehand.
System Prompt Example:
We have provided context information below.
---------------------
Format the answer in JSON using the following schema
{
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "The title"
},
"summary": {
"type": "string",
"description": "A summary of the response"
},
"text": {
"type": "array",
"items": {
"type": "string"
},
"description": "The response"
}
}
}
Example
{
"title": "Title response",
"summary": "Summary of the response",
"text": [
"Paragraph1",
"Paragraph2..."
]
}
---------------------
Given this information, answer the question: {query_str}
Limitations:
LLMs do not always emit valid JSON. The response may be truncated, use the wrong types, or include trailing commas and comments, as in this (intentionally malformed) example:
{
"user": {
"id": "abc", // Expecting an integer, received a string
"name": "John Doe",
}
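One way to cope with this is to parse defensively and verify the fields you rely on before rendering. A minimal sketch (the shape checks match the example schema earlier; `parseModelJson` is a name of my own invention):

```javascript
// Defensively handle model output that should be JSON: parse inside
// try/catch and check the expected fields, since the model may return
// malformed JSON or wrong types.
function parseModelJson(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch (err) {
    return { ok: false, error: 'invalid JSON' };
  }
  if (typeof data.title !== 'string' || !Array.isArray(data.text)) {
    return { ok: false, error: 'unexpected shape' };
  }
  return { ok: true, data };
}
```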
NDJSON
Parsing JSON can be simplified by working with NDJSON (newline-delimited JSON), a format where each line is a complete, valid JSON object. Example:
{"step": 1, "description": "Ask a question"}
{"step": 2, "description": "Conduct background research"}
{"step": 3, "description": "Formulate a hypothesis"}
{"step": 4, "description": "Conduct an experiment"}
{"step": 5, "description": "Analyze the data"}
{"step": 6, "description": "Draw a conclusion"}
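A sketch of why NDJSON simplifies incremental parsing (the buffering approach and the `createNdjsonParser` name are my own, not from a library): buffer incoming text, parse each complete line, and hold back any trailing partial line.

```javascript
// Incremental NDJSON parsing: each complete line is a standalone JSON
// object, so only the trailing partial line ever needs to be buffered.
function createNdjsonParser(onObject) {
  let buffer = '';
  return function write(chunk) {
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // the last element may be an incomplete line
    for (const line of lines) {
      if (line.trim() !== '') onObject(JSON.parse(line));
    }
  };
}

// Usage: feed chunks as they arrive from the stream
const steps = [];
const write = createNdjsonParser(obj => steps.push(obj));
write('{"step": 1, "description": "Ask a qu');
write('estion"}\n{"step": 2, "description": "Conduct background research"}\n');
```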
HTML
HTML is a highly flexible format that allows for advanced styling. It does not require the same level of parsing as JSON or YAML, since it can be rendered directly in the browser.
Limitations
HTML must be sanitized before rendering to prevent security risks such as XSS. https://github.com/cure53/DOMPurify is a good sanitization library.
Markdown
Markdown is great for simple formatting, such as headers, links, and lists. For my use case, I found Markdown to be the best choice; I have yet to encounter malformed Markdown from the model.
Parsing & Rendering Markdown
If you choose not to stream the response, parsing Markdown is straightforward. The library marked is extremely fast and efficient. It is popular and well-maintained, making it a reliable choice for converting Markdown to HTML. After parsing the Markdown, you can use client-side HTML purification libraries like DOMPurify to sanitize and render the HTML safely.
Example: Parsing Markdown with marked
Here is a basic example of how to use marked to parse Markdown and render it as HTML:
import { marked } from 'marked';
import DOMPurify from 'dompurify';
// Sample Markdown text
const markdownText = `
## Hello World
This is a **Markdown** example.
`;
// Parse Markdown to HTML
const rawHTML = marked.parse(markdownText);
// Sanitize HTML
const cleanHTML = DOMPurify.sanitize(rawHTML);
// Render HTML (in a web context, you would insert this into the DOM)
document.getElementById('content').innerHTML = cleanHTML;
Streaming Markdown
When streaming responses, you need to parse and render the stream as new data arrives. This requires a slightly different approach compared to handling the full response at once.
Naive Solution: Reparsing and Re-rendering the Entire Markdown
One simple but inefficient approach is to keep a running text of the streamed response. Every time new data is received, you reparse the entire Markdown and re-render it. Here’s how you can implement this:
import { marked } from 'marked';
import DOMPurify from 'dompurify';
let runningText = '';
streamingResponse.on('data', data => {
runningText += data;
const rawHTML = marked.parse(runningText);
const cleanHTML = DOMPurify.sanitize(rawHTML);
document.getElementById('content').innerHTML = cleanHTML;
});
This approach works well on powerful hardware but can be problematic on less capable devices due to the continuous reparsing and re-rendering.
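If you do keep the naive approach, one mitigation (my own suggestion, not from the discussion above) is to batch incoming chunks and re-render at most once per scheduler tick instead of once per chunk:

```javascript
// Batch incoming chunks and re-render at most once per tick. In a browser
// you might pass requestAnimationFrame as the scheduler; queueMicrotask is
// used here as a portable default.
function createThrottledRenderer(render, schedule = queueMicrotask) {
  let pending = '';
  let scheduled = false;
  return function onData(chunk) {
    pending += chunk;
    if (!scheduled) {
      scheduled = true;
      schedule(() => {
        scheduled = false;
        render(pending); // one reparse/re-render for the whole batch
      });
    }
  };
}
```

The `render` callback would hold the reparse-and-sanitize logic from the example above.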
Efficient Parsing and Rendering: Reparsing Last “Paragraph”
A more efficient method is to reparse and re-render only the last paragraph, which minimizes the workload compared to reparsing the entire text. We can accomplish this using the marked library. When marked parses Markdown, it generates tokens; a token represents a discrete unit of Markdown, e.g. heading, link, code, list, list_item, or strong.
Let's take the following Markdown as an example:
1. Type in stuff on the left.
2. See the live updates on the right.
That's it. Pretty simple. There's also a drop-down option above to switch between various views:
- **Preview:** A live display of the generated HTML as it would render in a browser.
- **HTML Source:** The generated HTML before your browser makes it pretty.
When parsed with marked, the output is a list of tokens:
[
{
type: "list",
raw: "1. Type in stuff on the left.\n2. See the live updates on the right.",
items: [
{
type: "list_item",
raw: "1. Type in stuff on the left.\n",
},
{
type: "list_item",
raw: "2. See the live updates on the right.",
}
]
},
{
type: "space",
raw: "\n\n"
},
{
type: "paragraph",
raw: "That's it. Pretty simple. There's also a drop-down option above to switch between various views:",
},
{
type: "space",
raw: "\n\n"
},
{
type: "list",
raw: "- **Preview:** A live display of the generated HTML as it would render in a browser.\n- **HTML Source:** The generated HTML before your browser makes it pretty.\n",
items: [
{
type: "list_item",
raw: "- **Preview:** A live display of the generated HTML as it would render in a browser.\n",
},
{
type: "list_item",
raw: "- **HTML Source:** The generated HTML before your browser makes it pretty.",
}
]
}
]
Strategy for Efficient Streaming
You can keep track of the last parsed token. When new data arrives, concatenate it with the raw text of the last token, then reparse. This will typically generate one or two tokens. If it generates one token, re-render that token. If it generates two tokens, re-render the first and update the last token to the second one.
import { marked } from 'marked';
import DOMPurify from 'dompurify';

function Parser() {
  this.tokens = [];
  this.resultHtml = [];
  this.write = function (mdChunk) {
    // Re-lex the last token's raw text together with the new chunk
    const lastToken = this.tokens.pop();
    const text = (lastToken ? lastToken.raw : '') + mdChunk;
    const nextTokens = marked.lexer(text);
    this.tokens.push(...nextTokens);
    // Render only the re-lexed tokens and sanitize the resulting HTML
    const nextHtml = nextTokens.map(token => marked.parser([token]));
    this.resultHtml.pop();
    this.resultHtml.push(...nextHtml.map(html => DOMPurify.sanitize(html)));
  };
}

const parser = new Parser();
streamingResponse.on('data', data => {
  parser.write(data);
  // Render parser.resultHtml using whichever method you please
});
The pitfall of this approach is extremely large tokens: a very large list or a very large paragraph means reparsing and re-rendering that entire token on every chunk. I have yet to encounter very large paragraphs from OpenAI, but I regularly see large lists, or the entire response being a single list. It is worthwhile to improve the parsing algorithm to avoid re-parsing individual list items.
An even better solution: never reparse at all. This can be accomplished by implementing a true streaming Markdown parser yourself, or by finding an existing one. Such parsers do exist on NPM, but I have not found any that are widely used.
Summary
When building a chatbot with OpenAI, you need to decide on the response format and whether to stream the response. Streaming is recommended for a better user experience. Various formats can be used, such as plaintext, JSON, YAML, HTML, and Markdown; Markdown is the most recommended due to its structure and ease of parsing and rendering. When streaming, parsing and rendering can be optimized by reparsing and re-rendering only the last paragraph, or by implementing a streaming Markdown parser.