ChatGPT API for Building Web Apps

ChatGPT and DALL-E with only a couple of API calls

May 27 2023

Get ready to supercharge your web applications using the ChatGPT API! In this article, we'll explore a fun example of generating random YouTube video ideas and show you how to bring them to life with ChatGPT and DALL-E. You'll learn how to unleash your creativity and create dynamic, user-friendly experiences. Let's dive in and discover the exciting possibilities!

Final Result

In this simple YouTube idea generator, the user chooses what the video should be about, including topics, adjectives, and the viewer's age. These choices are then sent to the ChatGPT and DALL-E APIs to generate a thumbnail, video name, channel name, and video description.

By the end of this article, you'll be able to build this app and more generally understand how to use AI to power applications. Let's begin with some background on a few of the terms that will be used.

Terms Introduction

What is ChatGPT (text-davinci-003)?

The model used in this article, text-davinci-003, is a language model developed by OpenAI. It belongs to the GPT-3.5 series, the same family that ChatGPT was fine-tuned from, and it generates human-like text by completing the prompt it is given. That makes it a useful tool for various natural language processing tasks, including chatbots, virtual assistants, and interactive dialogue systems.

What is DALL-E?

DALL-E is an AI model developed by OpenAI that generates images from textual descriptions. The original DALL-E is a transformer that produces an image token by token, while DALL-E 2 uses a diffusion model conditioned on CLIP embeddings. It can produce realistic and imaginative images from a short prompt, which opens up new possibilities for visual content generation and storytelling.

What is Prompt Engineering?

Prompt engineering, in simple terms, refers to the process of crafting effective instructions or queries to get desired responses from AI models. It involves understanding how to formulate prompts, questions, or inputs to elicit accurate and relevant outputs. By tweaking the wording and structure, prompt engineering helps improve the quality and specificity of AI-generated responses.

With the relevant background information out of the way, let's move on to the first step in building an AI-powered application: the prompt.

Creating an Effective ChatGPT Prompt

Naive Attempt

Getting a result from a text generation model that your app can use isn't as simple as just asking for it. Since the consumer of the response is code rather than a human, the response needs to follow a strict format. By default, however, the model varies its wording and layout from one response to the next, and more so at higher values of the temperature parameter.

For example, consider the prompt below:

Generate a random YouTube video idea with a video title, channel name and video description.

Possible results:

Video Title: Deliciously Healthy Desserts for Summer

Channel Name: Sweet & Spicy Recipes

Video Description: Beat the heat this summer with these delicious and healthy dessert...

Title: Surfacing the Sublime: Journey Through Underwater Caves

Channel Name: DiveDeep

Video Description: This diving adventure will take us to breathtakingly beautiful desti...

Video Title-"Putting Scotchweed® Worm Food Pictures on JPEGify!" 

Channel Name - GardMania 

Video Description- Join expert Abbey Martinez as she walks you through how to put Scotchweed® Worm...

As we can see, a human can easily understand these responses, but minor structural changes appear randomly throughout them. Without building a complicated and dynamic parser, it would be hard to use them in an application.
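To make the problem concrete, here is a minimal sketch of the kind of simple parser an app might start with. The function name `parseNaive` and the assumption that every field arrives as "Label: value" on its own line are ours, not part of the generator; the inputs are abridged from the three responses above.

```typescript
type ParsedIdea = {
  videoTitle: string;
  channelName: string;
  videoDescription: string;
};

// Hypothetical naive parser: expects exactly "Video Title: ...",
// "Channel Name: ..." and "Video Description: ..." at the start of a line.
function parseNaive(response: string): ParsedIdea | null {
  const title = response.match(/^Video Title: (.+)$/m);
  const channel = response.match(/^Channel Name: (.+)$/m);
  const description = response.match(/^Video Description: (.+)$/m);
  // Any deviation in a label or separator makes a match fail.
  if (!title || !channel || !description) return null;
  return {
    videoTitle: title[1].trim(),
    channelName: channel[1].trim(),
    videoDescription: description[1].trim(),
  };
}

const first =
  "Video Title: Deliciously Healthy Desserts for Summer\n\n" +
  "Channel Name: Sweet & Spicy Recipes\n\n" +
  "Video Description: Beat the heat this summer";
const second =
  "Title: Surfacing the Sublime: Journey Through Underwater Caves\n\n" +
  "Channel Name: DiveDeep\n\n" +
  "Video Description: This diving adventure";
const third =
  'Video Title-"Putting Scotchweed® Worm Food Pictures on JPEGify!"\n\n' +
  "Channel Name - GardMania\n\n" +
  "Video Description- Join expert Abbey Martinez";

console.log(parseNaive(first) !== null); // true
console.log(parseNaive(second) !== null); // false ("Title:" instead of "Video Title:")
console.log(parseNaive(third) !== null); // false ("-" instead of ":")
```

Only one of the three responses survives, and every extra pattern we add to the parser is another guess about what the model might do next.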

Describing Response Structure

As we saw in the last section, just asking for what we want without details results in unusable responses. To overcome this we can attempt to be more specific about what we want in response. Let's see what that could look like.

Improved prompt with description:

Generate a random YouTube video idea that follows this JSON structure:
  videoTitle: string (4-12 words)
  channelName: string (4-12 words)
  videoDescription: string (50-75 words)

Possible results:

 { 
          videoTitle: "CC Dreams' Bohemian", 
    channelName: "Catobase Creative", 
    videoDescription: "Introducing a never-before compared wardrobe...". 
  }

Video=(Title: Candy Recipe Tuesday
Channel: CookwareFunology
Video_Description: It's time to Mix-n-match with exclusively CookwareA...)

Video Title: Surgery Skill Tests 
Channel Name: Distance Learning Nurses 
Video Description: Expert classrooms spread across our school host ende...

Interesting. Even with the desired field names and return structure (JSON) specified, ChatGPT does not consistently respond with valid JSON. In fact, the model appears to be behaving worse than it was with the naive prompt. This brings us to a key insight.

ChatGPT is trained for text-completion.

This means that for the model to perform how we want it to, we need to shape our prompt so that it can be completed. Let's apply this to our example.

Providing Examples in Prompt

From what we just saw above, we know that just describing what we want is insufficient. Because of the way ChatGPT was trained, examples are often better than descriptions.

Improved prompt with examples:

Random YouTube videos from different channels

Response:
{"videoTitle":"Ditch Your Favorite Programming Paradigm | Prime Reacts", 
"videoDescription":"Hey there, fellow coders! Get ready to shake th...", 
"channelName":"ThePrimeTime"}

Response:
{"videoTitle":"BEGINNERS guide to SPEEDRUNNING MINECRAFT",
"videoDescription":"In this tutorial will show you how to SpeedRun...", 
"channelName":"LiquidCandy"}

Last Response:

Possible results:

{"videoTitle":"We Played Join the Experience! - Let's Play - Sniper Elite...",
"videoDescription":"We settle in for a bunch of thick action co-op in ...", 
"channelName":"Sentii"}

{"videoTitle":"Writing A CAPTCHA Solver Using TensorFlow 2.0 | Model Creation...", 
"videoDescription":"In this final episode of the series, we are looking...", 
"channelName":"Capital Coder"}

{"videoTitle":"Why the Consumer Goods Industry Needs Blockchain Governance",
"videoDescription":"This week, Bitcoin is shining the spotlight on the need...", 
"channelName":"Chain Reaction"}

Finally, with examples we are consistently getting JSON responses with the fields we expect. As you might have noticed, there are a couple of subtle points that help guide ChatGPT to give us desirable results.

No questions: the input is formulated to be completed, matching what the model was trained to do.
"different channels": when examples are added, the AI will often attempt to reuse your input in its output; this phrase nudges it toward new channels instead.
"Last Response": if "Last" were omitted, the model would continue to generate responses and would break parsing.

Now with the AI consistently providing usable responses we can add user input to the prompt to create a customizable experience.
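The example-based prompt from this section can also be assembled in code. The sketch below is one possible way to do it; the names `ExampleIdea` and `buildFewShotPrompt` are hypothetical, and `JSON.stringify` writes each example on a single line rather than across three lines as in the hand-written prompt.

```typescript
type ExampleIdea = {
  videoTitle: string;
  videoDescription: string;
  channelName: string;
};

// Builds a completion-style prompt: a heading, one "Response:" block per
// example, and a trailing "Last Response:" for the model to complete.
function buildFewShotPrompt(examples: ExampleIdea[]): string {
  const header = "Random YouTube videos from different channels";
  const blocks = examples.map(
    (example) => `Response:\n${JSON.stringify(example)}`
  );
  return [header, ...blocks, "Last Response:\n"].join("\n\n");
}

const fewShotPrompt = buildFewShotPrompt([
  {
    videoTitle: "Ditch Your Favorite Programming Paradigm | Prime Reacts",
    videoDescription: "Hey there, fellow coders! Get ready to shake th...",
    channelName: "ThePrimeTime",
  },
  {
    videoTitle: "BEGINNERS guide to SPEEDRUNNING MINECRAFT",
    videoDescription: "In this tutorial will show you how to SpeedRun...",
    channelName: "LiquidCandy",
  },
]);

console.log(fewShotPrompt);
```

Keeping the examples in an array makes it easy to swap them out or add more without touching the surrounding prompt text.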

Adding User Input to Prompt

For our example, the user input will be chosen from a finite set (topics, adjectives, age demographic), but our approach generalizes to more free-form input. Since we've seen how well text-davinci performs with examples, we will be adding fake user input to our examples to show the model how it should react to input. Then, for the "Last Response", we will include the actual user's input in the same format as the examples.

Prompt with user input:

`Random YouTube videos from different channels

  Topics: Technology, Review
  Adjectives: Exciting, Informative
  Age Demographic: 18-28
  Response:
  {"videoTitle":"Ditch Your Favorite Programming Paradigm | Prime Reacts", 
  "videoDescription":"Hey there, fellow coders! Get ready to shake thi...",
  "channelName":"ThePrimeTime"}

  Topics: Gaming, How-To
  Adjectives: Exciting, Informative
  Age Demographic: 4-10
  Response:
  {"videoTitle":"BEGINNERS guide to SPEEDRUNNING MINECRAFT", 
  "videoDescription":"In this tutorial will show you how to SpeedRun Minecraft...",
  "channelName":"LiquidCandy"}

  Topics: ${topics.join(", ")}
  Adjectives: ${adjectives.join(", ")}
  Age Demographic: ${demographic}
  Last Response:
  `
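Because the user input comes from a finite set, it can be checked against that set before it is interpolated into the prompt. The sketch below is one way to do that. The allow-lists contain only the values that appear in this article's examples, so the app's full option lists are an assumption, as are the names `isValidInput` and `formatUserInput`. The output also omits the two-space indentation used in the template above.

```typescript
// Assumed allow-lists, collected from the values shown in this article.
const ALLOWED_TOPICS = [
  "Technology", "Review", "Gaming", "How-To", "Music", "Education", "Business",
];
const ALLOWED_ADJECTIVES = ["Exciting", "Informative", "Serious"];
const ALLOWED_DEMOGRAPHICS = ["4-10", "18-28", "28-40", "72+"];

type IdeaInput = {
  topics: string[];
  adjectives: string[];
  demographic: string;
};

// True only when every selected value is one of the known options.
function isValidInput(input: IdeaInput): boolean {
  return (
    input.topics.every((topic) => ALLOWED_TOPICS.includes(topic)) &&
    input.adjectives.every((adj) => ALLOWED_ADJECTIVES.includes(adj)) &&
    ALLOWED_DEMOGRAPHICS.includes(input.demographic)
  );
}

// Formats the final block of the prompt in the same shape as the examples.
function formatUserInput(input: IdeaInput): string {
  if (!isValidInput(input)) {
    throw new Error("Unsupported topic, adjective, or demographic");
  }
  return [
    `Topics: ${input.topics.join(", ")}`,
    `Adjectives: ${input.adjectives.join(", ")}`,
    `Age Demographic: ${input.demographic}`,
    "Last Response:",
  ].join("\n");
}

console.log(
  formatUserInput({
    topics: ["Review", "Music"],
    adjectives: ["Informative", "Serious"],
    demographic: "28-40",
  })
);
```

Rejecting unknown values on the server keeps arbitrary text out of the prompt even if someone bypasses the form on the frontend.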

Possible results (including input for context):

Topics: Review, Music
Adjectives: Informative, Serious
Age Demographic: 28-40
Last Response:
{"videoTitle":"Resident Bump Pitch 8 Full Analysis", 
"videoDescription":"On today's episode, we're diving deep into Resident Bump's new...",
"channelName":"ThatRoccoGuy"}

Topics: Music, How-To
Adjectives: Informative, Exciting
Age Demographic: 18-28
Last Response:
{"videoTitle":"How To Make Digital Music From Scratch Without ANY Experience | Ableton Tutorial",
"videoDescription":"Learn how to make beats and full tracks from scratch without any prior...",
"channelName":"Musicrazor Beat Tutorials"}

Topics: Education, Business
Adjectives: Serious
Age Demographic: 72+
Last Response:
{"videoTitle":"The Business Survival Guide — A 21 day Bible",
"videoDescription":"Develop strategies for preparing your business for long-term success...",
"channelName":"BenSheald"}

With the text portion of our AI response completed, we can move on to creating a prompt for DALL-E to generate an image.

Creating an Effective DALL-E Prompt

Incorporating Generated Video Title

Since the image generation endpoint takes only a text prompt, providing example images is impossible. Instead, we will rely on a simple prompt that includes our video title. Since it's already randomly generated and relates to the video, we can benefit from our previous work. For projects that don't already have a suitable field generated, you could add another field to the response called "imageDescription" and include examples in the input.

Image prompt incorporating the video title:

`Youtube Thumbnail for video "${returnObj.videoTitle}"`

A few image examples:

Youtube Thumbnail for video "Brand New Updated Studio Setups in 2021"

Youtube Thumbnail for video "BUSINESS Lessons from Millionaire Entrepreneurs"

Youtube Thumbnail for video "I Tried VR | Virtual Reality Gaming"
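For projects that take the "imageDescription" route mentioned earlier, the image prompt could be chosen as in the sketch below. This is not part of the idea generator itself: the type `IdeaWithImage` and the function `buildImagePrompt` are hypothetical, and the fallback is the title-based prompt used in this article.

```typescript
type IdeaWithImage = {
  videoTitle: string;
  // Optional extra field the text model could be asked to generate.
  imageDescription?: string;
};

// Prefers a generated image description and falls back to the video title.
function buildImagePrompt(idea: IdeaWithImage): string {
  const description = idea.imageDescription?.trim();
  if (description) return description;
  return `Youtube Thumbnail for video "${idea.videoTitle}"`;
}

console.log(buildImagePrompt({ videoTitle: "I Tried VR | Virtual Reality Gaming" }));
console.log(
  buildImagePrompt({
    videoTitle: "I Tried VR | Virtual Reality Gaming",
    imageDescription: "A person wearing a VR headset in a neon-lit room",
  })
);
```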

How to Call OpenAI API in Node.js

This section assumes you have an OpenAI API key.

Calling the OpenAI API is simple within Node.js via the OpenAI npm package.

Here's a simplified version of the code needed to call both the ChatGPT and DALL-E APIs for the YouTube Idea Generator.

youtube.ts

import { Configuration, OpenAIApi } from "openai";

export type OpenAITextResponse = {
  videoTitle: string;
  videoDescription: string;
  channelName: string;
};

export type YouTubeIdea = OpenAITextResponse & {
  videoThumbnailURL: string;
};

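// The function must be async because it awaits both API calls.
const getAPIResponse = async (
  prompt: string,
  apiKey: string
): Promise<YouTubeIdea> => {
  const configuration = new Configuration({ apiKey });
  const openai = new OpenAIApi(configuration);

  // Ask the model to complete the "Last Response:" block of the prompt.
  const textResponse = await openai.createCompletion({
    model: "text-davinci-003",
    prompt: prompt,
    max_tokens: 300,
    temperature: 1.2,
  });

  const returnText = textResponse.data.choices.at(0)?.text;
  if (!returnText) {
    throw new Error("The completion response contained no text");
  }
  // JSON.parse throws if the model strays from the JSON format.
  const returnObj = JSON.parse(returnText) as OpenAITextResponse;

  // Generate the thumbnail from the generated video title.
  const imageResponse = await openai.createImage({
    prompt: `Youtube Thumbnail for video "${returnObj.videoTitle}"`,
    size: "256x256",
  });
  const returnURL = imageResponse.data.data.at(0)?.url;
  if (!returnURL) {
    throw new Error("The image response contained no URL");
  }

  return { ...returnObj, videoThumbnailURL: returnURL };
};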

As can be seen above, once we have our text and image prompts created, actually requesting data from the API is remarkably easy. Since our prompt specifies that the return is JSON with specific fields, we can create a type definition that can be used on our frontend.
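A type definition only helps at compile time, so the parsed JSON is still unchecked at runtime. As a minimal sketch, assuming we would rather get `null` than a malformed object, a type guard can confirm that all three fields are present before the result reaches the frontend. The helper names `isOpenAITextResponse` and `parseTextResponse` are hypothetical.

```typescript
type OpenAITextResponse = {
  videoTitle: string;
  videoDescription: string;
  channelName: string;
};

// Runtime check that a parsed value has the three string fields we expect.
function isOpenAITextResponse(value: unknown): value is OpenAITextResponse {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.videoTitle === "string" &&
    typeof record.videoDescription === "string" &&
    typeof record.channelName === "string"
  );
}

// Returns the parsed response, or null when the text is missing,
// is not valid JSON, or lacks one of the expected fields.
function parseTextResponse(text: string | undefined): OpenAITextResponse | null {
  if (!text) return null;
  try {
    const parsed: unknown = JSON.parse(text);
    return isOpenAITextResponse(parsed) ? parsed : null;
  } catch {
    return null;
  }
}

const idea = parseTextResponse(
  '{"videoTitle":"The Business Survival Guide","videoDescription":"Develop strategies...","channelName":"BenSheald"}'
);
console.log(idea?.channelName);
```

A caller can treat `null` as a signal to retry the request, which is one way to absorb the occasional malformed completion at a temperature of 1.2.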

For more information on the endpoint parameters, check out the OpenAI API documentation.

Conclusion

In this article we went over how to incorporate popular AI models into web applications. We covered the topic of prompt engineering for both text and image APIs and ended with a short example of calling the APIs. For text prompt building we went through multiple stages of refinement and learned how to craft a prompt that results in usable JSON responses. I hope that you found this article useful and have a better idea of how to build your own AI-powered application.

Full Source Code on GitHub.