
Okay, you were supposed to get this tutorial before you got the previous one I unintentionally sent on Friday (instead of Tuesday) about Einstein. Doh!
Anyway, as you know, OpenAI has integrated advanced image generation capabilities into ChatGPT through the GPT-4o model, allowing users to create and refine images directly within the chat interface. Here's a concise guide on how to use this feature in case the Friday video left some things unsaid:
- Access ChatGPT: Log in to your ChatGPT account. This image generation feature is available to users across various subscription tiers, including Free (well, eventually, if not when you read this), Plus, Pro, and Team.
- Input Your Prompt: In the chat interface, with GPT-4o selected, describe the image you wish to generate in detail. For example: "Create an infographic explaining the water cycle."
- Generate the Image: After submitting your prompt, ChatGPT will process your request and generate the image. This may take a few moments, as the system ensures high-quality output.
- Refine the Image: If you wish to make adjustments, provide follow-up prompts specifying the changes. For instance: "Add labels to each stage of the water cycle in the infographic."
- Download and Use: Once satisfied with the image, you can download it for your use. According to OpenAI, users own the images they generate and are free to use them within the bounds of OpenAI's usage policies.
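If you'd rather script these steps than work in the chat UI, OpenAI also exposes image generation through its HTTP API. The sketch below only assembles the request payload and doesn't send anything; the endpoint URL and the model name `gpt-image-1` are assumptions based on OpenAI's API documentation and may differ from what your account supports.

```python
import json

# OpenAI's image-generation endpoint (assumed; check the current API docs).
API_URL = "https://api.openai.com/v1/images/generations"

def build_image_request(prompt: str, size: str = "1024x1024") -> dict:
    """Assemble the JSON payload for a single image-generation call."""
    return {
        "model": "gpt-image-1",  # assumed model name; may vary by account/tier
        "prompt": prompt,        # the same kind of description you'd type in chat
        "n": 1,                  # number of images to generate
        "size": size,            # requested output resolution
    }

payload = build_image_request("Create an infographic explaining the water cycle.")
print(json.dumps(payload, indent=2))
```

Actually sending this is a POST to the endpoint with an `Authorization: Bearer <your API key>` header; the response returns the generated image as a URL or base64 data, depending on the request options.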
Here's an example of a complex prompt I copied from the OpenAI website to create the image shown above. It's amazing how it gets most of the details right.
Please create an image using the following: A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a t-shirt with a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer's reflection. The text reads: (left) "Transfer between Modalities: Suppose we directly model p(text, pixels, sound) [equation] with one big autoregressive transformer.
Pros: * image generation augmented with vast world knowledge * next-level text rendering * native in-context learning * unified post-training stack Cons: * varying bit-rate across modalities * compute not adaptive" (Right) "Fixes: * model compressed representations * compose autoregressive prior with a powerful decoder" On the bottom right of the board, she draws a diagram: "tokens -> [transformer] -> [diffusion] -> pixels"