Post

Vision Image Gen Ui

Vision Image Gen Ui

AI Image Generator

This project leverages OpenAI’s GPT Vision and DALL-E models to analyze images and generate new ones based on user modifications. It provides two interfaces: a web UI built with Streamlit for interactive use and a command-line interface (CLI) for direct script execution. Features

  • Image Analysis: Automatically describes images using GPT-4 Vision.
  • Image Generation: Generates modified images based on user inputs using DALL-E 3.
  • Web Interface: Interactive web UI for easy operation.
  • CLI: Command-line version for script or batch processing.

Vision Image Gen

How it Works:

  • The app first downloads the image from the provided URL or path locally and analyzes it using the pre-trained AI model gpt-4-vision-preview to generate a description.
  • You’re then given the opportunity to modify this description to guide the image generation process, the original description from the vision model and your included description are used.
  • Finally, the app uses DALL-E 3 to generate a new image 1790x1024 based on the modified description.
  • You can see the original image and then newly created image. Right click to save.

đź“ş Watch Video

Installation

Tested in Python 3.11.4

Clone the repository to your local machine:

1
2
git clone https://github.com/bigsk1/vision-image-gen.git
cd vision-image-gen

Install the required dependencies:

pip install -r requirements.txt

Usage

To use the Web UI

To start the web interface, run:

1
streamlit run vision_image_gen_ui_local.py

Navigate to the URL provided by Streamlit, http://localhost:8501, in your web browser. Enter you Open AI API Key or Have your Open Ai Api key added to your system enviroment variables in PATH

  • Upload an Image: Use the provided input to upload an image or specify an image URL.
  • View Analysis: See the AI-generated description of the image.
  • Modify and Generate: Enter modifications to the original description and generate a new image.
  • View and Save: The generated image will be displayed, and you can save it locally.

CLI Version

The CLI version allows you to process images directly from your terminal.

1
python vision_image_gen.py

Using Streamlit Cloud Sharing

Use the vision_image_gen_ui.py for Streamlit Cloud sharing, in the settings just add

1
2
[openai]
api_key = "sk-paste-your-api-key"

the ui and ui_local are basiclly the same file but the api functions differently due to streamlit’s setup

Example of output

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
==================================================
Vision Response:
==================================================
The image shows a computer terminal interface with ASCII art and text. At the top would be ASCII art resembling a face with a pattern of "#" and "." characters. Below it, within a minimalist window frame, is a navigation menu with options depicted as a pixel-style globe icon labeled "sumfetch," a document icon labeled "ABOUT," a link icon labeled "Website," a folder icon labeled "This Repo," and a series of contact methods including an email address, GitHub URL, and Twitter handle, all associated with the username "bigsk1". The central feature is a bold ASCII art logo or emblem saying "BIGSK1" inside a stylized circular border.

For a text-to-image model, you could describe it as follows:

"Create an image of a dark computer terminal screen with a pixelated face made out of ASCII characters at the top. Include a stylized ASCII art logo that says 'BIGSK1' in the center, enclosed in a circular patterned border. Below the logo, depict a simple user interface with text and monochrome icons signifying navigation options, including a globe for 'sumfetch,' a document for 'ABOUT,' a link chain for 'Website,' and a folder for 'This Repo.' Add additional details

==================================================
User's Modification Input:
==================================================
make it in the style of an american flag

==================================================
Final Prompt Sent to DALL-E 3:
==================================================
The image shows a computer terminal interface with ASCII art and text. At the top would be ASCII art resembling a face with a pattern of characters. Below it, within a minimalist window frame, is a navigation menu with options depicted as a pixel-style globe icon labeled "sumfetch," a document icon labeled "ABOUT," a link icon labeled "Website," a folder icon labeled "This Repo," and a series of contact methods including an email address, GitHub URL, and Twitter handle, all associated with the username "bigsk1". The central feature is a bold ASCII art logo or emblem saying "BIGSK1" inside a stylized circular border.

For a text-to-image model, you could describe it as follows:

"Create an image of a dark computer terminal screen with a pixelated face made out of ASCII characters at the top. Include a stylized ASCII art logo that says 'BIGSK1' in the center, enclosed in a circular patterned border. Below the logo, depict a simple user interface with text and monochrome icons signifying navigation options, including a globe for 'sumfetch,' a document for 'ABOUT,' a link chain for 'Website,' and a folder for 'This Repo.' Add additional details make it in the style of an american flag

Example image in the original_image folder this is were your downloaded images will end up.

The generated_images folder is were the new Dalle generated image will end up.

This is a work in progress, more to add soon.

This post is licensed under CC BY 4.0 by the author.