MultiModal Tree of Thoughts

A multimodal Tree of Thoughts that uses the GPT-4 language model and the Stable Diffusion model to generate multimodal output. Each output is scored on a scale from 0.0 to 1.0, and a search using DFS and BFS then returns the best-scoring output.

Task: Generate an image of a swarm of bees -> image generator -> GPT4V scores the image from 0.0 to 1.0 -> DFS/BFS -> return the best output
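The scoring step in the pipeline above relies on the evaluator returning a number between 0.0 and 1.0. A vision model usually replies in free text, so the score has to be extracted and clamped. The helper below is a minimal sketch of one way to do that; `parse_score` is a hypothetical name and is not taken from this repository.

```python
import re

# Hypothetical helper (not from this repository): pull a score out of a
# free-text evaluator reply and force it into the range [0.0, 1.0].
_NUMBER = re.compile(r"(?<![\w.])-?\d+(?:\.\d+)?(?!\w)")


def parse_score(reply: str, default: float = 0.0) -> float:
    """Return the first standalone number in `reply`, clamped to [0.0, 1.0].

    Digits embedded in words (for example the "4" in "GPT4V") are skipped.
    If no number is found, `default` is returned.
    """
    match = _NUMBER.search(reply)
    if match is None:
        return default
    return min(1.0, max(0.0, float(match.group())))
```

Asking the evaluator to answer with the number alone makes this parsing step far less fragile; the clamp is only a safety net for replies that fall outside the expected range.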

GPT4Vision scores each image from 0.0 to 1.0 according to how likely it is to accomplish the task.
DFS/BFS searches for the best output using the scores from GPT4Vision.
The output is multimodal: a combination of the image and the text.
The output is evaluated by GPT4Vision.
The prompt sent to the image generator is optimized using the output of GPT4Vision and the search.
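The search described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `main.py` or `app.py`: `bfs_search`, `dfs_search`, `toy_expand`, and `toy_score` are hypothetical names, and the toy scorer stands in for the real step of generating an image and asking GPT4Vision to rate it.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str], float]        # prompt -> score in [0.0, 1.0]
Expander = Callable[[str], List[str]]  # prompt -> refined candidate prompts
Result = Tuple[float, str]             # (score, prompt)


def bfs_search(root: str, score: Scorer, expand: Expander,
               depth: int = 2, beam: int = 2) -> Result:
    """Breadth-first: score every candidate on a level, keep the top `beam`."""
    best: Result = (score(root), root)
    frontier = [root]
    for _ in range(depth):
        scored = [(score(c), c) for p in frontier for c in expand(p)]
        if not scored:
            break
        scored.sort(key=lambda item: item[0], reverse=True)
        best = max(best, scored[0], key=lambda item: item[0])
        frontier = [prompt for _, prompt in scored[:beam]]
    return best


def dfs_search(root: str, score: Scorer, expand: Expander,
               depth: int = 2, threshold: float = 0.9) -> Result:
    """Depth-first: follow one refinement chain at a time, stop early once
    a candidate reaches `threshold`."""
    best: Result = (-1.0, root)

    def visit(prompt: str, level: int) -> None:
        nonlocal best
        s = score(prompt)
        if s > best[0]:
            best = (s, prompt)
        if s >= threshold or level == depth:
            return
        for child in expand(prompt):
            visit(child, level + 1)
            if best[0] >= threshold:
                return

    visit(root, 0)
    return best


# Toy stand-ins (assumptions): a real scorer would render the prompt with
# the image generator and have the vision model rate the resulting image.
DETAILS = ["macro photo", "golden light", "dense swarm"]


def toy_expand(prompt: str) -> List[str]:
    return [f"{prompt}, {d}" for d in DETAILS if d not in prompt]


def toy_score(prompt: str) -> float:
    return min(1.0, 0.25 + 0.25 * sum(d in prompt for d in DETAILS))


if __name__ == "__main__":
    task = "a swarm of bees"
    print(bfs_search(task, toy_score, toy_expand, depth=3))
    print(dfs_search(task, toy_score, toy_expand, depth=3))
```

In this sketch the breadth-first variant spends evaluator calls on every candidate at a level before going deeper, while the depth-first variant commits to one refinement chain and can stop as soon as a score crosses the threshold. Since each score in the real pipeline costs an image generation plus a vision-model call, the `beam`, `depth`, and `threshold` values directly control the cost of a run.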

Usage

streamlit run app.py

License

MIT

kyegomez/MultiModal-ToT
