Creating a Stable Diffusion App
Generative AI has become a hot topic in the past couple of years, reaching the point where anyone can use it and get quite acceptable results. Like many others, I've had fun generating random images from time to time, whether with closed applications like DALL-E, Midjourney, and NovelAI, or with open alternatives built on Stable Diffusion, using interfaces such as Automatic1111's WebUI or ComfyUI.
The Motivation
However, I had two main concerns: I wanted to understand how these image-generating AIs actually work, and I felt a lack of control when using the applications mentioned above. The most common applications of this type work only with prompts, which turns image generation into a bit of a lottery. Sometimes it works well, but other times it's frustrating not to get what you want. Open alternatives (essentially, anything built on Stable Diffusion models) tend to address these shortcomings, but their UIs can be unsatisfactory or fail occasionally. For example, WebUI's inpaint feature sometimes fails when loading a new image while a mask is already drawn.
The First Steps
So, I decided to create a simple application to address these concerns and use it as an excuse to learn Python and some of its graphical libraries.
Once I had decided to build the application, I thought about where to start. The answer was the same as any other time I've needed to learn a new technology, library, or language: look for the official documentation of what I want to learn.
Understanding Stable Diffusion
The first step was to understand what Stable Diffusion is. Since I had already used it through third-party applications, I had some notion of it: an AI model developed by Stability AI that generates images from text and is, fortunately, free to use. It can be used from Python via Hugging Face's Diffusers library.
Great, I had a starting point with a well-documented library on Hugging Face's website. I began with simple experiments in Python, initially without a user interface. After some trials, I decided to load the model from a single checkpoint file, which gives behavior similar to applications like WebUI, where you choose the model from a dropdown menu. Using the image-generation model is quite straightforward, and I appreciate how well organized the documentation is.
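Below is a minimal sketch of what those first experiments looked like, assuming a Stable Diffusion 1.5-style checkpoint and a machine with a CUDA GPU; the file paths and prompt are placeholders, not the actual files I used.

```python
import torch
from diffusers import StableDiffusionPipeline

# from_single_file loads an entire checkpoint from one .safetensors/.ckpt
# file, which is what makes a WebUI-style model dropdown possible
pipe = StableDiffusionPipeline.from_single_file(
    "models/model.safetensors",  # placeholder path
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a CUDA-capable GPU

image = pipe(
    prompt="a watercolor landscape, mountains at sunset",
    num_inference_steps=25,
    guidance_scale=7.5,
).images[0]
image.save("output.png")
```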
Choosing the User Interface
Next, I had to choose what to use for the user interface. At first, I considered using a web page as the main interface, as most popular interfaces do, and I could have easily done this with a library like FastAPI. However, I ultimately chose a graphical interface library for desktop applications to do something different from my professional work as a web developer.
I decided to use Qt with QML. I initially used Qt alone, but later preferred QML to better separate the visual code from the business logic, making use of the MVC pattern more effectively.
I must say that by the time I made this decision, I was already quite far along in developing the graphical interface. Fortunately, the switch wasn't too complex, and it improved the code's readability.
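To give an idea of the structure, here is a minimal sketch of how Python business logic can be exposed to QML, assuming PySide6; the `Backend` class, its `generate()` slot, and `Main.qml` are illustrative names, not my actual code.

```python
import sys
from PySide6.QtCore import QObject, Signal, Slot
from PySide6.QtGui import QGuiApplication
from PySide6.QtQml import QQmlApplicationEngine

class Backend(QObject):
    """Business logic lives here; QML only handles presentation."""
    statusChanged = Signal(str)

    @Slot(str)
    def generate(self, prompt):
        # The actual Diffusers pipeline call would go here
        self.statusChanged.emit(f"Generating: {prompt}")

app = QGuiApplication(sys.argv)
engine = QQmlApplicationEngine()
backend = Backend()
# Expose the Python object to QML so UI elements can call backend.generate(...)
engine.rootContext().setContextProperty("backend", backend)
engine.load("Main.qml")  # hypothetical QML file holding the UI layout
if not engine.rootObjects():
    sys.exit(-1)
sys.exit(app.exec())
```

Keeping the pipeline calls behind a QObject like this is what lets the QML side stay purely declarative, which was the point of the switch.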
First version with Qt
Second version with QML
Implementing Key Features
As seen in the video, I implemented several planned features in this first version:
- Inpaint and Inpaint Sketch: These features are more flexible than WebUI's and support pressure levels when using a drawing tablet, which is better than just selecting a fixed pen size (see the inpainting sketch after this list).
- OpenPose: The pose can be placed directly on the image, giving me a better preview of the final result and allowing modifications if necessary.
- Checkbox Modes: Different modes can be activated and deactivated with checkboxes, letting me work on the same image without sending it to another workspace or keeping it in different tabs.
- Right-Click Menus: Loras can be added from a right-click submenu, and I replicated WebUI's shortcut for increasing or decreasing the weight of a tag with Shift+Up or Shift+Down (a sketch of that logic also follows below).
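For the inpainting mentioned above, here is a hedged sketch of how it can be driven with Diffusers, again loading from a single file; the checkpoint path, image files, and prompt are placeholders rather than the app's actual wiring.

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# An inpainting-specific checkpoint, loaded the same single-file way
pipe = StableDiffusionInpaintPipeline.from_single_file(
    "models/model-inpainting.safetensors",  # placeholder path
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("scene.png").convert("RGB")
# White pixels in the mask get repainted; black pixels are preserved
mask_image = Image.open("mask.png").convert("RGB")

result = pipe(
    prompt="a red brick fireplace",
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=30,
).images[0]
result.save("inpainted.png")
```

In the app itself, the mask is whatever the user draws on the canvas, with the stroke width driven by tablet pressure.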
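And the Shift+Up/Shift+Down shortcut boils down to rewriting the tag under the cursor. Here is a rough sketch, assuming WebUI-style `(tag:1.1)` emphasis syntax; `adjust_weight` is a hypothetical helper, not the app's actual implementation.

```python
import re

# Matches a tag already wrapped in WebUI-style emphasis, e.g. "(sky:1.2)"
WEIGHTED = re.compile(r"^\((?P<tag>.+):(?P<weight>[\d.]+)\)$")

def adjust_weight(token: str, step: float = 0.1) -> str:
    """Increase (or decrease, with a negative step) a tag's weight."""
    match = WEIGHTED.match(token.strip())
    if match:
        tag = match.group("tag")
        weight = float(match.group("weight")) + step
    else:
        tag, weight = token.strip(), 1.0 + step
    # Drop the wrapper entirely when the weight returns to neutral
    if abs(weight - 1.0) < 1e-9:
        return tag
    return f"({tag}:{weight:.1f})"

print(adjust_weight("masterpiece"))           # (masterpiece:1.1)
print(adjust_weight("(sky:1.2)", step=-0.1))  # (sky:1.1)
print(adjust_weight("(sky:1.1)", step=-0.1))  # sky
```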
Future Plans
I still want to add other features, such as saving prompts for Loras and selecting them with a submenu by right-clicking on the Lora name in the prompt box.
Conclusion
This post captures my journey of creating a Stable Diffusion application, and I hope it inspires others to explore and create their own AI projects. Next time, I plan to take more detailed notes to present a more comprehensive and structured post.