Stable Diffusion & Project Crying-Mom
There's this image of mom's expression that has been burnt into my head, one I've wanted to capture eventually. It was a hauntingly unique facial reaction from when she was in a nursing home, suffering from severe dementia.
It's the kind of expression that no combination of words could capture. An image is the only possible medium, and I'm the only one who has seen it. I would rather not have it die with me.
I started having this intention around 2021. I can't draw to save my life, so I took no action on it. Some time later I got in touch with a sketch artist specializing in faces, but that didn't go very far.
As with many things, wait long enough and the problem will solve itself. Stable Diffusion (SD) came along and possibilities opened up.
At first, without knowing it well, I thought it was only good for generating entirely new images (just like Midjourney and DALL-E). Only later did I learn about the feature called inpainting: take a photo, draw a region, describe what you want, and have that region changed.
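For the curious, this is roughly what inpainting boils down to in code. Below is a minimal sketch using the Hugging Face diffusers library; the model ID, file names, and prompt are placeholders rather than my actual setup.

```python
# Minimal inpainting sketch with the Hugging Face diffusers library.
# "runwayml/stable-diffusion-inpainting", the file names, and the prompt
# are placeholders, not my exact setup.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

photo = Image.open("mom.jpg").convert("RGB").resize((512, 512))
# White pixels in the mask get regenerated; black pixels are left untouched.
mask = Image.open("mask.png").convert("RGB").resize((512, 512))

result = pipe(
    prompt="scared, alarmed, mouth wide open",
    image=photo,
    mask_image=mask,
).images[0]
result.save("out.png")
```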
Being a noob in machine learning, I started by running SD on a nine-year-old gaming PC. It worked, but each attempt took 5 minutes to run. That's unrealistic for anything serious.
I figured that since SD is open source, there had to be many instances of it running in the wild that I could use instead of running it myself.
Which led me to Hugging Face. This company is poised to be the GitHub of machine learning. I still don't have a full grasp of what they are capable of, but going forward it will be one platform you can't ignore as a technologist.
From recommendations on Reddit, I compiled a list of SD instances with different features, most of them running on top of Hugging Face. I landed on one that focuses on inpainting.
I got started sculpting mom's expression from a photo of her grumpy face (probably taken by accident by whoever was holding the camera).
Out of ignorance, I thought I could inpaint the entire face, prompt it with "scared, alarmed, mouth wide open, hair shaven, old skinny", and be done with it. The outcome was downright comedic: an entirely new face showed up.
Only later did I realize I had to do this the way one works in Photoshop. I would inpaint just the face and neck, prompt it with "old, skinny, wrinkly", run it a few times, and the results came out incredibly accurate.
I continued doing this with the eyebrows, the shirt, one part at a time.
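For reference, here's a sketch of what this one-region-at-a-time workflow would look like if done locally with diffusers. In practice I drew the masks in a hosted tool's UI, so the mask coordinates and seeds below are made up for illustration.

```python
# Sketch of the one-region-at-a-time workflow with diffusers.
# The model ID, ellipse coordinates, and seeds are illustrative only.
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image, ImageDraw

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

photo = Image.open("mom.jpg").convert("RGB").resize((512, 512))

# Mask only the face and neck: white = regenerate, black = keep as-is.
mask = Image.new("L", photo.size, 0)
ImageDraw.Draw(mask).ellipse((150, 120, 360, 420), fill=255)

# Run the same region a few times with different seeds and keep the best result.
for seed in (1, 7, 42):
    out = pipe(
        prompt="old, skinny, wrinkly",
        image=photo,
        mask_image=mask,
        generator=torch.Generator("cuda").manual_seed(seed),
    ).images[0]
    out.save(f"face_neck_seed{seed}.png")
```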
Prompting is basically wizardry. It's a process of discovering what the machine learning model knows and doesn't.
Trying to replace the background, for one, was difficult. I wanted something plain, blurred and patternless. What it kept giving me was something noisy and interesting. I still haven't cracked this one.
The toughest parts of the face to replace are the eyes and the mouth. These are the two defining characteristics of mom's expression, where precision matters more than it does for the rest of the face and hair.
The effort continues.