Python & Rembg

A few days ago, a close colleague asked me about possible ways to execute his idea. It was a casual conversation, so I freely threw out a few possibilities, until I said the magic words: ML/AI. My experience with machine learning and AI goes only as far as using Midjourney, ChatGPT, and Stable Diffusion a year or so ago, so I barely have any technical knowledge of AI. So I slept on it. It was a casual conversation after all. Nothing to worry about, or so I thought.

I woke up the next morning feeling the urge to look into AI in more detail. The next thing I knew, I was in VS Code typing up Python code. I thought, "Holy snake! What has gotten into me?" 🐍 So my journey into the Python jungle began!

Image segmentation, U-2-Net & Rembg

As a starting point, I picked something close to what I do almost all the time, something visual: removing a photo's background. In the past, I've tried a web/API-based service like remove.bg, just to see the result. In recent years, Photoshop has also been getting better at removing backgrounds and at telling foreground from background.

So I stumbled upon the U-2-Net GitHub repository, and not long after that I dug up the Rembg repository, then looked up YouTube videos and (as always) Stack Overflowed a bunch of terms and snippets, and then I realized my browser tabs were getting smaller and smaller 😓

I cloned U-2-Net and Rembg onto my computer and, along the way, learned about the term "image segmentation" and how it works by providing a bunch of datasets to train the algorithm. All while teaching myself the basics of Python's syntax. The venom is already in my veins at this point :)

Looking into different models

Rembg provides various models to plug into it. Besides the U-2-Net model, there are isnet-general-use and the Segment Anything Model (SAM), among others. Again, I was looking for a model that can segment a photo and knows which part is the foreground (the main object) and which is the background (the part that needs to be removed).

U-2-Net

The first test used the U-2-Net model, which is set as the default model in Rembg. The results were moderately good until I picked the bike photo, because every photo retoucher who works manually in Photoshop knows how painful it is to cut out a bike's spokes. While it's pretty good with human figures, I can see it having a hard time with narrow areas, especially the spokes 😂. Maybe its training datasets lack bikes or objects with narrow areas, I don't really know.

The result using the U-2-Net model - Photo by Zoltan Tasi

Isnet-general-use

Unsatisfied with the pre-trained U-2-Net model, I looked for a way to switch the model within Rembg. Luckily, there's a guide in Rembg's repo. So I switched to the pre-trained isnet-general-use model and the result is really promising! It easily detects narrow areas, and although it's not perfect, it's definitely way better!

The result using the isnet-general-use (DIS) model

Segment Anything (SAM)

At the time of writing, I'm still trying to figure out the settings for SAM, which, unlike the other models, is not really straightforward. I noticed we can do segmentation automatically or manually by providing bounding box points that mark the area we want it to focus on. But doesn't it defeat the purpose of being 'machine learning' if we still need to dictate the area to focus on? 🤔
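For the manual route, here is a rough sketch of how a single point prompt might be passed to SAM through Rembg. I'm assuming the `sam_prompt` keyword and the `{"type": "point", "data": [x, y], "label": 1}` format shown in Rembg's README. The helper names `sam_point_prompt` and `remove_with_sam` are my own, and I haven't settled on the right settings yet, so treat this as a starting point rather than a recipe.

```python
def sam_point_prompt(x, y, foreground=True):
    """Build a one-point prompt in the format shown in Rembg's README."""
    # Assumption: label 1 marks a point on the object to keep,
    # label 0 marks a point on the background.
    return [{"type": "point", "data": [x, y], "label": 1 if foreground else 0}]


def remove_with_sam(image_bytes, x, y):
    """Remove the background, pointing SAM at pixel (x, y). Hypothetical helper."""
    # Imported here so the prompt helper works without Rembg installed.
    from rembg import new_session, remove

    session = new_session("sam")
    # Extra keyword arguments are forwarded to the model's predict step.
    return remove(image_bytes, session=session, sam_prompt=sam_point_prompt(x, y))
```

The pixel coordinates refer to the input image, so a point in the middle of the bike frame would tell SAM that the bike is the thing to keep.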

The render time for each image on a MacBook Pro M1 is ~2-3 seconds with high-resolution images from Unsplash.
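If you want to measure this on your own machine, a tiny timing wrapper is enough. This is only a sketch: `time_call` is a helper name I made up, and it times any function you hand it, not just Rembg.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed
```

With Rembg it could be used as `result, seconds = time_call(remove, data, session=session)`. Creating the session once and reusing it for every image should keep the model loading time out of the per-image number.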

What's next?

Next, I'll make it run more robustly on my local machine, and after that I'll explore FastAPI to expose it as an easily accessible API. Either that, or maybe I'll learn a bit of Tkinter to turn it into a standalone installable app. So many things to learn, so little time!

Snippet

If you use Rembg as a library like I do (not as a CLI), here's the snippet to change the model. Refer to this list to choose the model you want to apply, and type its name exactly as written there. If you don't have the model on your local computer, Rembg will conveniently download it for us the first time the script is executed.
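A minimal sketch of how the model switch can look, assuming Rembg's `new_session` and `remove` functions. The helper names `output_path_for` and `remove_background` and the output naming scheme are just my own illustration.

```python
from pathlib import Path


def output_path_for(input_path, model_name):
    """Build an output name such as 'bike_isnet-general-use.png' (my own naming scheme)."""
    src = Path(input_path)
    # PNG keeps the alpha channel that holds the transparent background.
    return src.with_name(f"{src.stem}_{model_name}.png")


def remove_background(input_path, model_name="isnet-general-use"):
    """Remove the background of one image with the chosen Rembg model."""
    # Imported here so the helper above works even without Rembg installed.
    from rembg import new_session, remove

    # The model name must match the entry in Rembg's model list exactly.
    session = new_session(model_name)

    data = Path(input_path).read_bytes()
    result = remove(data, session=session)  # bytes in, PNG bytes out

    out_path = output_path_for(input_path, model_name)
    out_path.write_bytes(result)
    return out_path
```

Calling `remove_background("bike.jpg", "u2net")` and then `remove_background("bike.jpg", "isnet-general-use")` would leave two PNGs side by side, which makes it easy to compare how each model handles the spokes.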