Chroma Key using image segmentation and Deeplab v3

Written by Richard

Image segmentation using Deeplab

We had a request to take portraits of people and composite them onto one of several magazine cover images. Normally this would require photographing the person in front of a blue or green screen. That works pretty well as long as the person is not wearing anything close to the background color. Chroma keying in OpenCV is well demonstrated in the GitHub project blue-screen-effect-OpenCV; if you try the project yourself, you can see it is not perfect.
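For reference, the basic chroma-key operation is only a few lines of OpenCV. The sketch below is not the blue-screen-effect-OpenCV code, just the general idea, assuming a green screen and hypothetical file names:

import cv2

frame = cv2.imread("person_green_screen.jpg")   # hypothetical portrait
cover = cv2.imread("magazine_cover.jpg")        # hypothetical background
cover = cv2.resize(cover, (frame.shape[1], frame.shape[0]))

# Rough HSV bounds for green; real projects tune these per lighting setup.
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
key_mask = cv2.inRange(hsv, (35, 60, 60), (85, 255, 255))

frame[key_mask > 0] = cover[key_mask > 0]       # replace keyed pixels
cv2.imwrite("composite.jpg", frame)

Anything in the foreground that falls inside those color bounds gets keyed out too, which is exactly the weakness mentioned above.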

The other way would be to make a magazine cover with transparency and combine it with an image of a person. This is in fact what we ended up doing with OpenCV.

What I am about to describe is also not perfect. Far from it. But I am getting ahead of myself.

I started with the GitHub project realtime_bg_blurring. It is relatively quick and lends itself to modification. Being unfamiliar with Deeplab v3, I needed some time to understand what main.py is doing. Eventually I understood what was going on and could try some things. The project uses a Keras implementation of Deeplab v3+.

The first thing to understand is that Deeplab v3 operates on square 512x512 images. That is why the image is resized so its longer side is 512 pixels and then padded out to a square.
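A minimal sketch of that resize-and-pad step, assuming a 512x512 model input; the variable names echo the project's main.py, but the code is mine:

import cv2
import numpy as np

trained_image_width = 512
frame = cv2.imread("webcam_frame.jpg")   # stand-in for a captured webcam frame

# Scale so the longer side becomes 512, keeping the aspect ratio.
h, w = frame.shape[:2]
ratio = trained_image_width / max(h, w)
resized = cv2.resize(frame, (int(w * ratio), int(h * ratio)))

# Zero-pad the bottom and right so the input becomes a 512x512 square.
pad_x = trained_image_width - resized.shape[0]
pad_y = trained_image_width - resized.shape[1]
resized = np.pad(resized, ((0, pad_x), (0, pad_y), (0, 0)), mode='constant')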

Secondly, Deeplab v3 works with input values in the range -1 to 1. That explains line 27:

resized = resized / 127.5 - 1.
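A quick check of what that mapping does to 8-bit pixel values:

import numpy as np

pixels = np.array([0., 127.5, 255.])
print(pixels / 127.5 - 1.)   # [-1.  0.  1.] -- 8-bit values mapped into [-1, 1]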

Lines 30 and 31 took me the longest to work out:

res = deeplab_model.predict(np.expand_dims(resized2, 0))
labels = np.argmax(res.squeeze(), -1)

model.predict returns the per-class segmentation scores with a leading batch dimension. squeeze() drops that size-1 batch dimension, and argmax then picks, for each pixel, the class with the highest score. The result is a label map: 0 outside the segmented area and a non-zero class index inside the segmented region.
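The shapes make this easier to follow. Here is a toy stand-in for the model output, assuming a 512x512 input and 21 Pascal VOC classes (the class count is my assumption, not taken from the project):

import numpy as np

res = np.random.rand(1, 512, 512, 21)  # stand-in for predict(): (batch, H, W, classes)
scores = res.squeeze()                 # drop the size-1 batch dim -> (512, 512, 21)
labels = np.argmax(scores, -1)         # winning class per pixel   -> (512, 512)
print(labels.shape)                    # (512, 512); class 0 is background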

Lines 33–35 remove the padding from labels and set mask to True where labels is 0 and False everywhere else, giving a mask of the background. mask_person is just the opposite: True where the segmented region is and False elsewhere, giving a mask of the foreground.

labels = labels[:-pad_x]
mask = labels == 0
mask_person = labels != 0

Then the original frame read from the webcam is resized to match labels.shape, and Gaussian blurs are applied to the resized frame:

resizedFrame = cv2.resize(frame, (labels.shape[1], labels.shape[0]))
blur = cv2.GaussianBlur(resizedFrame, (blur_bg_value, blur_bg_value), 0)
blur_person = cv2.GaussianBlur(resizedFrame, blurValue, 0)

Lastly, the masks are used to write the blurred pixels back onto the resized frame: background pixels come from blur, foreground pixels from blur_person. The resized frame is the final product, with the background blurred.

resizedFrame[mask] = blur[mask]
resizedFrame[mask_person] = blur_person[mask_person]

My first attempt was to remove the blur and replace it with black, making a kind of chroma-key mask so that I'd have just the foreground. I replaced line 38 with a zeros matrix of the same dimensions as the resized frame:

blur = np.zeros(resizedFrame.shape)

This indeed produced an image similar to the results shown in the project, but with a black background. Since this seemed to work, I then replaced the black matrix with a background image and ran again.
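A sketch of that substitution, assuming a hypothetical cover file and the mask and resizedFrame variables from above:

import cv2

background = cv2.imread("magazine_cover.jpg")  # hypothetical file name
# Match the dimensions of the frame the masks index into.
background = cv2.resize(background, (resizedFrame.shape[1], resizedFrame.shape[0]))
# Keep the segmented person; take everything else from the cover image.
resizedFrame[mask] = background[mask]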

That did composite the foreground over the background image. This is good, but the halo left around the person was distracting.

The final thing I tried, to correct the halo, was to look into the creation of the Deeplab model:

deeplab_model = Deeplabv3(OS=8)

There is a commented-out declaration just above it:

# deeplab_model = Deeplabv3(backbone='xception', OS=8)

So I looked at the definition of this function in the model.py file; OS is the output stride. I tried OS=16 as the parameter and found that the halo effect was reduced. I can't say I noticed a slowdown in performance.

Since that improved the image, I also added the backbone='xception' parameter; by default, backbone='mobilenetv2'. The first run with those two parameters took quite a long time while it downloaded the selected backbone's weights. Performance was indeed slower, but the halo effect was almost gone.
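Putting those two changes together, the model construction I ended up with looks like this (assuming the project's model.py):

from model import Deeplabv3

# Original: Deeplabv3(OS=8) with the default mobilenetv2 backbone.
# Slower, but with a much smaller halo:
deeplab_model = Deeplabv3(backbone='xception', OS=16)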

Conclusion

This has promise as an alternative to chroma keying. The halo effect can be reduced, though not quite to an acceptable level, and unlike chroma keying, no foreground colors are lost.

But there are problems. Besides the halo effect, the biggest one in my mind is the small image size required for processing: a lot of detail is lost. To handle large images, one could consider using a super-resolution GAN (SRGAN) to upscale the output and restore some of the lost resolution. That, of course, would require some heavy iron to run in real time. On my MacBook, even without SRGAN, it was not even close to real time.
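Not SRGAN itself, but as an illustration of the upscaling step, OpenCV's contrib dnn_superres module wraps several pretrained super-resolution models. A sketch using EDSR, assuming opencv-contrib-python is installed and the EDSR_x4.pb weights have been downloaded:

import cv2
from cv2 import dnn_superres

sr = dnn_superres.DnnSuperResImpl_create()
sr.readModel("EDSR_x4.pb")               # downloaded weights file
sr.setModel("edsr", 4)                   # model name and upscale factor

small = cv2.imread("composite_512.jpg")  # hypothetical segmented output
upscaled = sr.upsample(small)            # 4x upscale, e.g. 512 -> 2048
cv2.imwrite("composite_2048.jpg", upscaled)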

It might help to use a background screen, as one would for chroma keying. I worked in my normal environment, so some foreground objects remained, and depending on the angle of the camera, it would pick up one of our robots as well as other people in the background.

As with many things in computer vision, the quality of the images used makes a difference.
