Name: Brandon Wong
The goal of Project 1 is to convert the photographs that Prokudin-Gorskii took in the early 20th century as three separate color-channel exposures into single RGB color images, with minimal artifacts left over from combining the separate exposures into one. Because the images were taken before color photography existed, each scene takes the form of three separate grayscale images that must be aligned and cleaned up with various computer vision techniques to produce a modern color image.
Most of the images are split the same way. First, the original image is loaded from the file. Then this original image is cropped by two-thirds of a percent on each side. After that, the separate channel images are split apart by taking the first third, second third, and last third of the original image and assigning them to blue, green, and red respectively. These channel images are then cropped by five percent on each side to obtain the final blue, green, and red channel images, as sketched below.
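A minimal sketch of this preprocessing, assuming the plate is loaded with skimage's skio.imread and stacks blue, green, red from top to bottom (the helper name load_and_split is mine, not from the actual code):

    import skimage.io as skio

    def load_and_split(path):
        # Load the stacked plate and trim two-thirds of a percent from each side.
        im = skio.imread(path)
        h, w = im.shape
        dh, dw = int(h * 2 / 300), int(w * 2 / 300)
        im = im[dh:h - dh, dw:w - dw]

        # Split into vertical thirds: top = blue, middle = green, bottom = red.
        third = im.shape[0] // 3
        b, g, r = im[:third], im[third:2 * third], im[2 * third:3 * third]

        # Crop five percent from each side of every channel.
        def crop5(ch):
            mh, mw = int(ch.shape[0] * 0.05), int(ch.shape[1] * 0.05)
            return ch[mh:ch.shape[0] - mh, mw:ch.shape[1] - mw]

        return crop5(b), crop5(g), crop5(r)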
There is one exception to this. The cathedral image isn't cropped at all before being aligned. Skipping the crop was the original method I used for all the .jpg images, but I found that it caused issues for the monastery image, so I switched to the method I used for the .tif images for every image except the cathedral, where cropping caused issues instead. As such, the cathedral image is the sole image that does not get cropped before alignment; its margins are instead removed after processing, which is already done anyway for all the .jpg images.
The channels are aligned by minimizing the L2 norm, also known as the Euclidean distance, between the red and blue images individually and the green image.
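The metric itself is a one-liner; a sketch (the helper name l2_distance is mine):

    import numpy as np

    def l2_distance(channel, green):
        # Euclidean (L2) distance between a candidate channel and the green reference.
        diff = channel.astype(np.float64) - green.astype(np.float64)
        return np.sqrt(np.sum(diff ** 2))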
In the case of the .jpg images, different paddings using np.pad are applied to the red and blue images before their Euclidean Distance with the green image is found. The paddings that result in the smallest L2 norm are the ones selected to be used in the final alignment, and then the margins are removed from the final image to keep just the aligned parts in the final result.
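One way this pad-based search could look, reusing the l2_distance helper above (the +/- 15 pixel window and the 'edge' padding mode are my assumptions, not stated parameters):

    import numpy as np

    def pad_shift(ch, dy, dx):
        # Shift a channel by padding one side and trimming the same
        # number of rows/columns from the opposite side.
        h, w = ch.shape
        padded = np.pad(ch, ((max(dy, 0), max(-dy, 0)),
                             (max(dx, 0), max(-dx, 0))), mode='edge')
        y0, x0 = max(-dy, 0), max(-dx, 0)
        return padded[y0:y0 + h, x0:x0 + w]

    def align_jpg(ch, green, window=15):
        # Exhaustive search over a small window of shifts, keeping the
        # one with the smallest L2 distance to the green channel.
        best_score, best_shift = float('inf'), (0, 0)
        for dy in range(-window, window + 1):
            for dx in range(-window, window + 1):
                score = l2_distance(pad_shift(ch, dy, dx), green)
                if score < best_score:
                    best_score, best_shift = score, (dy, dx)
        return best_shift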
In the case of the .tif images, pyramid alignment was used, rolling the images with np.roll instead of padding them. The optimal alignment based on the L2 norm is found first on copies 1/64 of the original size, then on copies 1/16 of the original size starting from the 1/64 result, then on copies 1/4 of the original size starting from the previous two results; the combined shift is then applied to the original image. This makes it possible to find the optimal alignments of the red and blue images with the green image in a timely manner. Searching for the optimal L2 norm at full size or half size took too long to be reasonable, while using all three smaller sizes takes at most five minutes total per alignment. By basing the final alignment on the combined results from the three smaller sizes, an optimal point can be found within a reasonable time frame for each image, with no noticeable misalignment the majority of the time.
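A sketch of this coarse-to-fine search, assuming the pyramid levels are built by plain subsampling and each level refines the previous shift within a small window (the +/- 4 window per level is my guess, not a stated parameter):

    import numpy as np

    def pyramid_align(ch, green, scales=(64, 16, 4), window=4):
        # Coarse-to-fine: each level starts from the previous level's
        # shift, rescaled by the factor between levels, and refines it
        # with np.roll and the L2 norm.
        dy, dx, prev = 0, 0, None
        for s in scales:
            if prev is not None:
                dy, dx = dy * (prev // s), dx * (prev // s)
            small_ch = ch[::s, ::s]
            small_g = green[::s, ::s].astype(np.float64)
            best, best_dy, best_dx = float('inf'), dy, dx
            for cy in range(dy - window, dy + window + 1):
                for cx in range(dx - window, dx + window + 1):
                    rolled = np.roll(small_ch, (cy, cx), axis=(0, 1))
                    score = np.sqrt(np.sum((rolled.astype(np.float64) - small_g) ** 2))
                    if score < best:
                        best, best_dy, best_dx = score, cy, cx
            dy, dx, prev = best_dy, best_dx, s
        # The finest level searched is 1/4 scale, so the full-resolution
        # offsets come out as multiples of four.
        return dy * scales[-1], dx * scales[-1]

Since the finest level searched is 1/4 scale, the resulting full-resolution offsets are multiples of four, which is consistent with the .tif offsets listed at the end of this writeup.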
While most images ended up aligning well, a small number of images were still off after the alignments were done. The most distinctly off image was the self_portrait image. The train, emir, and lady images were also off, but to a lesser extent that isn't as distinct and obvious as the first one.
I believe these issues stem primarily from the alignment being unable to reach the precision needed, leaving the less-affected images slightly off. Because even the finest level of the search operates at one quarter of the original size, the final offsets are only accurate to multiples of four pixels, so being slightly off should not be unexpected. This may also be the issue for the images that are more distinctly wrong, just in a different way. In their cases, the alignment overshoots at the smaller sizes, and when the search moves up to the larger sizes, the rolling no longer has the range to find the true optimum, since each smaller size adjusts the image by four times as much as the next size up. These early mistakes prevent the final level from ever finding the true optimal value, ending up with final images that are off.
Red and blue channel offsets relative to the green channel, one line per image:

Red Offset: (12, 5), Blue Offset: (5, 5)
Red Offset: (7, 5), Blue Offset: (12, 3)
Red Offset: (5, 5), Blue Offset: (4, 2)
Red Offset: (20, -4), Blue Offset: (-8, -12)
Red Offset: (0, 4), Blue Offset: (0, -16)
Red Offset: (16, 20), Blue Offset: (0, 12)
Red Offset: (52, 0), Blue Offset: (-44, 0)
Red Offset: (-4, -12), Blue Offset: (0, -4)
Red Offset: (12, 8), Blue Offset: (-12, -28)
Red Offset: (64, -16), Blue Offset: (8, 0)
Red Offset: (0, 0), Blue Offset: (-8, 8)
Red Offset: (20, 0), Blue Offset: (-16, -16)
Red Offset: (0, 16), Blue Offset: (0, -20)
Red Offset: (32, 0), Blue Offset: (-8, 0)