Home Previous: Automatic Panoramic Mosaic Stitching

CSC5280 Project 4: Seam Carving for Content-Aware Image Resizing


origin(500×488)	scale(360×352)	crop(360×352)	retarget(360×352)

Once I picked out one photo as my "portrait" in my blog. Unfortunately, "I" become a bit small after scaling the image. One possible solution is to crop the image instead of scaling it. However, I'm not satisfied with this result since cropping leaves out too much content, for example, some background is lost. If only I have a Content-Aware Image Resizing algorithm. And the good thing is that Shai Avidan and Ariel Shamir have proposed one way and it is easy to implement. The "retargeted" resized image using this algorithm is pretty well (the forth image above).

Introduction

In this project, I implement Shai Avidan and Ariel Shamir's Seam Carving for Content-Aware Image Resizing (SIGGRAPH2007) completely and test my implementation using the same images they used. I test its performance in image resizing, content amplification (content retargeting) and object removing.

Next, I'll give the details in my implementation to make readers easy to repeat my experiments, and then some results are presented. A interesting game will serve as the closure.

Binary File

Code

Results

Terms

1. Seam

Let I be an n×m imageand theVertical Seam is defined as:

where x is an mapping x: [0,1,...,n-1]→[0,1,...,m-1]. So, a vertical seam is an 8-connected path in the image from the top to the bottom containing one pixel per row. Similerly, if y is another mapping y: [0,1,...,m-1]→[0,1,...,n-1], then Horizontal Seam is defined as:

2. Energy Function

Here, we define energy function as a kind of function that are related with the content in an image. One intuitive way is to use the gradient, since the important contents in an image usually have strong local structures whose gradients are big, while contents that are less important often lie in plain area in the image whose gradients are small.

Shai Avidan and Ariel Shamir also proposed some more sophisticated energy function, such as the ratio between gradient and the histogram of gradients (HoG), entopy, segmentation and so on. My experiments tell that gradient energy function performs well enough. Though it is expected that different energy functions have their own matched images as Shai Avidan and Ariel Shamir suggested, I find that gradient is supposed to be the best choice considering its easy computing and fair performance.

3. Seam Cost

Given an energy function, the seam cost is defind as:

where I(si) is one pixel of the seam. our goal is to find the optimal seam that minimizes seam cost:

This optimal seam can be found by dynamic programming. In finding the optimal vertical seam, we tranverse the image firstly,

then we backtrack the image from the minimal entry in the last row of the map. It is simlar for horizontal seam.

***In my implementation, only optimal horizontal seam finding function is written, because we can transpose the image and use the same function to find optimal vertical seam.

Image Resizing

Traditional image reszing regards all the pixels equally, using up or down sampling to resize the image. Now that the optimal seam, which has the smallest cost, must pass through the plain area of the image, so we can make resizing some intellegent by doing sampling around this seam part while keeping the important content unchanged. Assume we want to change the image ratio of an image from n×m to n×m' where m-m'=c. This can be achieved by successively removing c vertical seams from the original image.

Things become a bit different when we go to more general situations. If we want to resize image from n×m to n'×m' where n-n'=r and m-m'=c, we need to find some way to determ the order of processing on horizontal and vertical seams. The authors proposed one way:

where k=r+c and is used as a parameter that determine if at step i we remove a horizontal or vertical seam: . The solution can also be found by dynamic programming:

where T(0,0)=0, and each entry of the map T record the optimal vertical or horizontal seam. This rule to find optimal vertical-or-horizontal order needs a lot of computation. I compared it to the simple rule that we do image resizing on rows and then columns. Fortunately there are not too much differences. The code provided above using the simple "first row then column" rule. I also provide the dirty code of the complex rule here, which must need many improvements :-).

Besides, there are some differences in image enlarging and shrinking. We can get good results by successively removing seams to shrink images, but we'd better find enough seams to duplicate them simultaneously to enlarge images, that is, we find the first k optimal seams instead of only one most optimal seam. We do this to reduce artifacts, because if we find one optimal seam and duplicate it, often we'll get the same seam the next time.

Content Amplification

Suppose our content-aware image resizing algorithm performs well enough, then we can enlarge the image using the traditional way and shrink it to the original size using content-aware image reszing. Now we are deleting actually subpixel seams. Then, the result would be amplified importent content, say foreground, and compressed background. See the forth image above.

Object Removing

Also, if we design an interactive interface for users, and they select areas to be removed, we can get it easily by define very low energy on pixels in these areas, do seam carving and then enlarge it back to the original size. Though it sounds too simple, it works pretty well.

Experiments

I test my implementation by using the same images that the authors used. And though they used gradient and HoG energy functions on different images, I find my implementation using gradient has comparative results to theirs.

1. Image Resizing

origin	optimal seams	my result	Shai Avidan and Ariel Shamir's result

When extreme enlarge occures, artifacts can not be avoid. And it may equals to traditioanal resizing when m'>2m (assum we map image from n×m to n×m'). To keep the advantage of content-aware image resizing, at most 0.5m colums or 0.5n rows are allowed to be duplicated one time——this is controled by the parameter "FractionContolForEnlarge" in the code. One example is shown here.

original Image(239×200) (To enlarge it to 479×200, we should do enlarge twice)
the first time resizing(359×200)
the second time resizing(479×200)
Left: standard method Right: Shai Avidan and Ariel Shamir's result

Shai Avidan and Ariel Shamir's work is on image resizing in one dimension only. More results are shown in the results, including extreme shrink, shrink/enlarge in both dimensions and shrink in one dimension and enlarge in the other.

2. Content Amplification

Recall that to amplify the content, we enlarge the image using standard method, then shrink it using content-aware image resizing.

Left: origin(307×230)

Middle: standard Enlarg(350×262)

Right: Shai Avidan and Ariel Shamir's resut(307×230)

Left: horizontal seams for shrinking

Middle: vertical seams for shrinking

Right: my result

Another example is my photos shown in the beginning.

Note that this function is based on the assumption that our energy function can find contents that we are indeed aware, but in practice it is not so all the time. Fortunately, some strokes will be good enough for our algorithms. Here are they:

3. Object Removing

This part is even more fun. We need an interface to select areas that we want to remove, assign these parts with very little energy, shrink the image and then enlarge it back to the original size.

origin	girl mask	pigeon mask	retain mask

My Results
Shai Avidan and Ariel Shamir's resuts

Note the forth column is a typical image "retarget". We mark the interesting area, and get small image still containing them.

Some times objects are close to each other, so we cannot remove any of them only by remove mask. We need use both remove and retain mask.

origin	remove mask (red) and retain mask (green)	my result	authors' resuts

Extension

One natural extension of this work is extend it to video resizing and content-based video compression. I find that the authors have done this work in video resizing using similar method. Maybe one can think about the possibility to use seam carving in video compression, because we need only code the important content in the video assuming our seams can find most "important" content. Another usage is to create one "dynamic image" file format to record the multi-size image mentioned in the end of the paper, to make this algorithm suit for different displays, such as mobile and desktop displays.

External Link

[1] Seam Carving for Content-Aware Image Resizing

[2] Demo on YouTube

[2] The object removing need a user interface, here is a nice one: http://code.google.com/p/seam-carving-gui/

[3] Another implementation: http://brain.recall.googlepages.com/cair

Closure

OK, a little game here: Find the missing shoe!

origin	find the missing shoe :-)	find the missing show :-)

Here is the Answer!

Home Previous: Automatic Panoramic Mosaic Stitching