What Can We Learn From Computer Vision?

Cynbe ru Taren

This essay was written immediately after a bout of researching theoretical and applied computer vision. It was primarily intended to digest and consolidate the results in my own mind; I've posted it for those bright enough to prefer re-use to re-invention. Yeah, both of you. *wrygrin*

Computer vision has exploded over the last decade: Previously intractable problems are now assigned as undergraduate projects.

This great leap forward in computer vision has been driven by the adoption of breakthrough algorithms and datastructures like RANSAC, boosting, mean shift, graph cut, loopy belief propagation, histogram techniques, particle filters, Markov random fields and Markov-Chain Monte Carlo (MCMC) techniques.

Most of the new techniques are quite general — in fact borrowed from fields like theoretical physics — and promise to have equally revolutionary effect when applied in other fields, yet are virtually unknown in the programming mainstream. It is no exaggeration to call them the wave of the programming future.

In particular, the new efficient graph optimization algorithms open up entirely new horizons.

Perhaps the most spectacular single paper in this recent efflorescence is Globally optimal solutions for energy minimization in stereo vision using reweighted belief propagation in which the authors exactly solve real-world examples of NP-hard problems in a matter of minutes using these techniques.

Contemporary computer science programs mainly teach you how to iterate over arrays and recurse over trees and then throw you out into the world; consequently 99% of contemporary software consists of nothing but those same two tired old 1950s techniques applied over and over and over again.

Add these new techniques to your toolbox and you will take your programming to an entirely new level.

Core take-home lessons:

If you're not finding approximate solutions to NP-hard graph problems, you're probably not solving an interesting real-world problem.
Just about every worthwhile real-world problem reduces to something NP-hard; if you want to solve real-world problems, you need to become adept at efficiently finding approximate solutions to them.
That's old news; the new news is that there are now well-tested off-the-shelf tools for doing so.
Just because a problem class is NP-hard does not mean that the particular problem at hand is not exactly solvable.
It is often — even usually — a very small minority of problems in the class, untypical of real-world problems, that make the class NP-hard.
Even when the specific problem at hand cannot be solved exactly, sufficiently good approximations are usually available.
Recent computer research has unveiled a succession of steadily more effective algorithms for attacking such problems. For example, the key min-cut graph problem has been successively solved using simulated annealing, ncut, "jump move" graph cut, "swap move" graph cut, "expansion move" graph cut, belief propagation, loopy belief propagation, tree re-weighted message passing, convergent tree re-weighted message passing, serial tree re-weighted message passing, Swendsen-Wang, and the isoperimetric algorithm, to name just the best-known algorithms.
Each new technique has brought new real-world problems within reach of practical solution.
Most of the recent progress in computer vision can be traced to a handful of very general, very powerful new programming techniques.
Some were developed by the physics community attack to hard problems in, for example, lattice physics.
These techniques have made many formerly intractable vision problems so easy as to no longer be interesting.[1]
They can do the same thing for your programming problems. Make these techniques part of your everyday programming toolkit and a new world will open up for you. (Yes, Virginia, there's more to hacking than OOP.)

Enough airy-fairy handwaving — let's talk bits:

Boosting: Suppose you have a spam filter algorithm which, given a set of spam and nonspam emails, can construct a rule to distinguish between them — but it is only about 60% accurate. Pretty useless, right?
Wrong! Just feed it into AdaBoost (Adaptive Boost), and it will hand you back in return a "boosted" version which is 99% accurate or better.
AdaBoost is your automatic algorithm massage machine:
"Put in a coin, turn out the light,
magic fingers make you feel alright."
—Jimmy Buffet

Since it is usually much easier to write a weak learner to produce 60% accurate rules than a strong learner to produce 99% accurate ones, AdaBoost is an invaluable addition to your bag of tricks.
Read all about it here.
Mean Shift: Suppose someone drops in your lap a file of a million records and asks you if there's any interesting pattern to their distribution. They might be pixels in an image, hostile probes of your firewall, mating times of rabbits, gamma ray burst arrival directions, whatever. What do you do?
If it is a one-time thing, you probably scatterplot them and eyeball the result — the human visual system is still unbeatable. But if this is going to happen once a minute all year long that could get boring real fast. What then?
You treat them as points in some Euclidean space and you ask Mean Shift to cluster them. Mean Shift doesn't require that clusters be elliptical or Gaussian or linearly separable or anything else; it takes any dataset you throw at it and spits back at you something you can use. It takes a lickin' and keeps on tickin' and it's finger-lickin' good.
Get the complete story here.
Particle Filtering: OK, so you learned Branch And Bound and Alpha-Beta Minimax Search With Deep Cutoffs in college. Be honest: When was the last time you actually used one of them outside of homework?
Probably never: They were designed for games of perfect information, and real-life applications aren't like that: Real-life situations are all about coping with incomplete information and uncertainty.
But particle filter techniques take to those situations like ducks to water.
For example, suppose you've got a hazard warning sonar or radar or video system peering into the dark trying to pick real objects out of the clutter as early as humanly possible. The life you save might be your own. What do you do? You probably call in sick. But it doesn't have to be that way.
Particle filtering techniques let you simultaneously track hundreds or thousands or millions of hypotheses as to what might be out there, incrementally updating them as each scrap of information comes in, with some of them gradually evaporating as contrary information accumulates and some gradually congealing into near certainties.
You can't handle that sort of problem with regexes and hash tables, but with particle filters in your toolbox it's all in a day's work.
It feels a bit like working with superpositions of quantum states, but you won't need liquid helium or a hard vacuum system, just a modern code library at your fingertips.
Want to know more? Right this way.
Histogram techniques are a pervasive theme in recent computer vision work. Histogram techniques are to analysing realistic datasets what fractals are to synthesizing realistic datasets.
Remember when you first really grasped that by using fractals you could synthesize in minutes images and datasets that had seemed humanly impossible using classical ruler-and-compass polygon style of graphics? Yowza!!
Histogram techniques are like that for analysing real-world datasets: Suddenly stuff that seemed completely impractical becomes fun to solve, and then shortly too easy to be interesting.
Histogram techniques rock because they make absolutely no assumptions about what the incoming data is going to look like. They don't require that it be Gaussian or Poisson or unimodal or anything. "Hypotheses non fingo."
This is good because real-world data usually don't do what they spozed to. "The difference between theory and practice is that in theory they are the same, but in practice they are different."
Precisely because they make no assumptions and impose no requirements, histogram techniques are totally general tools you can throw at any problem that come through for you every time, dependable as death and taxes, implacable as gravity. When you don't want a phone call at 3 AM on vacation in Fiji asking why your code jammed on that day's data, histogram techniques are Your Friend.
Histogram techniques first became popular as a way to spot "anything different" happening in live video of stuff like swaying trees, waving grass, flowing water, running escalators &tc, without taking forever to individually model all those background features. They do a spectacularly good job at that. (If you have a webcam, run my vision app and see for yourself.)
But there's nothing particularly vision-specific about them; you can histogram anything you can count, so once you grasp how to apply them, you've off to the races.
If you're working with real-world data (is there another kind?) you can't afford not to have this tool in your bag of tricks.
Read more here.
RANSAC: Probably the first thing you learned in statistics was how to fit a line to data points using least squares.
Probably the second thing you learned in statistics is that this doesn't work on real-world data because the outliers kill you.
RANSAC ("RANdom SAmple Consensus") was the first of the recent wave of robust regression algorithms that let you reliably pull that line out of the clutter even when 80% or more of the datapoints are outliers all over the map. As long as there's a solid core of inliers, RANSAC will dive into that cold sea of outliers, find that line, and come dashing back with it, wagging its tail and eager to go again.
Nowadays, about the only place you see least-squares fitting is in old textbooks; out in real-world coding, RANSAC and its children have almost totally displaced it.
About RANSAC's only weak point is that it won't notice if there are actually multiple lines in the dataset — it will pick one at random and return it. It that's a problem, you need to do something like run Mean Shift first to pick out the line clusters and then run RANSAC on the resulting clusters one at a time. Or call it multiple times and histogram the results or use a Hough transform or whatever; do keep multiple arrows in your quiver.
There have been a number of tweaks to the RANSAC algorithm to improve its performance in various marginal cases, but the original algorithm remains popular.
Don't make the mistake of pigeon-holing RANSAC as specifically a 2D line-fitting algorithm! Once you get the idea, you can use it to fit ellipses in 2D, Corvettes in 3D, D-branes in 10-dimensional string space, whatever. Internalize the core idea, apply it inventively and it will save you endless grief.
RANSAC isn't sexy; it's just one of those tools that keeps simple problems simple. You plug it in and forget about it. It's the synchronous electric motor of the robust algorithms world, humming away quietly in the background day in and day out.
Read up on it here.
Graph Cut: Suppose your boss wants code to take stereo pairs of pictures, find best matches, and deduce distances from the result. It's NP hard. What do you do? You reach for Graph Cut, your all-purpose Swiss Army Knife NP-hard approximate best match finder.
Graph Cut and kin are the secret sauce in the best of the new-wave computer vision recipes. If it makes your eyes bug out the first time you see it, it is probably using a Graph Cut family algorithm under the hood. Learn how to use them and you too can amaze your friends!
The great thing is that these algorithms are completely general; if you can express it as a graph, you can apply them. You can use them to stitch snapshots together into panoramas, to construct a 3D model of a room from a handheld home movie, to find computer dating matchups, to automatically stitch new foreground objects into video, to remove someone from a photo and automatically inpaint the hole so neatly nobody can tell it was modified, to allocate machine tools to incoming work orders, to recognize continuous speach, whatever.
Graph Cut is what you use to solve the problems they told you in college couldn't be solved. It's the big gun you pull out when nothing else in in your bag of tricks will do the job. When the going gets tough, the tough pull out Graph Cut. Learn how to use it with flair and imagination and your friends really will think you're a miracle worker.
Sound good? Read about it here.

And so forth; those are the pick of the litter, but you'll find a number of additional wheels you can steal on my computer vision papers by topic page.

Code: The most popular open source computer vision library is the Intel-donated OpenCV ("Open Computer Vision") library, which has a wikipedia entry, a sourceForge webpage, FAQ and wiki, and a very active user mailing list mirrored here at gmane. (The much less active developer's mailing list is here)

The University of Westen Ontario Vision Group has some code available which includes a generic max-flow/min-cut library downloadable for research purposes only :-( and Olga Veksler's Multi-label Markov Random Field optimization code available under GPL2 :-) -- see her thesis for background.

My own little Gtk-based Linux vision app is here.

Footnotes.

[1] Face detection and simple stereo image matching, for example, are no longer considered interesting in the absence of special problems such as occlusions, nor is detecting rigid objects like cars. Recognizing flexible objects like pedestrians and faces remains a research topic.

Cynbe ru Taren

Last modified: Mon Jul 7 12:30:08 CDT 2008