What Can We Learn From Computer Vision?

Cynbe ru Taren

This essay was written immediately after a bout of researching theoretical and applied computer vision. It was primarily intended to digest and consolidate the results in my own mind; I've posted it for those bright enough to prefer re-use to re-invention. Yeah, both of you. *wrygrin*

Computer vision has exploded over the last decade: previously intractable problems are now assigned as undergraduate projects. [1]

This great leap forward in computer vision has been driven by the adoption of breakthrough algorithms and data structures like RANSAC, boosting, mean shift, graph cut, loopy belief propagation, histogram techniques, particle filters, Markov random fields and Markov chain Monte Carlo (MCMC) techniques.
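To give a taste of how little code one of these takes, here is a minimal sketch of RANSAC fitting a line to data polluted by gross outliers. The idea: repeatedly fit a model to a tiny random sample, count how many points agree with it, and keep the model with the most support. The function names (`fit_line`, `ransac_line`), the tolerance, the iteration count, the seed and the sample data are all my own illustrative assumptions, not taken from any particular library or paper.

```python
import random

def fit_line(p, q):
    """Return (slope, intercept) of the line through p and q, or None if vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def ransac_line(points, iterations=200, tolerance=0.5, seed=0):
    """Fit y = slope * x + intercept while ignoring outliers.

    Hypothetical helper for illustration: each round fits a line to two
    randomly chosen points and keeps the line that the most points agree
    with (vertical distance within `tolerance`).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue  # degenerate sample, try again
        slope, intercept = model
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tolerance]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Ten points exactly on y = 2x + 1, plus three gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(2.0, 30.0), (5.0, -20.0), (8.0, 50.0)]

model, inliers = ransac_line(points)
print(model, len(inliers))
```

On this toy data the sketch should recover slope 2 and intercept 1 with ten inliers, where an ordinary least-squares fit would be dragged off by the three bad points. A production version would normally refit the model to all inliers at the end and choose the iteration count from the expected outlier fraction.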

Most of the new techniques are quite general (in fact borrowed from fields like theoretical physics) and promise to have an equally revolutionary effect when applied in other fields, yet they are virtually unknown in the programming mainstream. It is no exaggeration to call them the wave of the programming future.

In particular, the new efficient graph optimization algorithms open up entirely new horizons.
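As a rough illustration of what graph optimization buys you, the sketch below labels a five-pixel, one-dimensional "image" as foreground or background by computing a minimum cut. Each pixel pays a data cost for the label it gets, and each pair of neighbours with different labels pays a smoothness cost; the minimum cut gives the labeling with the lowest total cost. Everything here is my own assumption for illustration: the names `max_flow_min_cut` and `segment`, the cost model, the smoothness weight, and the use of plain Edmonds-Karp max-flow rather than the specialized algorithms used in vision research.

```python
from collections import deque

def max_flow_min_cut(capacity, source, sink):
    """Edmonds-Karp max-flow on a dict-of-dicts graph.

    Returns (flow value, set of nodes on the source side of a minimum cut).
    """
    # Residual graph: copy the capacities and add zero-capacity reverse edges.
    residual = {u: dict(edges) for u, edges in capacity.items()}
    for u, edges in capacity.items():
        for v in edges:
            residual.setdefault(v, {}).setdefault(u, 0)

    def reachable():
        """Breadth-first search along edges with spare capacity."""
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = reachable()
        if sink not in parent:
            # No augmenting path left: the reachable nodes form the source side.
            return flow, set(parent)
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

def segment(values, max_value=10, smoothness=3):
    """Label each pixel 1 (foreground) or 0 (background) by minimum cut."""
    capacity = {"s": {}, "t": {}}
    for i, v in enumerate(values):
        capacity["s"][i] = v                # cut (paid) if pixel i is background
        capacity[i] = {"t": max_value - v}  # cut (paid) if pixel i is foreground
    for i in range(len(values) - 1):
        capacity[i][i + 1] = smoothness     # paid if neighbours disagree
        capacity[i + 1][i] = smoothness
    energy, source_side = max_flow_min_cut(capacity, "s", "t")
    labels = [1 if i in source_side else 0 for i in range(len(values))]
    return energy, labels

energy, labels = segment([9, 8, 4, 1, 2])
print(energy, labels)
```

For this input the two bright pixels should come out as foreground and the rest as background, with total cost 13 (data cost 10 plus one label boundary at cost 3). The point is that the answer is the global optimum over all 32 possible labelings, found without enumerating them; the same construction scales to real images where enumeration is hopeless.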

Perhaps the most spectacular single paper in this recent efflorescence is "Globally optimal solutions for energy minimization in stereo vision using reweighted belief propagation", in which the authors exactly solve real-world instances of NP-hard problems in a matter of minutes using these techniques.

Contemporary computer science programs mainly teach you how to iterate over arrays and recurse over trees and then throw you out into the world; consequently 99% of contemporary software consists of nothing but those same two tired old 1950s techniques applied over and over and over again.

Add these new techniques to your toolbox and you will take your programming to an entirely new level.

Core take-home lessons:

Enough airy-fairy handwaving — let's talk bits:

And so forth; those are the pick of the litter, but you'll find a number of additional wheels you can steal on my computer vision papers by topic page.

Code: The most popular open source computer vision library is the Intel-donated OpenCV ("Open Source Computer Vision") library, which has a Wikipedia entry, a SourceForge webpage, FAQ and wiki, and a very active user mailing list mirrored here at Gmane. (The much less active developers' mailing list is here.)

The University of Western Ontario Vision Group has some code available, which includes a generic max-flow/min-cut library downloadable for research purposes only :-( and Olga Veksler's Multi-label Markov Random Field optimization code available under GPL2 :-) -- see her thesis for background.

My own little Gtk-based Linux vision app is here.


[1] Face detection and simple stereo image matching, for example, are no longer considered interesting in the absence of special problems such as occlusions, nor is detecting rigid objects like cars. Recognizing flexible objects like pedestrians and faces remains a research topic.

Cynbe ru Taren
Last modified: Mon Jul 7 12:30:08 CDT 2008