Main idea: The method uses a generative model based on epitomic image analysis, which is itself a probabilistic approach.
- The appearance and geometric structure of the environment are captured in this representation.
- Its ability to model translation and scale invariance, together with the fusion of diverse visual features, yields enhanced generalization with economical training.
- The recognition of a location class is achieved by convolving the query image with the learned epitome.
- It does not estimate the camera position accurately.
- Occlusions, reflections or non-rigid motions are modeled as noise whose variance changes for different regions within the environment.
- These epitomes are generative, probabilistic models and various sources of uncertainty are captured in the variance maps.
- In this model, an image is extracted from a larger latent image, the epitome, at a location given by a discrete mapping.
- Every N x M image I is generated from an Ne x Me location epitome e.
- In this approach, Ne x Me translations and 3 scales are considered.
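The matching step in the notes above can be sketched roughly as follows. This is only a minimal illustration, not the paper's implementation: it assumes a single feature channel, a single scale, per-pixel Gaussian noise, and no wrap-around at the epitome borders. The names `patch_log_likelihoods`, `best_mapping`, `mean` and `var` are hypothetical and chosen for this example.

```python
import numpy as np

def patch_log_likelihoods(image, mean, var):
    """Gaussian log-likelihood of `image` at every valid translation.

    image: (N, M) query image.
    mean, var: (Ne, Me) epitome mean and variance maps (assumed names).
    Returns an (Ne - N + 1, Me - M + 1) array of scores.
    """
    n, m = image.shape
    ne, me = mean.shape
    scores = np.empty((ne - n + 1, me - m + 1))
    for i in range(scores.shape[0]):
        for j in range(scores.shape[1]):
            mu = mean[i:i + n, j:j + m]
            v = var[i:i + n, j:j + m]
            # Regions with high variance (e.g. occlusions, reflections)
            # contribute less to the squared-error term.
            scores[i, j] = -0.5 * np.sum(
                np.log(2 * np.pi * v) + (image - mu) ** 2 / v
            )
    return scores

def best_mapping(image, mean, var):
    """Return the most likely translation and its log-likelihood."""
    scores = patch_log_likelihoods(image, mean, var)
    i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return (int(i), int(j)), float(scores[i, j])
```

To compare location classes under these assumptions, one would run `best_mapping` against each learned epitome and keep the class with the highest score. Handling the 3 scales would presumably mean repeating the search on resized versions of the query.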
Inference and learning
- The images are independent and identically distributed given the epitome.
- The goal is to find a single epitome e* that maximizes the probability of the observations. This is achieved with the EM algorithm.
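Written out, the learning objective might look like the following. The notation is an assumption made for this note: \mu and \phi stand for the epitome's mean and variance maps, T for the discrete mapping, x for a pixel index, and K for the number of training images. The paper's exact formulation may differ.

```latex
% Assumed notation: \mu, \phi = epitome mean and variance maps,
% T = discrete mapping, x = pixel index, K = number of training images.
\begin{align}
p(I \mid T, e) &= \prod_{x} \mathcal{N}\bigl(I(x);\ \mu(T(x)),\ \phi(T(x))\bigr) \\
e^{*} &= \arg\max_{e} \prod_{k=1}^{K} p(I_k \mid e)
       = \arg\max_{e} \sum_{k=1}^{K} \log \sum_{T} p(T)\, p(I_k \mid T, e)
\end{align}
```

Under this reading, the E-step would compute the posterior over mappings T for each image, and the M-step would re-estimate \mu and \phi from the images weighted by those posteriors.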
Visual features
- raw RGB pixels
- gist features
- disparity maps
- local histograms
It uses a stereo camera.