Lecture
- computer vision
- mimic human vision
- extract measurable information
- identify objects
- fix/improve images to improve their interpretation
- image processing vs. computer vision
- processing: reasoning focused on the image, pixels, pixel groups
- vision: focuses on the knowledge the image brings from a real scene
- difficult problem
- the goal of computer vision is not to mimic human vision but to build systems that extract information
- computer vision is an inverse of the synthesis problem
- projection is fundamentally ambiguous – we are losing information (depth, size, occlusions)
- in general it's an ill-posed problem: no unique solution for a given observation, ambiguous solutions, incomplete data (example: scale of the observed scene, toy car instead of normal car)
- need to inject a priori knowledge and regularization (example: penalize non-smooth solutions)
- from noisy observations, we estimate the parameters of a model
- a priori knowledge: physics, geometry, semantics
- example: by counting visible wheels of the car, we can tell the position of the camera
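The regularization idea above (penalize non-smooth solutions) can be sketched on a 1D signal. This is only an illustration under assumptions not taken from the lecture: the function names `smooth` and `roughness`, the quadratic smoothness penalty, and the weight `lam` are all hypothetical choices. The estimate minimizes a data term plus `lam` times the sum of squared differences between neighboring samples, which leads to a tridiagonal linear system.

```python
def roughness(x):
    """Sum of squared differences between neighbouring samples."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))


def smooth(y, lam):
    """Minimise sum (x_i - y_i)^2 + lam * sum (x_{i+1} - x_i)^2 over x.

    Assumes len(y) >= 2. The normal equations (I + lam * D^T D) x = y
    are tridiagonal and are solved here with the Thomas algorithm.
    """
    n = len(y)
    # Main diagonal: 1 + lam at both ends, 1 + 2 * lam in the interior.
    diag = [1.0 + lam * (1 if i in (0, n - 1) else 2) for i in range(n)]
    off = -lam  # every sub- and super-diagonal entry

    # Forward elimination.
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off / diag[0]
    d[0] = y[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off * c[i - 1]
        c[i] = off / denom
        d[i] = (y[i] - off * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # made-up observation
estimate = smooth(noisy, lam=5.0)
```

With `lam = 0` the estimate is the observation itself. Larger values trade fidelity to the data for smoothness, which is how the prior removes the ambiguity.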
- desirable characteristics
- robustness – be able to identify observation noise/errors (have plan B in case of error)
- speed
- precision
- generality – the algorithm should be generic; the pool of situations that it can handle should be large enough
- main vision problems
- image
- 2D signal, depicts a 3D scene
- matrix of values that represent a signal
- has semantic information
- light
- plays a fundamental role in 3D perception
- no light → no image
- diffuse reflection, shadow, specular reflection
- wavelength, spectra
- solar spectrum: almost continuous spectrum, some wavelengths are stronger
- white light: continuous spectrum (energy evenly distributed)
- sodium vapor lamp: only yellow → red car appears dark
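The sodium lamp example can be illustrated with a discrete sum over a few wavelength bands. The numbers below are toy values chosen for illustration, not measured spectra, and the names `reflected_fraction`, `WHITE_LIGHT`, `SODIUM_LAMP` and `RED_PAINT` are hypothetical.

```python
# Four coarse wavelength bands (nm): blue, green, yellow, red.
WAVELENGTHS_NM = [450, 550, 590, 650]

# Toy illuminant spectra (relative energy per band), assumed for illustration.
WHITE_LIGHT = [1.0, 1.0, 1.0, 1.0]   # energy evenly distributed
SODIUM_LAMP = [0.0, 0.0, 1.0, 0.0]   # energy only in the yellow band

# Toy reflectance of red paint: reflects mostly the red band.
RED_PAINT = [0.05, 0.05, 0.10, 0.80]


def reflected_fraction(illuminant, reflectance):
    """Fraction of the incoming energy that the surface reflects."""
    incoming = sum(illuminant)
    reflected = sum(e * r for e, r in zip(illuminant, reflectance))
    return reflected / incoming


under_white = reflected_fraction(WHITE_LIGHT, RED_PAINT)
under_sodium = reflected_fraction(SODIUM_LAMP, RED_PAINT)
```

Under these assumed spectra the paint reflects a smaller fraction of the sodium light than of the white light, and nothing at all in the red band, so the car looks dark rather than red.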
- what happens when a light ray hits the surface
- absorption (black surface)
- reflection, refraction (mirror, reflector)
- diffusion (milk)
- fluorescence
- transmission and emission (human skin)
- most surfaces can be approximated by simple models
- first simplified hypothesis
- standard model: BRDF
- bi-directional reflectance distribution function
- models the ratio of energy for each wavelength λ
- incoming from direction $\hat{v}_i$
- emitted towards direction $\hat{v}_r$
- $\hat{n}$ … normal vector
- reciprocity
- isotropy
- energy corresponds to an integral (we can use a discrete sum)
- Lambert assumption
- diffuse surface: uniform in all directions (paper, milk, matt paint)
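- the BRDF is a constant function $f_d(\hat{v}_i, \hat{v}_r, \hat{n}, \lambda) = f_d(\lambda)$
- specular material, central lobe on $\hat{s}_i$ (the specular reflection direction)
- Phong … $f_s(\theta_s, \lambda) = k_s(\lambda) \cos^{k_e} \theta_s$
- Torrance-Sparrow
- dichromatic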
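The Lambert and Phong terms above can be evaluated directly. The sketch below makes several assumptions: one wavelength at a time (so the coefficients are plain numbers), unit-length direction vectors pointing away from the surface, and θs taken as the angle between the viewing direction and the mirror direction of the incoming ray. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def specular_direction(v_i, n):
    """Mirror the incoming direction v_i about the normal n."""
    d = dot(n, v_i)
    return tuple(2.0 * d * nc - vc for nc, vc in zip(n, v_i))


def lambert_brdf(k_d, v_i, v_r, n):
    """Diffuse term: constant, independent of all directions."""
    return k_d


def phong_brdf(k_s, k_e, v_i, v_r, n):
    """Specular term k_s * cos(theta_s) ** k_e, clamped at zero."""
    s_i = specular_direction(v_i, n)
    cos_theta_s = max(0.0, dot(v_r, s_i))
    return k_s * cos_theta_s ** k_e


n = (0.0, 0.0, 1.0)
v_i = normalize((1.0, 0.0, 1.0))
v_mirror = specular_direction(v_i, n)   # centre of the specular lobe
v_side = normalize((0.0, 1.0, 1.0))     # a direction off the lobe

peak = phong_brdf(0.5, 10, v_i, v_mirror, n)
off_peak = phong_brdf(0.5, 10, v_i, v_side, n)
```

Swapping `v_i` and `v_r` leaves both terms unchanged, which is the reciprocity property listed above. A larger exponent `k_e` gives a narrower lobe.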
- pipeline in a digital camera
- to get RAW
- optics → aperture → shutter → sensor → gain → A/D
- to get JPEG from RAW
- demosaic → sharpen → white balance → gamma/curve → compress
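The gamma step of this pipeline can be sketched as a power law. This assumes a plain exponent of 2.2 and intensities normalized to [0, 1]. Real cameras and standards such as sRGB use somewhat different curves, so the constant and the function names here are illustrative only.

```python
GAMMA = 2.2  # assumed exponent, not taken from the lecture


def gamma_encode(linear, gamma=GAMMA):
    """Map a linear intensity in [0, 1] to a gamma-encoded value."""
    return linear ** (1.0 / gamma)


def gamma_decode(encoded, gamma=GAMMA):
    """Inverse mapping: recover the linear intensity."""
    return encoded ** gamma


dark_linear = 0.18
dark_encoded = gamma_encode(dark_linear)
round_trip = gamma_decode(dark_encoded)
```

Encoding brightens dark values, so more of the available code values are spent on the dark tones where the eye is more sensitive to differences.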
- role of the optics: isolate the light rays (coming from one particular part of the scene)
- we can model a complicated system of lenses using just one lens
- perfect lens: hypothesis
- a point in the scene corresponds to a point in an image
- this is not true, there are artifacts
- chromatic artifacts (fringing)
- diffraction – wavelength dependent
- vignetting … border of the image is darker
- geometric distortion (for wide-angle cameras)
- CCD sensor, CMOS sensor
- rolling shutter
- color spaces
- RGB … additive
- CMY … subtractive
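The relation between the additive and subtractive spaces can be written as a complement, assuming idealized primaries and channel values normalized to [0, 1]. The function names are hypothetical.

```python
def rgb_to_cmy(rgb):
    """Each subtractive primary absorbs one additive primary."""
    return tuple(1.0 - channel for channel in rgb)


def cmy_to_rgb(cmy):
    return tuple(1.0 - channel for channel in cmy)


red_as_cmy = rgb_to_cmy((1.0, 0.0, 0.0))   # no cyan, full magenta and yellow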
- color perception
- retina
- fovea
- rods – achromatic perception of light; their pigment (rhodopsin) is sensitive to the whole visible spectrum (peak in the green)
- cones – color perception
- mantis shrimp has the most complex visual system ever discovered
- color perception in a camera
- deviation/dispersion prism
- 3 CCD sensors
- precise alignment, high quality filter
- expensive
- Bayer filter
- individual (plastic) filter for each pixel (RGGB, RGCB)
- to get all three colors at each pixel, we interpolate from neighboring pixels (integrate over spectra)
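The interpolation step for a Bayer sensor can be sketched as follows. The code assumes an RGGB layout, an image of at least 2×2 pixels stored as nested lists, and simple averaging of the neighboring samples of the missing channel. Actual cameras use more elaborate demosaicing, and the function names are hypothetical.

```python
def bayer_channel(row, col):
    """Channel (0=R, 1=G, 2=B) sampled at a pixel of an RGGB mosaic."""
    if row % 2 == 0:
        return 0 if col % 2 == 0 else 1
    return 1 if col % 2 == 0 else 2


def mosaic(rgb):
    """Simulate the sensor: keep one channel per pixel."""
    return [[pixel[bayer_channel(r, c)] for c, pixel in enumerate(line)]
            for r, line in enumerate(rgb)]


def demosaic(raw):
    """Fill in missing channels by averaging samples in the 3x3 window."""
    h, w = len(raw), len(raw[0])
    out = []
    for r in range(h):
        line = []
        for c in range(w):
            pixel = []
            for ch in range(3):
                if bayer_channel(r, c) == ch:
                    pixel.append(raw[r][c])  # measured directly
                    continue
                samples = [raw[rr][cc]
                           for rr in range(max(0, r - 1), min(h, r + 2))
                           for cc in range(max(0, c - 1), min(w, c + 2))
                           if bayer_channel(rr, cc) == ch]
                pixel.append(sum(samples) / len(samples))
            line.append(tuple(pixel))
        out.append(line)
    return out


flat = [[(0.25, 0.5, 0.75)] * 4 for _ in range(4)]  # uniform test image
restored = demosaic(mosaic(flat))
```

A uniform image is recovered exactly. Errors appear near edges and fine textures, where neighboring pixels no longer share the same color.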
- sensor artifacts
- noise: salt and pepper, thermal noise (as the camera heats up)
- aliasing
- gamma correction
- JPEG compression artifacts