fix/enhance images to make them easier to interpret
image processing vs. computer vision
processing: reasoning at the level of the image itself – pixels, groups of pixels
vision: focuses on the knowledge the image carries about the real scene
difficult problem
the goal of computer vision is not to mimic human vision but to build systems that extract information from images
computer vision is the inverse of the synthesis (rendering) problem
projection is fundamentally ambiguous – we are losing information (depth, size, occlusions)
in general it's an ill-posed problem: no unique solution for a given observation, ambiguous solutions, incomplete data (example: the scale of the observed scene is unknown – a toy car projects to the same image as a real car)
need to inject a priori knowledge and regularization (example: penalize non-smooth solutions; see the sketch after this list)
from noisy observations, we estimate the parameters of a model
a priori knowledge: physics, geometry, semantics
example: by counting the visible wheels of a car, we can infer the position of the camera
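a minimal sketch of the regularization idea mentioned above, in Python/numpy: recover a 1-D signal from noisy observations while penalizing non-smooth solutions; the penalty weight `lam` and the second-difference penalty are illustrative choices, not from the notes

```python
import numpy as np

def denoise_smooth(y, lam=10.0):
    """Fit x to noisy observations y while penalizing curvature:
    minimize ||x - y||^2 + lam * ||D x||^2  (Tikhonov-style smoothness prior).
    """
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)   # second-difference operator
    # closed-form solution of the regularized least-squares problem
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal(100)  # noisy observation
x = denoise_smooth(y)   # x follows the sine but is visibly smoother than y
```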
desirable characteristics
robustness – the ability to detect observation noise/errors (and have a plan B when they occur)
speed
precision
generality – the algorithm should be generic: the range of situations it can handle should be large enough
image
2D signal, depicts a 3D scene
matrix of values that represent a signal
has semantic information
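"image = matrix of values" in code, assuming a file read with imageio; `photo.png` is a placeholder path

```python
import imageio.v3 as iio   # assumption: the imageio package is installed

img = iio.imread("photo.png")   # placeholder path, any image works
print(img.shape, img.dtype)     # e.g. (480, 640, 3), uint8
print(img[:2, :2, 0])           # pixels are just matrix entries (red channel)
```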
light
plays a fundamental role in 3D perception
no light → no image
diffuse reflection, shadow, specular reflection
wavelengths, spectra
solar spectrum: almost continuous spectrum, some wavelengths are stronger
white light: continuous spectrum (energy evenly distributed)
sodium vapor lamp: only yellow → red car appears dark
what happens when a light ray hits the surface
absorption (black surface)
reflection, refraction (mirror, reflector)
diffusion (milk)
fluorescence
transmission and emission (human skin)
most surfaces can be approximated by simple models
first simplified hypothesis
standard model: BRDF
bi-directional reflectance distribution function
models the ratio of energy for each wavelength λ
incoming from direction v̂_i
emitted towards direction v̂_r
n̂ … surface normal vector
reciprocity – swapping v̂_i and v̂_r leaves the BRDF unchanged
isotropy – rotating both directions around n̂ leaves it unchanged
the reflected energy corresponds to an integral over incoming directions (in practice we can use a discrete sum); see the math block below
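a reconstruction of the quantities above in standard form (the notes only allude to the integral; this is the usual rendering-equation notation)

```latex
% BRDF: ratio of reflected radiance to incoming irradiance, per wavelength
f(\hat{v}_i, \hat{v}_r, \hat{n}, \lambda)
  = \frac{\mathrm{d}L_r(\hat{v}_r, \lambda)}
         {L_i(\hat{v}_i, \lambda)\,\cos\theta_i\,\mathrm{d}\omega_i}

% reflected energy: integral over the hemisphere \Omega of incoming
% directions, approximated in practice by a discrete sum over samples
L_r(\hat{v}_r, \lambda)
  = \int_{\Omega} f(\hat{v}_i, \hat{v}_r, \hat{n}, \lambda)\,
      L_i(\hat{v}_i, \lambda)\,\cos\theta_i\,\mathrm{d}\omega_i
  \approx \sum_k f(\hat{v}_k, \hat{v}_r, \hat{n}, \lambda)\,
      L_i(\hat{v}_k, \lambda)\,\cos\theta_k\,\Delta\omega_k
```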
Lambert assumption
diffuse surface: reflects uniformly in all directions (paper, milk, matte paint)
the BRDF is a constant function: f_d(v̂_i, v̂_r, n̂, λ) = f_d(λ)
specular material: reflectance lobe centered on ŝ_i, the mirror direction of the incoming ray
Phong … f_s(θ_s, λ) = k_s(λ) · cos^{k_e}(θ_s), where θ_s is the angle between v̂_r and ŝ_i
Torrance-Sparrow
dichromatic model
diffuse + specular
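a minimal sketch of the dichromatic model under the Lambert + Phong assumptions above; the material constants `kd`, `ks`, `ke` are illustrative, not from the notes

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def shade(n, v_i, v_r, kd=0.7, ks=0.3, ke=20):
    """Dichromatic model: Lambert diffuse + Phong specular (single channel)."""
    n, v_i, v_r = map(normalize, (n, v_i, v_r))
    # Lambert: constant BRDF, intensity falls off with cos(theta_i)
    diffuse = kd * max(np.dot(n, v_i), 0.0)
    # mirror direction s_i of the incoming ray around the normal
    s_i = 2.0 * np.dot(n, v_i) * n - v_i
    # Phong: specular lobe centered on s_i, sharpness controlled by ke
    specular = ks * max(np.dot(s_i, v_r), 0.0) ** ke
    return diffuse + specular

# viewing exactly along the mirror direction -> strong highlight
print(shade(n=np.array([0.0, 0.0, 1.0]),
            v_i=np.array([1.0, 0.0, 1.0]),
            v_r=np.array([-1.0, 0.0, 1.0])))
```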
pipeline in a digital camera
to get RAW
optics → aperture → shutter → sensor → gain → A/D
to get JPEG from RAW
demosaic → sharpen → white balance → gamma/curve → compress
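a toy sketch of two of the steps above (white balance and the gamma/curve step), assuming a demosaiced image in [0, 1]; the gains and gamma value are illustrative

```python
import numpy as np

def develop(raw_rgb, wb_gains=(2.0, 1.0, 1.5), gamma=2.2):
    """Toy RAW development: per-channel white balance, then a gamma curve.
    raw_rgb: float array in [0, 1], shape (H, W, 3), already demosaiced.
    """
    out = np.clip(raw_rgb * np.asarray(wb_gains), 0.0, 1.0)  # white balance
    return out ** (1.0 / gamma)   # gamma/curve: non-linear brightening
```

(demosaicing itself is sketched in the Bayer-filter section below)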
role of the optics: isolate light rays (so each image point receives light from one particular part of the scene)
we can model a complicated system of lenses using just one thin lens (see the formula after this list)
perfect-lens hypothesis:
a point in the scene corresponds to a single point in the image
this is not true in practice – there are artifacts:
chromatic aberration (color fringing)
diffraction – wavelength dependent
vignetting … border of the image is darker
geometric distortion (for wide-angle cameras)
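the classical thin-lens relation behind the one-lens model above (standard optics, not spelled out in the notes): a scene point at distance z is in focus at image distance z' for a lens of focal length f

```latex
\frac{1}{z} + \frac{1}{z'} = \frac{1}{f}
```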
CCD sensor, CMOS sensor
rolling shutter
color spaces
RGB … additive
CMY … subtractive
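the idealized relation between the two spaces in code, assuming values normalized to [0, 1] (real printing pipelines are more involved)

```python
import numpy as np

def rgb_to_cmy(rgb):
    # subtractive primaries are complements of the additive ones (idealized)
    return 1.0 - np.asarray(rgb, dtype=float)

def cmy_to_rgb(cmy):
    return 1.0 - np.asarray(cmy, dtype=float)

print(rgb_to_cmy([1.0, 0.0, 0.0]))   # red -> C=0, M=1, Y=1
```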
color perception
retina
fovea
rods – achromatic perception of light; their pigment (rhodopsin) is sensitive across the whole visible spectrum (peak in green)
cones – color perception (three types in the human retina)
mantis shrimp has the most complex visual system ever discovered
color perception in a camera
a deviation/dispersion prism splits the incoming light onto 3 separate CCD sensors
requires precise alignment and high-quality filters
expensive
Bayer filter
an individual (plastic) color filter on each pixel (patterns such as RGGB, RGCB)
each filter integrates over a band of the spectrum; to recover all three colors at every pixel we interpolate from neighbors (demosaicing, sketched below)
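a minimal bilinear demosaicing sketch for the RGGB pattern mentioned above; real demosaicers are edge-aware, this only shows the interpolation idea

```python
import numpy as np
from scipy.ndimage import convolve   # assumption: scipy is available

def demosaic_rggb(raw):
    """Bilinear demosaic of an RGGB Bayer mosaic (toy version).
    raw: float array (H, W); pattern rows alternate R G / G B.
    """
    h, w = raw.shape
    rows, cols = np.mgrid[0:h, 0:w]
    r_mask = (rows % 2 == 0) & (cols % 2 == 0)
    b_mask = (rows % 2 == 1) & (cols % 2 == 1)
    g_mask = ~(r_mask | b_mask)

    # kernels that average the available neighbors of each missing sample
    k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0

    out = np.zeros((h, w, 3))
    out[..., 0] = convolve(raw * r_mask, k_rb, mode="mirror")
    out[..., 1] = convolve(raw * g_mask, k_g,  mode="mirror")
    out[..., 2] = convolve(raw * b_mask, k_rb, mode="mirror")
    return out
```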
sensor artifacts
noise: salt-and-pepper noise, thermal noise (grows as the camera heats up); see the simulation sketch after this list
aliasing
gamma correction
JPEG compression artifacts
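a quick way to synthesize the two noise types named above for testing, assuming an image in [0, 1]; the probability and sigma are illustrative

```python
import numpy as np

rng = np.random.default_rng(0)

def add_salt_and_pepper(img, p=0.02):
    # with probability p a pixel saturates to pure black or pure white
    noisy = img.copy()
    u = rng.random(img.shape)
    noisy[u < p / 2] = 0.0        # pepper
    noisy[u > 1 - p / 2] = 1.0    # salt
    return noisy

def add_thermal_noise(img, sigma=0.05):
    # thermal noise modeled as additive Gaussian, stronger as the sensor heats up
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)
```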
Hooray, you're done! 🎉 If my flashcards helped, you can buy me a beer.