
Lecture

computer vision
- mimic human vision
- extract measurable information
- identify objects
- restore/enhance images to make them easier to interpret
image processing vs. computer vision
- image processing: reasoning focused on the image itself (pixels, groups of pixels)
- computer vision: focuses on the knowledge the image carries about the real scene
difficult problem
- the goal of computer vision is not to mimic human vision but to build systems that extract information
- computer vision is the inverse of the image synthesis (rendering) problem
  - projection is fundamentally ambiguous – we are losing information (depth, size, occlusions)
  - in general it's an ill-posed problem: no unique solution for a given observation, ambiguous solutions, incomplete data (example: the scale of the observed scene is unknown, so a toy car can look like a real car)
  - a priori knowledge and regularization must be injected (example: penalize non-smooth solutions)
- from noisy observations, we estimate the parameters of a model
- a priori knowledge: physics, geometry, semantics
- example: by counting the visible wheels of a car, we can roughly tell where the camera is relative to the car
- desirable characteristics
  - robustness – be able to detect noise/errors in the observations (have a plan B in case of error)
  - speed
  - precision
  - generality – the algorithm should be generic: the range of situations it can handle should be large enough
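A minimal sketch of estimating a model from noisy observations with a smoothness prior, assuming NumPy and a 1D signal for simplicity: we look for $x$ minimizing $\|x-y\|^2+\mu\|Dx\|^2$, where $y$ are the observations and $D$ takes differences between neighboring samples, so the second term penalizes non-smooth solutions. The function name `smooth_estimate`, the weight $\mu$ (`mu`) and the test signal are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def smooth_estimate(y: np.ndarray, mu: float) -> np.ndarray:
    """Estimate x from noisy y by minimizing ||x - y||^2 + mu * ||D x||^2.

    D is the first-difference operator, so the second term penalizes
    non-smooth solutions (a quadratic regularizer).
    """
    n = y.size
    # D has shape (n-1, n): (D x)[k] = x[k+1] - x[k]
    D = np.diff(np.eye(n), axis=0)
    # Setting the gradient to zero gives the linear system (I + mu D^T D) x = y
    A = np.eye(n) + mu * D.T @ D
    return np.linalg.solve(A, y)

# Hypothetical test signal: a sine wave corrupted by Gaussian noise
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
clean = np.sin(2 * np.pi * t)
noisy = clean + rng.normal(scale=0.3, size=t.size)
estimate = smooth_estimate(noisy, mu=50.0)

# Mean squared error before and after regularization
print(np.mean((noisy - clean) ** 2), np.mean((estimate - clean) ** 2))
```

Increasing `mu` trades fidelity to the data for smoothness; with `mu=0` the estimate is just the noisy input.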
main vision problems
image
- 2D signal, depicts a 3D scene
- matrix of values representing a sampled signal
- has semantic information
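A small illustration of an image as a matrix of values, assuming NumPy and the common (row, column) indexing with the origin at the top-left; the array sizes and values are arbitrary examples.

```python
import numpy as np

# A tiny 4x6 grayscale image: one 8-bit intensity per pixel (0 = black, 255 = white)
gray = np.zeros((4, 6), dtype=np.uint8)
gray[1:3, 2:5] = 200          # a bright rectangle on a dark background

# Pixels are addressed as (row, column), i.e. (y, x)
print(gray.shape)             # (4, 6)  -> (height, width)
print(gray[1, 2])             # 200

# A group of pixels is just a sub-matrix
print(gray[1:3, 2:5].mean())  # 200.0

# A color image adds a third axis holding the R, G, B samples of each pixel
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[..., 0] = gray            # copy the pattern into the red channel
print(rgb.shape)              # (4, 6, 3)
```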
light
- plays a fundamental role in 3D perception
- no light → no image
- diffuse reflection, shadow, specular reflection
- wavelength, spectra
  - solar spectrum: almost continuous, some wavelengths are stronger than others
  - white light: continuous spectrum (energy evenly distributed)
  - sodium vapor lamp: emits almost only yellow light, so a red car appears dark (there is no red light for it to reflect)
- what happens when a light ray hits a surface
  - absorption (black surface)
  - reflection, refraction (mirror, reflector)
  - diffusion (milk)
  - fluorescence
  - transmission and emission (human skin)
- most surfaces can be approximated by simple models
- first simplified hypothesis
- standard model: BRDF
  - bidirectional reflectance distribution function
  - models, for each wavelength $\lambda$, the ratio of the energy
    - incoming from direction $\hat v_i$
    - that is reflected towards direction $\hat v_r$
    - $\hat n$ … surface normal
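  - reciprocity: swapping $\hat v_i$ and $\hat v_r$ does not change the value
  - isotropy: no preferred direction on the surface; the value is unchanged by rotating the surface around $\hat n$
  - the reflected energy is an integral over all incoming directions (with a finite number of point light sources it becomes a discrete sum)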
  - Lambert assumption
    - diffuse surface: reflects uniformly in all directions (paper, milk, matte paint)
    - the BRDF is a constant function $f_d(\hat v_i,\hat v_r,\hat n,\lambda)=f_d(\lambda)$
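  - specular material: lobe centered on the mirror-reflection direction $\hat s_i$ (the reflection of $\hat v_i$ about $\hat n$)
    - Phong … $f_s(\theta_s,\lambda)=k_s(\lambda)\cos^{k_e}\theta_s$, where $\theta_s$ is the angle between $\hat v_r$ and $\hat s_i$
    - Torrance-Sparrow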
  - dichromatic model
    - diffuse + specular
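A sketch of the dichromatic model evaluated with the discrete sum over point lights: writing $f_r=f_d+f_s$ for the full BRDF, the reflected light is $L_r(\hat v_r,\lambda)=\sum_i L_i(\lambda)\,f_r(\hat v_i,\hat v_r,\hat n,\lambda)\,\cos^+\theta_i$ with $\cos^+\theta_i=\max(0,\hat v_i\cdot\hat n)$, and $\hat s_i=2(\hat v_i\cdot\hat n)\hat n-\hat v_i$. Assumptions for illustration: NumPy, three RGB channels standing in for the continuous $\lambda$, a constant $f_d$ given directly as `k_d` (any $1/\pi$ normalization folded in), and Phong as the specular term; `reflected_radiance` is a hypothetical name.

```python
import numpy as np

def normalize(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def reflected_radiance(n, v_r, lights, k_d, k_s, k_e):
    """Dichromatic (Lambert + Phong) reflection, summed over point lights.

    n       : surface normal
    v_r     : direction from the surface point towards the viewer
    lights  : list of (v_i, L_i): direction towards the light, RGB intensity
    k_d     : constant diffuse BRDF per channel, f_d(lambda) = k_d
    k_s, k_e: Phong specular coefficient (per channel) and exponent
    """
    n, v_r = normalize(n), normalize(v_r)
    total = np.zeros(3)
    for v_i, L_i in lights:
        v_i = normalize(v_i)
        cos_i = max(0.0, float(n @ v_i))      # cos+ theta_i: lights behind the surface add nothing
        if cos_i == 0.0:
            continue
        s_i = 2.0 * (n @ v_i) * n - v_i       # mirror-reflection direction of v_i about n
        cos_s = max(0.0, float(v_r @ s_i))    # cos theta_s, clamped to the front lobe
        f_r = np.asarray(k_d) + np.asarray(k_s) * cos_s ** k_e   # f_d + f_s
        total += np.asarray(L_i) * f_r * cos_i  # one term of the discrete sum
    return total

# Light and viewer straight above a reddish, slightly shiny surface
print(reflected_radiance([0, 0, 1], [0, 0, 1], [([0, 0, 1], [1, 1, 1])],
                         k_d=[0.5, 0.2, 0.2], k_s=[0.3, 0.3, 0.3], k_e=20))
```

Moving the viewer away from $\hat s_i$ makes the specular term fall off quickly for a large `k_e`, while the diffuse term stays the same.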
pipeline in a digital camera
- to get RAW
  - optics → aperture → shutter → sensor → gain → A/D
- to get JPEG from RAW
  - demosaic → sharpen → white balance → gamma/curve → compress
- role of the optics: make each point of the sensor receive only the light rays coming from one particular part of the scene
- a complex system of lenses can be modeled as a single lens
- perfect lens hypothesis
  - each point of the scene corresponds to exactly one point in the image
  - in practice this does not hold: real lenses introduce artifacts
- chromatic artifacts (fringing)
  - the refractive index is wavelength dependent, so different colors are focused at slightly different positions
- vignetting … border of the image is darker
- geometric distortion (especially with wide-angle lenses)
- CCD sensor, CMOS sensor
- rolling shutter: the sensor rows are read out one after another, so fast motion appears skewed
- color spaces
  - RGB … additive
  - CMY … subtractive
- color perception
  - retina
  - fovea
  - rods – achromatic perception of light; their pigment (rhodopsin) is sensitive to the whole visible spectrum (peak around 500 nm, blue-green)
  - cones – color perception
  - the mantis shrimp has the most complex visual system ever discovered
- color perception in a camera
  - prism splitting the light into its color components
    - 3 CCD sensors, one per color
    - requires precise alignment and high-quality filters
    - expensive
  - Bayer filter
    - individual (plastic) filter for each pixel (RGGB, RGCB)
    - each pixel integrates the light over its filter's spectrum, so it measures only one color; the two missing colors are interpolated from neighboring pixels (demosaicing)
- sensor artifacts
  - noise: salt and pepper, thermal noise (grows as the sensor heats up)
  - aliasing
- gamma correction
- JPEG compression artifacts
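The single-lens simplification of the optics is usually written as the thin-lens equation $\frac{1}{z_o}+\frac{1}{z_i}=\frac{1}{f}$, relating the focal length $f$, the distance $z_o$ of a scene point and the distance $z_i$ behind the lens where it comes into focus. A small sketch, assuming an ideal thin lens and distances in millimetres; the function names are hypothetical.

```python
def thin_lens_image_distance(f, z_o):
    """Solve the thin-lens equation 1/z_o + 1/z_i = 1/f for the image distance z_i.

    f   : focal length
    z_o : distance from the lens to the scene point (same unit as f)
    """
    if z_o <= f:
        raise ValueError("a real image is formed only for objects beyond the focal length")
    return 1.0 / (1.0 / f - 1.0 / z_o)

def magnification(f, z_o):
    """Lateral magnification m = -z_i / z_o (negative: the image is inverted)."""
    return -thin_lens_image_distance(f, z_o) / z_o

# A 50 mm lens focused on a point 2 m away: the image plane lies slightly farther than f
z_i = thin_lens_image_distance(50.0, 2000.0)
print(round(z_i, 2))                          # 51.28
print(round(magnification(50.0, 2000.0), 4))  # -0.0256
```

As $z_o$ grows, $z_i$ tends to $f$, which is why distant scenes are in focus when the sensor sits at the focal length.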
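A minimal sketch of the first steps of the RAW-to-image pipeline (demosaic, white balance, gamma), assuming NumPy, an RGGB Bayer layout, RAW values normalized to [0, 1], bilinear interpolation, a gray-world white balance and a plain power-law gamma of 2.2 as an approximation of the sRGB curve. These are simplifying choices for illustration, not what a particular camera does; sharpening and compression are left out, and all function names are hypothetical.

```python
import numpy as np

def conv3x3(img, k):
    """3x3 filtering with zero padding (the kernels below are symmetric)."""
    h, w = img.shape
    p = np.pad(img, 1)
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += k[dy, dx] * p[dy:dy + h, dx:dx + w]
    return out

def bayer_masks_rggb(h, w):
    """Boolean masks of the pixels that sample R, G and B in an RGGB mosaic."""
    yy, xx = np.mgrid[0:h, 0:w]
    r = (yy % 2 == 0) & (xx % 2 == 0)
    b = (yy % 2 == 1) & (xx % 2 == 1)
    return r, ~(r | b), b

# Bilinear kernels: G from its 4 direct neighbors, R/B from 2 or 4 neighbors
K_G = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
K_RB = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0

def demosaic_bilinear_rggb(raw):
    """Fill in the two missing colors at every pixel by averaging neighboring samples."""
    channels = []
    for mask, k in zip(bayer_masks_rggb(*raw.shape), (K_RB, K_G, K_RB)):
        m = mask.astype(float)
        # Normalized convolution: weighted sum of the available samples divided by
        # the sum of their weights; measured values are kept, borders are handled
        channels.append(conv3x3(raw * m, k) / conv3x3(m, k))
    return np.stack(channels, axis=-1)

def raw_to_rgb8(raw, gamma=2.2):
    """Tiny RAW -> 8-bit RGB pipeline: demosaic -> white balance -> gamma -> quantize."""
    rgb = demosaic_bilinear_rggb(raw)
    # Gray-world white balance: scale each channel so its mean matches the green mean
    means = np.maximum(rgb.reshape(-1, 3).mean(axis=0), 1e-8)
    rgb = rgb * (means[1] / means)
    # Gamma encoding with a plain power law before quantizing to 8 bits
    rgb = np.clip(rgb, 0.0, 1.0) ** (1.0 / gamma)
    return np.round(rgb * 255).astype(np.uint8)

raw = np.random.default_rng(1).random((6, 8))
print(raw_to_rgb8(raw).shape)   # (6, 8, 3)
```

Bilinear interpolation is the simplest demosaicing method; it tends to produce color fringes at sharp edges, which is one reason real pipelines use more elaborate interpolation.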