fix/enhance images to make them easier to interpret
image processing vs. computer vision
processing: reasoning at the level of the image itself – pixels, groups of pixels
vision: focuses on the knowledge the image carries about the real scene
difficult problem
the goal of computer vision is not to mimic human vision but to build systems that extract information from images
computer vision is the inverse of the synthesis (rendering) problem
projection is fundamentally ambiguous – we are losing information (depth, size, occlusions)
in general it's an ill-posed problem: no unique solution for a given observation, ambiguous solutions, incomplete data (example: the scale of the observed scene is unknown – a toy car projects to the same image as a real car)
need to inject a priori knowledge and regularization (example: penalize non-smooth solutions; see the sketch after this list)
from noisy observations, we estimate the parameters of a model
a priori knowledge: physics, geometry, semantics
example: by counting the visible wheels of a car, we can infer the position of the camera
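a minimal sketch of the regularization idea mentioned above, in Python/numpy: recover a 1-D signal from noisy observations while penalizing non-smooth solutions; the penalty weight `lam` and the second-difference penalty are illustrative choices, not from the notes

```python
import numpy as np

def denoise_smooth(y, lam=10.0):
    """Fit x to noisy observations y while penalizing curvature:
    minimize ||x - y||^2 + lam * ||D x||^2  (Tikhonov-style smoothness prior).
    """
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)   # second-difference operator
    # closed-form solution of the regularized least-squares problem
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal(100)  # noisy observation
x = denoise_smooth(y)   # x follows the sine but is visibly smoother than y
```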
desirable characteristics
robustness – the ability to detect observation noise/errors (and have a plan B when they occur)
speed
precision
generality – the algorithm should be generic: the range of situations it can handle should be large enough
image
2D signal, depicts a 3D scene
matrix of values that represent a signal
has semantic information
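"image = matrix of values" in code, assuming a file read with imageio; `photo.png` is a placeholder path

```python
import imageio.v3 as iio   # assumption: the imageio package is installed

img = iio.imread("photo.png")   # placeholder path, any image works
print(img.shape, img.dtype)     # e.g. (480, 640, 3), uint8
print(img[:2, :2, 0])           # pixels are just matrix entries (red channel)
```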
light
plays a fundamental role in 3D perception
no light → no image
diffuse reflection, shadow, specular reflection
wavelengths, spectra
solar spectrum: almost continuous spectrum, some wavelengths are stronger
white light: continuous spectrum (energy evenly distributed)
sodium vapor lamp: only yellow → red car appears dark
what happens when a light ray hits the surface
absorption (black surface)
reflection, refraction (mirror, reflector)
diffusion (milk)
fluorescence
transmission and emission (human skin)
most surfaces can be approximated by simple models
first simplified hypothesis
standard model: BRDF
bi-directional reflectance distribution function
models the ratio of energy for each wavelength λ
incoming from direction v̂_i
emitted towards direction v̂_r
n̂ … surface normal vector
reciprocity – swapping v̂_i and v̂_r leaves the BRDF unchanged
isotropy – rotating both directions around n̂ leaves it unchanged
the reflected energy corresponds to an integral over incoming directions (in practice we can use a discrete sum); see the math block below
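a reconstruction of the quantities above in standard form (the notes only allude to the integral; this is the usual rendering-equation notation)

```latex
% BRDF: ratio of reflected radiance to incoming irradiance, per wavelength
f(\hat{v}_i, \hat{v}_r, \hat{n}, \lambda)
  = \frac{\mathrm{d}L_r(\hat{v}_r, \lambda)}
         {L_i(\hat{v}_i, \lambda)\,\cos\theta_i\,\mathrm{d}\omega_i}

% reflected energy: integral over the hemisphere \Omega of incoming
% directions, approximated in practice by a discrete sum over samples
L_r(\hat{v}_r, \lambda)
  = \int_{\Omega} f(\hat{v}_i, \hat{v}_r, \hat{n}, \lambda)\,
      L_i(\hat{v}_i, \lambda)\,\cos\theta_i\,\mathrm{d}\omega_i
  \approx \sum_k f(\hat{v}_k, \hat{v}_r, \hat{n}, \lambda)\,
      L_i(\hat{v}_k, \lambda)\,\cos\theta_k\,\Delta\omega_k
```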
Lambert assumption
diffuse surface: reflects uniformly in all directions (paper, milk, matte paint)
the BRDF is a constant function: f_d(v̂_i, v̂_r, n̂, λ) = f_d(λ)
specular material: reflectance lobe centered on ŝ_i, the mirror direction of the incoming ray
Phong … f_s(θ_s, λ) = k_s(λ) · cos^{k_e}(θ_s), where θ_s is the angle between v̂_r and ŝ_i
Torrance-Sparrow
dichromatic model
diffuse + specular
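a minimal sketch of the dichromatic model under the Lambert + Phong assumptions above; the material constants `kd`, `ks`, `ke` are illustrative, not from the notes

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def shade(n, v_i, v_r, kd=0.7, ks=0.3, ke=20):
    """Dichromatic model: Lambert diffuse + Phong specular (single channel)."""
    n, v_i, v_r = map(normalize, (n, v_i, v_r))
    # Lambert: constant BRDF, intensity falls off with cos(theta_i)
    diffuse = kd * max(np.dot(n, v_i), 0.0)
    # mirror direction s_i of the incoming ray around the normal
    s_i = 2.0 * np.dot(n, v_i) * n - v_i
    # Phong: specular lobe centered on s_i, sharpness controlled by ke
    specular = ks * max(np.dot(s_i, v_r), 0.0) ** ke
    return diffuse + specular

# viewing exactly along the mirror direction -> strong highlight
print(shade(n=np.array([0.0, 0.0, 1.0]),
            v_i=np.array([1.0, 0.0, 1.0]),
            v_r=np.array([-1.0, 0.0, 1.0])))
```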
pipeline in a digital camera
to get RAW
optics → aperture → shutter → sensor → gain → A/D
to get JPEG from RAW
demosaic → sharpen → white balance → gamma/curve → compress
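a toy sketch of two of the steps above (white balance and the gamma/curve step), assuming a demosaiced image in [0, 1]; the gains and gamma value are illustrative

```python
import numpy as np

def develop(raw_rgb, wb_gains=(2.0, 1.0, 1.5), gamma=2.2):
    """Toy RAW development: per-channel white balance, then a gamma curve.
    raw_rgb: float array in [0, 1], shape (H, W, 3), already demosaiced.
    """
    out = np.clip(raw_rgb * np.asarray(wb_gains), 0.0, 1.0)  # white balance
    return out ** (1.0 / gamma)   # gamma/curve: non-linear brightening
```

(demosaicing itself is sketched in the Bayer-filter section below)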
role of the optics: isolate light rays (so each image point receives light from one particular part of the scene)
we can model a complicated system of lenses using just one thin lens (see the formula after this list)
perfect-lens hypothesis:
a point in the scene corresponds to a single point in the image
this is not true in practice – there are artifacts:
chromatic aberration (color fringing)
diffraction – wavelength dependent
vignetting … border of the image is darker
geometric distortion (for wide-angle cameras)
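the classical thin-lens relation behind the one-lens model above (standard optics, not spelled out in the notes): a scene point at distance z is in focus at image distance z' for a lens of focal length f

```latex
\frac{1}{z} + \frac{1}{z'} = \frac{1}{f}
```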
CCD sensor, CMOS sensor
rolling shutter
color spaces
RGB … additive
CMY … subtractive
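the idealized relation between the two spaces in code, assuming values normalized to [0, 1] (real printing pipelines are more involved)

```python
import numpy as np

def rgb_to_cmy(rgb):
    # subtractive primaries are complements of the additive ones (idealized)
    return 1.0 - np.asarray(rgb, dtype=float)

def cmy_to_rgb(cmy):
    return 1.0 - np.asarray(cmy, dtype=float)

print(rgb_to_cmy([1.0, 0.0, 0.0]))   # red -> C=0, M=1, Y=1
```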
color perception
retina
fovea
rods – achromatic perception of light; their pigment (rhodopsin) is sensitive across the whole visible spectrum (peak in green)
cones – color perception (three types in the human retina)
mantis shrimp has the most complex visual system ever discovered
color perception in a camera
a deviation/dispersion prism splits the incoming light onto 3 separate CCD sensors
requires precise alignment and high-quality filters
expensive
Bayer filter
an individual (plastic) color filter on each pixel (patterns such as RGGB, RGCB)
each filter integrates over a band of the spectrum; to recover all three colors at every pixel we interpolate from neighbors (demosaicing, sketched below)
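a minimal bilinear demosaicing sketch for the RGGB pattern mentioned above; real demosaicers are edge-aware, this only shows the interpolation idea

```python
import numpy as np
from scipy.ndimage import convolve   # assumption: scipy is available

def demosaic_rggb(raw):
    """Bilinear demosaic of an RGGB Bayer mosaic (toy version).
    raw: float array (H, W); pattern rows alternate R G / G B.
    """
    h, w = raw.shape
    rows, cols = np.mgrid[0:h, 0:w]
    r_mask = (rows % 2 == 0) & (cols % 2 == 0)
    b_mask = (rows % 2 == 1) & (cols % 2 == 1)
    g_mask = ~(r_mask | b_mask)

    # kernels that average the available neighbors of each missing sample
    k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 4.0
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 4.0

    out = np.zeros((h, w, 3))
    out[..., 0] = convolve(raw * r_mask, k_rb, mode="mirror")
    out[..., 1] = convolve(raw * g_mask, k_g,  mode="mirror")
    out[..., 2] = convolve(raw * b_mask, k_rb, mode="mirror")
    return out
```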
sensor artifacts
noise: salt-and-pepper noise, thermal noise (grows as the camera heats up); see the simulation sketch after this list
aliasing
gamma correction
JPEG compression artifacts
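a quick way to synthesize the two noise types named above for testing, assuming an image in [0, 1]; the probability and sigma are illustrative

```python
import numpy as np

rng = np.random.default_rng(0)

def add_salt_and_pepper(img, p=0.02):
    # with probability p a pixel saturates to pure black or pure white
    noisy = img.copy()
    u = rng.random(img.shape)
    noisy[u < p / 2] = 0.0        # pepper
    noisy[u > 1 - p / 2] = 1.0    # salt
    return noisy

def add_thermal_noise(img, sigma=0.05):
    # thermal noise modeled as additive Gaussian, stronger as the sensor heats up
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)
```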
Hooray, you're done! 🎉 If my flashcards helped, you can buy me a beer.