# Lecture

- computer vision
	- mimic human vision
	- extract measurable information
	- identify objects
	- fix/improve images to improve their interpretation
- image processing vs. computer vision
	- processing: reasoning focused on the image, pixels, pixel groups
	- vision: focuses on the knowledge the image brings from a real scene
- difficult problem
	- the goal of computer vision is not to mimic human vision but to build systems that extract information
	- computer vision is an inverse of the synthesis problem
		- projection is fundamentally ambiguous – we are losing information (depth, size, occlusions)
		- in general it's an ill-posed problem: no unique solution for a given observation, ambiguous solutions, incomplete data (example: scale of the observed scene, toy car instead of normal car)
		- we need to inject a priori knowledge and regularization (example: penalize non-smooth solutions)
	- from noisy observations, we estimate the parameters of a model
	- a priori knowledge: physics, geometry, semantics
	- example: by counting visible wheels of the car, we can tell the position of the camera
	- desirable characteristics
		- robustness – be able to identify observation noise/errors (have a plan B in case of error)
		- speed
		- precision
		- generality – the algorithm should be generic; the pool of situations that it can handle should be large enough
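
The scale ambiguity mentioned above (toy car vs. normal car) can be sketched with a few lines of code. This is only an illustration under an assumed pinhole projection model; the function names, the focal length of 1, and the car dimensions are hypothetical and not taken from the lecture. It shows that two scenes differing only by a global scale give the same image, and that a prior on the real object size picks one of the candidate solutions.

```python
def project(point, focal=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) to image coordinates (x, y)."""
    X, Y, Z = point
    return (focal * X / Z, focal * Y / Z)


def image_width(left, right, focal=1.0):
    """Horizontal extent in the image of an object spanning left..right."""
    return project(right, focal)[0] - project(left, focal)[0]


def depth_from_known_width(observed_width, real_width, focal=1.0):
    """Resolve the scale ambiguity with a prior on the object's real width."""
    return focal * real_width / observed_width


# Assumed scenes: a 1.8 m wide car at 18 m and a 1:18 toy (0.1 m wide) at 1 m.
real_car = ((-0.9, 0.0, 18.0), (0.9, 0.0, 18.0))
toy_car = ((-0.05, 0.0, 1.0), (0.05, 0.0, 1.0))

w_real = image_width(*real_car)
w_toy = image_width(*toy_car)
print(w_real, w_toy)  # the two widths agree up to rounding

# The same observation leads to different depths depending on the prior.
print(depth_from_known_width(w_real, real_width=1.8))  # "normal car" prior
print(depth_from_known_width(w_real, real_width=0.1))  # "toy car" prior
```

The observation alone cannot separate the two scenes; only the assumed width (semantic a priori knowledge) makes the depth estimate unique.
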
- main vision problems
- image
	- 2D signal, depicts a 3D scene
	- matrix of values that represent a signal
	- has semantic information
- light
	- plays a fundamental role in 3D perception
	- no light → no image
	- diffuse reflection, shadow, specular reflection
	- wavelength, spectra
		- solar spectrum: almost continuous spectrum, some wavelengths are stronger
		- white light: continuous spectrum (energy evenly distributed)
		- sodium vapor lamp: only yellow → red car appears dark
	- what happens when a light ray hits the surface
		- absorption (black surface)
		- reflection, refraction (mirror, reflector)
		- diffusion (milk)
		- fluorescence
		- transmission and emission (human skin)
	- most surfaces can be approximated by simple models
	- first simplified hypothesis
	- standard model: BRDF
		- bi-directional reflectance distribution function
		- models the ratio of energy for each wavelength $\lambda$
			- incoming from direction $\hat v_i$
			- reflected towards direction $\hat v_r$
			- $\hat n$ … normal vector
		- reciprocity
		- isotropy
		- energy corresponds to an integral (we can use a discrete sum)
		- Lambert assumption
			- diffuse surface: uniform in all directions (paper, milk, matt paint)
			- the BRDF is a constant function $f_d(\hat v_i,\hat v_r,\hat n,\lambda)=f_d(\lambda)$
		- specular material, central lobe on $\hat s_i$
			- Phong … $f_s(\theta_s,\lambda)=k_s(\lambda)\cos ^{k_e}\theta_s$
			- Torrance-Sparrow
		- dichromatic
			- diffuse + specular
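
A minimal sketch of the dichromatic model above (constant Lambert term plus a Phong lobe), combined with the discrete sum over wavelengths. Several things here are assumptions made for illustration: both directions are taken to point away from the surface, the mirror direction $\hat s_i$ is computed by reflecting $\hat v_i$ about $\hat n$, the specular coefficient is treated as wavelength-independent, and the sampled spectra are made-up numbers rather than measured data.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(x / length for x in v)


def mirror_direction(v_i, n):
    """Mirror direction s_i of the unit incoming direction v_i about the unit normal n."""
    d = dot(v_i, n)
    return tuple(2.0 * d * nc - vc for vc, nc in zip(v_i, n))


def dichromatic_brdf(v_i, v_r, n, k_d, k_s=0.0, k_e=20):
    """Constant diffuse term plus a Phong lobe k_s * cos(theta_s) ** k_e."""
    v_i, v_r, n = normalize(v_i), normalize(v_r), normalize(n)
    cos_theta_s = max(0.0, dot(v_r, mirror_direction(v_i, n)))
    return k_d + k_s * cos_theta_s ** k_e


def reflected_energy(illuminant, diffuse, v_i, v_r, n, k_s=0.0, k_e=20):
    """Discrete sum over sampled wavelengths of light * BRDF * cos(theta_i)."""
    cos_theta_i = max(0.0, dot(normalize(v_i), normalize(n)))
    return sum(
        energy * dichromatic_brdf(v_i, v_r, n, diffuse[wavelength], k_s, k_e) * cos_theta_i
        for wavelength, energy in illuminant.items()
    )


# Hypothetical spectra sampled at four wavelengths (nm); values are illustrative only.
white_light = {450: 0.25, 550: 0.25, 589: 0.25, 650: 0.25}
sodium_lamp = {589: 1.0}
red_paint = {450: 0.05, 550: 0.05, 589: 0.10, 650: 0.80}

up = (0.0, 0.0, 1.0)
print(reflected_energy(white_light, red_paint, up, up, up))
print(reflected_energy(sodium_lamp, red_paint, up, up, up))
```

With these assumed numbers both lamps emit the same total energy, yet the red paint returns less of it under the sodium lamp, which is the "red car appears dark" effect. Swapping `v_i` and `v_r` in `dichromatic_brdf` leaves the value unchanged, which illustrates reciprocity for this particular model.
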
- pipeline in a digital camera
	- to get RAW
		- optics → aperture → shutter → sensor → gain → A/D
	- to get JPEG from RAW
		- demosaic → sharpen → white balance → gamma/curve → compress
	- role of the optics: isolate the light rays (from one particular part of the scene)
	- we can model a complicated system of lenses using just one lens
	- perfect lens: hypothesis
		- a point in the scene corresponds to a point in an image
		- this is not true, there are artifacts
	- chromatic artifacts (fringing)
		- refraction is wavelength-dependent (dispersion)
	- vignetting … border of the image is darker
	- geometric distortion (for wide-angle cameras)
	- CCD sensor, CMOS sensor
	- rolling shutter
	- color spaces
		- RGB … additive
		- CMY … subtractive
	- color perception
		- retina
		- fovea
		- rods – achromatic perception of light; their pigment (rhodopsin) is sensitive to the whole visible spectrum (peak in the green)
		- cones – color perception
		- mantis shrimp has the most complex visual system ever discovered
	- color perception in a camera
		- deviation/dispersion prism
			- 3 CCD sensors
			- precise alignment, high quality filter
			- expensive
		- Bayer filter
			- individual (plastic) filter for each pixel (RGGB, RGCB)
			- to get colors in each pixel, we interpolate (integrate over spectra)
	- sensor artifacts
		- noise: salt and pepper, thermal noise (as the camera heats up)
		- aliasing
	- gamma correction
	- JPEG compression artifacts
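
Two of the pipeline steps above, Bayer sampling with interpolation and gamma correction, can be sketched as follows. This is a simplified illustration, not what any particular camera does: it assumes an RGGB layout, fills each missing channel with the average of the same-channel samples in the 3x3 neighborhood, and uses a plain power-law curve with an assumed gamma of 2.2. The function names are hypothetical, and the image is assumed to be at least 2x2 pixels.

```python
def bayer_channel(row, col):
    """Channel sampled at (row, col) for an RGGB pattern (R G / G B)."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def mosaic(rgb_image):
    """Simulate the sensor: keep only one channel value per pixel."""
    index = {"R": 0, "G": 1, "B": 2}
    return [
        [pixel[index[bayer_channel(r, c)]] for c, pixel in enumerate(row)]
        for r, row in enumerate(rgb_image)
    ]


def demosaic(raw):
    """Rebuild RGB: keep the measured channel, average neighbors for the others."""
    height, width = len(raw), len(raw[0])
    out = []
    for r in range(height):
        out_row = []
        for c in range(width):
            pixel = []
            for channel in "RGB":
                if bayer_channel(r, c) == channel:
                    pixel.append(float(raw[r][c]))
                    continue
                # Same-channel samples in the 3x3 neighborhood, clipped at the border.
                samples = [
                    raw[rr][cc]
                    for rr in range(max(0, r - 1), min(height, r + 2))
                    for cc in range(max(0, c - 1), min(width, c + 2))
                    if bayer_channel(rr, cc) == channel
                ]
                pixel.append(sum(samples) / len(samples))
            out_row.append(tuple(pixel))
        out.append(out_row)
    return out


def gamma_encode(value, gamma=2.2):
    """Power-law gamma curve for a linear intensity in [0, 1]."""
    return value ** (1.0 / gamma)


# A uniformly colored 4x4 scene with linear intensities in [0, 1].
scene = [[(0.2, 0.5, 0.8)] * 4 for _ in range(4)]
raw = mosaic(scene)
rgb = demosaic(raw)
encoded = [[tuple(gamma_encode(v) for v in px) for px in row] for row in rgb]
print(raw[0])
print(rgb[0][0])
print(encoded[0][0])
```

On a uniform scene the interpolation recovers the original color up to rounding; on real images with edges this simple averaging produces color fringes, which is one source of the aliasing artifacts listed above.
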