data mining from databases
foundations
data mining is an interactive and iterative process
point of view of a manager
tasks
real tasks (examples)
goal, popular methodologies
SEMMA
CRISP-DM
ASUM
types of attributes
data standardization of interval-scaled attributes
range, quartiles, outliers
contingency table
χ²-test
Fisher’s test
motivation
linear regression
correlation analysis
multi-dimensional regression
discriminant analysis
cluster analysis, assumptions
centroid
k-means clustering
k-medians
hierarchical clustering
learning vector quantization (LVQ)
k-medoids
grid-based methods
density-based algorithms
scalable approaches for large data sets
decision tree, (dis)advantages
top down induction of decision trees (TDIDT)
ID3 algorithm
C4.5, C5.0
classification and regression trees (CART algorithm)
CHAID algorithm
bagging
random forests
boosting
random forests vs. boosting