ESL_chap2_notes
2.2 Variable Types and Terminology
outputs:
- quantitative: values that are close numerically are also close in nature -> regression
- qualitative: takes values in a finite set of classes; also called categorical or discrete -> classification
- ordered categorical: e.g. small, medium, and large
Representing qualitative variables:
- {0,1} for two categories
- sometimes dummy variables for a K-level qualitative variable: a vector of K bits with exactly one set, like [1,0,…,0], [0,1,…,0]
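A minimal sketch of the dummy-variable coding described above, assuming NumPy. The helper name `dummy_encode` and the example labels are hypothetical, and ordering the levels alphabetically is an arbitrary choice made here for illustration.

```python
import numpy as np

def dummy_encode(labels):
    """Encode K-level qualitative labels as rows of K indicator bits."""
    levels = sorted(set(labels))                  # the K distinct levels, in a fixed order
    index = {level: j for j, level in enumerate(levels)}
    codes = np.zeros((len(labels), len(levels)), dtype=int)
    for i, label in enumerate(labels):
        codes[i, index[label]] = 1                # exactly one bit is "on" per row
    return levels, codes

# Hypothetical example labels.
levels, codes = dummy_encode(["small", "large", "medium", "small"])
print(levels)
print(codes)
```

Each row sums to 1, so the K columns carry the same information as the original label without imposing a numeric distance between levels.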
2.3 The Linear Model fit by Least Squares and Nearest Neighbors
- Linear Model: low variance and potentially high bias; well suited to scenario 1
- Nearest Neighbors: high variance and low bias; well suited to scenario 2
The effective number of parameters of k-nearest neighbors is N/k, which is generally larger than p (the number of parameters in the least-squares fit).
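The two classifiers compared above can be sketched as follows, assuming NumPy, a 0/1-coded response, and a 0.5 threshold on the fitted value. The function names and the toy data are hypothetical; this is an illustration, not a tuned implementation.

```python
import numpy as np

def fit_least_squares(X, y):
    """Least-squares coefficients (intercept first) for a 0/1-coded response."""
    X1 = np.column_stack([np.ones(len(X)), X])        # prepend an intercept column
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)     # minimizes ||y - X1 @ beta||^2
    return beta

def predict_linear(beta, X):
    """Classify as 1 where the fitted value exceeds 0.5."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return (X1 @ beta > 0.5).astype(int)

def predict_knn(X_train, y_train, X, k):
    """Classify by vote among the k nearest training points (ties go to class 0)."""
    # squared Euclidean distance from every query row to every training row
    d2 = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]           # indices of the k closest points
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Hypothetical toy data: two well-separated classes in the plane.
X_train = np.array([[0., 0.], [0., 1.], [1., 0.], [3., 3.], [3., 4.], [4., 3.]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_new = np.array([[0.5, 0.5], [3.5, 3.5]])

beta = fit_least_squares(X_train, y_train)
linear_pred = predict_linear(beta, X_new)
knn_pred = predict_knn(X_train, y_train, X_new, k=3)
print(linear_pred, knn_pred)
```

With k=1 the nearest-neighbor rule reproduces the training labels exactly, which is the extreme low-bias, high-variance end of the trade-off; increasing k smooths the fit and lowers N/k.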
Two possible scenarios of training data:
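- Scenario 1: each class's data are generated from a bivariate Gaussian with uncorrelated components, with a different mean for each class; a linear decision boundary is the best we can do
- Scenario 2: each class's data come from a mixture of 10 low-variance Gaussians, whose 10 means are themselves drawn from a single Gaussian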
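The two data-generating scenarios can be simulated roughly as below, assuming NumPy. The class centers, the unit variances, the component variance of 0.2, and the sample size of 100 per class are illustrative assumptions not stated in these notes; the function names are hypothetical.

```python
import numpy as np

def sample_scenario_1(n, mean, rng):
    """n points from one bivariate Gaussian with uncorrelated, unit-variance components."""
    return rng.normal(loc=mean, scale=1.0, size=(n, 2))

def sample_scenario_2(n, center, rng, n_components=10, component_var=0.2):
    """n points from a mixture of low-variance Gaussians whose means are Gaussian draws."""
    # component means drawn from a single Gaussian around `center`
    means = rng.normal(loc=center, scale=1.0, size=(n_components, 2))
    picks = rng.integers(0, n_components, size=n)     # choose a component uniformly per point
    noise = rng.normal(scale=np.sqrt(component_var), size=(n, 2))
    return means[picks] + noise, means

rng = np.random.default_rng(0)

# Scenario 1: one Gaussian per class, different means (assumed values).
blue_1 = sample_scenario_1(100, [1.0, 0.0], rng)
orange_1 = sample_scenario_1(100, [0.0, 1.0], rng)

# Scenario 2: a 10-component mixture per class (assumed centers and variance).
blue_2, blue_means = sample_scenario_2(100, [1.0, 0.0], rng)
orange_2, orange_means = sample_scenario_2(100, [0.0, 1.0], rng)
```

In the second scenario the class regions are unions of small clusters, so the boundary between classes is generally nonlinear, which is why a flexible method such as nearest neighbors is the better match there.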