2.2 Variable Types and Terminology

  1. Outputs:

    • quantitative: measurements close in value are close in nature -> regression
    • qualitative: values come from a finite set of cases; also called categorical or discrete -> classification
    • ordered categorical: e.g. small, medium, and large

    Representing qualitative variables:
      • {0,1} for two categories
      • sometimes dummy variables for K-level qualitative variables, like [1,0,…,0], [0,1,…,0]
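
The dummy-variable coding can be sketched in a few lines of Python. This is a minimal illustration, not a library API: the helper name `dummy_encode` and the example levels are assumptions chosen for the sketch.

```python
def dummy_encode(labels, levels):
    """Encode each label as a K-vector with a single 1 at the index of its level."""
    index = {level: i for i, level in enumerate(levels)}  # level -> position
    return [[1 if index[label] == j else 0 for j in range(len(levels))]
            for label in labels]

# Hypothetical 3-level qualitative variable (K = 3).
levels = ["small", "medium", "large"]
codes = dummy_encode(["medium", "small", "large"], levels)
# codes == [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
```

Each row has exactly one entry "on", so a K-level variable becomes K binary columns.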

2.3 The Linear Model fit by Least Squares and Nearest Neighbors

  • Linear Model: low variance and potentially high bias; suited to scenario 1
  • Nearest Neighbors: high variance and low bias; suited to scenario 2
    The effective number of parameters of k-nearest neighbors is N/k, which is generally bigger than p (the number of parameters in the least-squares fit)
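
The contrast between the two fits can be sketched with NumPy. This is a rough illustration under stated assumptions: the toy data, the 0/1 coding with a 0.5 threshold, the values N = 200 and k = 15, and all helper names are chosen for the example and are not taken from the notes above.

```python
import numpy as np

def fit_least_squares(X, y):
    """Least-squares coefficients for a linear model with intercept."""
    Xb = np.column_stack([np.ones(len(X)), X])      # prepend the constant-1 column
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)   # minimizes ||y - Xb @ beta||^2
    return beta

def predict_linear(beta, X):
    """Classify as 1 where the fitted value exceeds 0.5, else 0."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return (Xb @ beta > 0.5).astype(int)

def predict_knn(X_train, y_train, X, k):
    """Average the 0/1 labels of the k nearest training points, threshold at 0.5."""
    dists = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]      # indices of the k closest points
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

# Toy data (an assumption for illustration): two Gaussian classes in 2 dimensions.
rng = np.random.default_rng(0)
N, p, k = 200, 2, 15
X = np.vstack([rng.normal([1.0, 0.0], 1.0, size=(N // 2, p)),
               rng.normal([0.0, 1.0], 1.0, size=(N // 2, p))])
y = np.repeat([0, 1], N // 2)

beta = fit_least_squares(X, y)
linear_train_error = np.mean(predict_linear(beta, X) != y)
knn_train_error = np.mean(predict_knn(X, y, X, k) != y)
# With k = 1 every training point is its own nearest neighbor.
one_nn_train_error = np.mean(predict_knn(X, y, X, 1) != y)

effective_params_knn = N / k   # about 13.3 here, versus p = 2 inputs for the linear fit
```

The 1-nearest-neighbor fit reproduces the training labels exactly, which is the low-bias, high-variance end of the spectrum; the linear fit uses only a handful of coefficients and is far more stable across training sets.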

Two possible scenarios of training data:

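  1. The data in each class are generated from a bivariate Gaussian distribution with uncorrelated components, and the classes have different means; a linear decision boundary is the best we can do;
  2. The data in each class come from a mixture of 10 low-variance Gaussian distributions, whose 10 means are themselves drawn from a single Gaussian distribution.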
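
The two scenarios can be simulated as follows. This is a sketch only: the class centers, the unit variance of the means, the component standard deviation of 0.45, the sample size, and the function names are all assumptions made for the illustration, since the notes above do not fix them.

```python
import numpy as np

def scenario_1(n, mean, rng):
    """n draws from a bivariate Gaussian with uncorrelated, unit-variance components."""
    return rng.normal(loc=mean, scale=1.0, size=(n, 2))

def scenario_2(n, center, rng, n_components=10, component_sd=0.45):
    """n draws from a mixture of low-variance Gaussians whose means are Gaussian draws."""
    # Step 1: draw the component means from one Gaussian around the class center.
    means = rng.normal(loc=center, scale=1.0, size=(n_components, 2))
    # Step 2: pick a component uniformly at random for each observation.
    picks = rng.integers(0, n_components, size=n)
    # Step 3: add low-variance Gaussian noise around the chosen mean.
    return means[picks] + rng.normal(scale=component_sd, size=(n, 2))

rng = np.random.default_rng(0)
n = 100  # observations per class (assumed)

# Scenario 1: one Gaussian per class, different means.
class_a_s1 = scenario_1(n, [1.0, 0.0], rng)
class_b_s1 = scenario_1(n, [0.0, 1.0], rng)

# Scenario 2: a 10-component mixture per class.
class_a_s2 = scenario_2(n, [1.0, 0.0], rng)
class_b_s2 = scenario_2(n, [0.0, 1.0], rng)
```

Under scenario 1 the classes differ only by a shift in mean, which favors a linear boundary; under scenario 2 each class forms several tight clusters, which favors a local method such as nearest neighbors.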