Further Explorations in Classification

This chapter examines several other classification algorithms, including kNN and naïve Bayes. It also looks at how much a classifier can gain from simply adding more data.


  • Evaluating classifiers: training sets and test data
  • 10-fold cross validation
  • Which is better: adding more data or improving the algorithm?
  • The kNN algorithm
  • Python implementation of kNN

The PDF of the Chapter

Python code

Page 13: divide data into buckets:
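
As a rough illustration of the idea (this is not the book's own code), the sketch below splits a data set into 10 buckets while keeping the class proportions in each bucket roughly equal. The function name `divide_into_buckets`, the `get_label` callback, and the `seed` parameter are assumptions made for this example.

```python
import random
from collections import defaultdict


def divide_into_buckets(instances, get_label, num_buckets=10, seed=None):
    """Split instances into num_buckets lists with similar class proportions.

    get_label is a hypothetical callback that returns the class of an instance.
    """
    rng = random.Random(seed)

    # Group the instances by class so each class can be spread evenly.
    by_class = defaultdict(list)
    for instance in instances:
        by_class[get_label(instance)].append(instance)

    buckets = [[] for _ in range(num_buckets)]
    next_bucket = 0
    for label in sorted(by_class, key=str):
        members = by_class[label]
        rng.shuffle(members)
        # Deal the shuffled members out round-robin, like dealing cards.
        for instance in members:
            buckets[next_bucket].append(instance)
            next_bucket = (next_bucket + 1) % num_buckets
    return buckets


# Toy data: 60 instances of class "a" and 40 of class "b".
data = [(i, "a" if i < 60 else "b") for i in range(100)]
buckets = divide_into_buckets(data, get_label=lambda inst: inst[1], seed=1)
print([len(b) for b in buckets])
```

Dealing each class out round-robin means no bucket ends up with all of one class, which would distort the accuracy measured on that bucket.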

Page 14: code from the last chapter (please modify it to implement 10-fold cross validation):

Page 15: one solution to implementing 10-fold cross validation:
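
For readers who want the general shape before opening the file, here is a minimal sketch of 10-fold cross validation (not the book's solution). It assumes the data is already divided into buckets of `(features, label)` pairs. The names `cross_validate`, `make_classifier`, and `majority_classifier` are made up for this example, and the majority-class classifier is only a stand-in for a real one.

```python
from collections import Counter


def cross_validate(buckets, make_classifier):
    """Return overall accuracy, using each bucket once as the test set.

    make_classifier(training) must return a function mapping features to a label.
    """
    correct = 0
    total = 0
    for i, test_bucket in enumerate(buckets):
        # Train on every bucket except the one held out for testing.
        training = [inst for j, b in enumerate(buckets) if j != i for inst in b]
        classify = make_classifier(training)
        for features, label in test_bucket:
            total += 1
            if classify(features) == label:
                correct += 1
    return correct / total if total else 0.0


def majority_classifier(training):
    """Placeholder classifier: always predicts the most common training label."""
    most_common = Counter(label for _, label in training).most_common(1)[0][0]
    return lambda features: most_common


# Toy buckets: each holds 6 instances of class "a" and 4 of class "b".
toy_buckets = [[((n,), "a")] * 6 + [((n,), "b")] * 4 for n in range(10)]
print(cross_validate(toy_buckets, majority_classifier))
```

Because every instance is used for testing exactly once, the result is a single accuracy figure over the whole data set rather than over one arbitrary split.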

Page 36: one solution to implementing kNN:
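
The following is a small sketch of the kNN idea, not the book's implementation: find the k training instances closest to the query and let them vote on the class. It assumes numeric feature vectors, Euclidean distance, and no normalization. The function name `knn_classify` is hypothetical, and ties in the vote are left to whatever `Counter.most_common` returns first.

```python
import heapq
import math
from collections import Counter


def knn_classify(training, query, k=3):
    """Classify query by majority vote among its k nearest training instances.

    training is a list of (feature_vector, label) pairs.
    """
    # Select the k pairs whose feature vectors are closest to the query.
    nearest = heapq.nsmallest(
        k, training, key=lambda item: math.dist(item[0], query)
    )
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training = [
    ((1, 1), "a"), ((1, 2), "a"),
    ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b"),
]
print(knn_classify(training, (1.5, 1.5), k=3))
print(knn_classify(training, (5, 5.5), k=3))
```

With k=1 this reduces to the nearest neighbor classifier. A larger k makes the vote less sensitive to a single unusual neighbor.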


Page 13. Auto MPG Data Set (Quinlan 1993)
Page 34. Pima Indians Diabetes Data Set (National Institute of Diabetes and Digestive and Kidney Diseases)
  • (containing 100 instances divided into 10 buckets)
  • (full data set divided into 10 buckets)