Chapter 8
Clustering
This chapter covers two clustering methods: hierarchical clustering and k-means clustering.
Contents
- what is clustering
- hierarchical clustering
  - single-linkage, complete-linkage, and average-linkage
- clustering dog breeds
- clustering breakfast cereals
- k-means clustering
- k-means++
- clustering Enron email
The PDF of the Chapter
Python code
- hierarchicalClustererTemplate.py (p 20)
- hierarchicalClusterer.py (p 21)
- kmeans.py (p 40)
- kmeansPlusPlus.py (p 54)
Data
- dog.csv (dog breed example)
- dogDistanceSorted.txt
- cereal.csv (breakfast cereals)
- mpg.txt (car mpg data)
- enrondata.txt (Enron from-to counts data)
- MongoDB dump of the entire Enron data set (> 300 MB)