Chapter 7
Naïve Bayes and unstructured text
This chapter explores how we can use Naïve Bayes to classify unstructured text. Can we do sentiment analysis of movie reviews to determine if the reviews are positive or negative?
Contents
- an automatic system for determining positive and negative texts
- how to train a Naïve Bayes classifier using unstructured text
- stop words — discarding common words
- classifying newsgroups
- Python code for Naïve Bayes
- sentiment analysis
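The topics above (training on unstructured text, discarding stop words, and classifying) can be sketched in a few lines. This is a minimal illustration, not the chapter's bayesText.py: the stop-word list, the toy reviews, and the function names `tokenize`, `train`, and `classify` are all assumptions made up for this example, and add-one smoothing is assumed for unseen word/class pairs.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "of", "to", "was"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,!?;:\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def train(labeled_docs):
    """Count words per class from an iterable of (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of documents
    vocab = set()
    for text, label in labeled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        # Start from the log prior, then add log likelihoods of known words.
        score = math.log(doc_counts[label] / total_docs)
        for w in tokenize(text):
            if w not in vocab:
                continue  # ignore words never seen in training
            score += math.log(
                (word_counts[label][w] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for this sketch.
toy_reviews = [
    ("I loved this movie, it was wonderful", "pos"),
    ("A wonderful and moving film", "pos"),
    ("I hated this movie, it was terrible", "neg"),
    ("A terrible and boring film", "neg"),
]
model = train(toy_reviews)
print(classify("what a wonderful film", model))
```

Summing log probabilities rather than multiplying raw probabilities avoids floating-point underflow on long documents, which matters once the classifier is run on real corpora such as the newsgroup or movie review data.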
The PDF of the Chapter
Python code
- bayesText-ClassifyTemplate.py (page 23)
- bayesText.py (page 24)
- bayesSentiment.py (page 32)
Data
- zip file containing the 20 newsgroup corpus (22MB)
- The 20 Newsgroups data set website
- Review Polarity Dataset (divided into 10 buckets)
- The Movie Review Polarity Dataset Webpage