The Ancient Art of the Numerati
This chapter explores how we can use Naïve Bayes to classify unstructured text. Can we classify Twitter posts about a movie according to whether each post is a positive review or a negative one?
By submitting a comment here you grant Ron Zacharski a perpetual license to reproduce your words and name/web site in attribution. Inappropriate or irrelevant comments will be removed at an admin's discretion.
2 Comments to Chapter 6
by Matt Martin
On November 18, 2010 at 12:41 am
In the part about JSON you say that it stands for “JavaScript Object Notion,” but I think you meant “JavaScript Object Notation.”
Throughout the Twitter section, you use the word “twitter” as referring to a status update, but I think the more accepted term for that is “tweet.” Also, when you refer to Twitter as a web site you might want to capitalize it (it looks like you used both “Twitter” and “twitter” interchangeably).
Finally, the link to this chapter from the table of contents page (not the home page) seems to be broken.
by Kristine
On December 9, 2011 at 1:18 am
Thanks for all the great information, I found it very interesting.
Quick note: the Python code contains two truncation errors:
1. In the test function, the division (correct / total) is between two integers and results in truncation to zero. This can be fixed by changing the initial values of correct and total in the same function to 0.0 instead of just 0.
2. Similarly, when computing the probabilities in the class initialization, the arithmetic is all integers and everything gets truncated to 0. Changing the calculation from (count + 1) / denominator to (count + 1.0) / denominator solves this problem.
Thanks again!
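The two fixes described in the comment above can be sketched as follows. This is a minimal illustration, not the chapter's actual code: the function names and sample numbers are hypothetical. It assumes the original code ran under Python 2, where `/` between two integers performs floor division; the sketch uses `//` to reproduce that behavior so it runs the same way under Python 3.

```python
# Hypothetical helpers for illustration; names are not from the chapter's code.

def accuracy_truncating(correct, total):
    # `//` reproduces what `/` did with two ints in Python 2: floor division.
    return correct // total


def accuracy(correct, total):
    # Making one operand a float forces true division on any Python version.
    return float(correct) / total


def smoothed_probability(count, denominator):
    # Add-one smoothing with a float literal, as the comment suggests:
    # (count + 1.0) / denominator instead of (count + 1) / denominator.
    return (count + 1.0) / denominator


print(accuracy_truncating(7, 10))   # 0
print(accuracy(7, 10))              # 0.7
print(smoothed_probability(2, 12))  # 0.25
```

Under Python 3, `/` is already true division, so the truncation only appears where `//` is used explicitly. In Python 2, adding `from __future__ import division` at the top of the module is another way to get the same effect.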