The Ancient Art of the Numerati
In the previous chapters I used people’s ratings of products to make recommendations. In this chapter I use attributes of the products themselves to make recommendations. This approach is used by Pandora among others.
A very short example file: filteringdata4b.py.
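As a rough illustration of recommending from item attributes rather than from people's ratings, here is a minimal sketch. It is not the contents of filteringdata4b.py. The track names, attribute names, and 1-5 scores are invented for illustration, and Manhattan distance is assumed as the measure of similarity.

```python
# Hypothetical attribute scores on a 1-5 scale.
# Track names and numbers are made up for illustration only.
items = {
    "Track A": {"piano": 1, "vocals": 5, "driving beat": 4, "blues influence": 2},
    "Track B": {"piano": 2, "vocals": 4, "driving beat": 5, "blues influence": 1},
    "Track C": {"piano": 5, "vocals": 1, "driving beat": 1, "blues influence": 4},
    "Track D": {"piano": 1, "vocals": 5, "driving beat": 3, "blues influence": 2},
}


def manhattan(a, b):
    """Sum of absolute differences over the attributes both items share."""
    return sum(abs(a[key] - b[key]) for key in a if key in b)


def nearest_neighbors(name, items):
    """Return (distance, item name) pairs for every other item, closest first."""
    target = items[name]
    distances = [
        (manhattan(target, attributes), other)
        for other, attributes in items.items()
        if other != name
    ]
    return sorted(distances)


if __name__ == "__main__":
    # The closest item is the one we would recommend to someone who likes Track A.
    for distance, other in nearest_neighbors("Track A", items):
        print(distance, other)
```

Because every attribute here is on the same 1-5 scale, no normalization is applied. With attributes on very different scales, one attribute could dominate the distance.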
4 Comments to Chapter 4
by Zwe Maung
On September 22, 2010 at 3:45 pm
Chap. 4
Page 1, Paragraph 1, Line 9 – You capitalized “The” to describe the Strokes, but you did not capitalize it anywhere else. You should keep a consistent format.
Page 16, Paragraph 2, Line 1 – You should capitalize “manhattan” because it is a proper noun.
Page 17, Paragraph 4, Line 1 – Misspelling “classifer” should be “classifier”
Page 19 Line 2 – You wrote down “Yuyyan” when the picture says “Yuyuan”. Google says “Yuyuan” is the correct spelling.
by Amy Sams
On September 29, 2010 at 10:37 pm
I was thinking that you may want to add a header to all the pages of your book that has the title (A Programmer’s Guide to Data Mining) and what chapter it is (it would be above a horizontal line or some kind of division). That way if someone printed out just a chapter of your book and showed it to someone, then that person would know what book it was from. Plus, it would kinda give all your pages the same structure for a more unified look. Also, you may want to change the font to something “more fun”….I really like the ‘Note’ sections and the font you use…it always seems to catch my eye.
——–
Pg. 1: Add a comma to the 2nd sentence —> “In social filtering, …”
Italicize Wolfgang Amadeus Phoenix (album title)
Italicize Contra (album title)
Add a comma to the 6th sentence –> “In this chapter, …”
Combine the 12th & 13th sentences (with other changes) –> “Pandora doesn’t do this with social filtering, instead it uses an algorithm that believes the Strokes are musically similar to Phoenix.”
Add a question mark after What Ever Happened
Change “you” to “you’ve” — in your quotation from Pandora
Instead of “that” in the 2nd sentence of 3rd para., use “to” –>”..as analysts to determine”
Instead of “Once trained”, say “Once they have completed their training, they spend….”
Pg. 2: Use a colon instead of “–” –> “Many of these genes are technical: ”
2nd sentence after the list, change “Its” to “It’s”
The picture of the paper bag should be on the same page as the sentence “In 2D space….”
Pg. 3: Italicize You’re Beautiful
Pg. 4: Before the last sentence, you might want to add something like “Does that make sense to you?”
Pg. 5: Just to be more uniform, you might want to fill in the rest of the table (such as for Blues Influence, 1 indicates no blues influence, 5 indicates a strong blues influence). Also, what does the driving beat scale mean?
Pg. 6: Should the code be colored like the other users dictionaries are in other chapters?
Pg. 7: Can you explain why it is a pretty good recommendation? Is there a specified distance that is too far for a neighbor? Like on the next page, you say that the Lady Gaga recommendation is particularly bad….why is that?
Pg. 8: Add an apostrophe (4th sentence)–> Black Keys’
Italicize Just Got to Be
Pg. 9: Add a comma to the last sentence of the 3rd para. –> “Depending on the dataset, this …”
Add a colon to the last sentence of the 4th para. –> “…compute the standard error: ”
Pg. 10: Move the last sentence to the next page and add a colon after it
Pg. 11: Under “Modified Standard Score” heading, you mention the “above formula” but I believe you are talking about the formula on the previous page
Pg. 13: Change 2nd sentence under “To normalize or not” heading –> “I’ve previously noted several examples of this.” (b/c it isn’t really above but earlier in the chapter)
Can you explain the computational cost involved with normalizing more?
Pg. 17: Make the footnote a smaller size font so it doesn’t distract from your writing
Pg. 18: The picture of Jayne Appel is kinda blurry. Could you use a different one? I found a couple links (http://andersonswbphotos.blogspot.com/2008/10/jayne-appel-recent-photos.html or http://nba.msg.com/photo/0doiaibaaP5q8)
Pg. 20: 2nd sentence under “With My Own 2 Hands” heading, add a comma –>”…time consuming, but the results…”
by Gary
On October 10, 2010 at 4:38 pm
Minor typo, P4-19 in the “Your task” box, #2 last line, “explanation is[as] to why…”
by Patrick
On October 11, 2010 at 6:33 pm
Pg 1. The sentences “We know that many of our customers who bought that album also bought Contra by Vampire Weekend. So we recommend that album to you.” should be joined into one sentence with a comma.
Pg 2. “how much twangy guitars does it have?” -> should change either “much” to “many” or “guitars” to “guitar”
Pg 5. “They all can be on a 1-5 scale—how ‘country’ is the sound of this track—one means no hint of country to ‘5’ means this is a solid country sound.” I would consider changing to improve readability to something more like this:
“They all can be on a 1-5 scale—how ‘country’ is the sound of this track—‘1’ means ‘no hint of country’ to ‘5’ means ‘this is a solid country sound.’”