Chapter 4

Classification

In the previous chapters I used people’s ratings of products to make recommendations. In this chapter I use attributes of the products themselves to make recommendations. This approach is used by Pandora among others.

Contents

  • Introduction to Pandora-like systems
  • The importance of selecting appropriate attributes and values
  • An example: music attributes and a nearest neighbor approach
  • Data normalization
  • Modified Standard Score
  • Python code: music, attributes, and a simple nearest neighbor approach
  • A sports example
  • Ways of acquiring attribute data
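
As a rough illustration of how three of the topics above fit together (attributes on a numeric scale, the Modified Standard Score, and a nearest neighbor search), here is a minimal sketch. It is not the chapter's code: the track names and attribute values are made up, the function names are hypothetical, and it assumes the Modified Standard Score is (value - median) / absolute standard deviation, with Manhattan distance used to compare tracks.

```python
from statistics import median

# Hypothetical tracks scored on 1-5 attribute scales (illustrative values only).
tracks = {
    "Track A": {"piano": 1.0, "driving beat": 5.0},
    "Track B": {"piano": 1.5, "driving beat": 4.5},
    "Track C": {"piano": 5.0, "driving beat": 1.0},
}


def modified_standard_scores(values):
    """Assumed definition: (value - median) / absolute standard deviation,
    where the absolute standard deviation is the mean of |value - median|."""
    med = median(values)
    asd = sum(abs(v - med) for v in values) / len(values)
    if asd == 0:
        # All values are identical, so the attribute carries no information.
        return [0.0 for _ in values]
    return [(v - med) / asd for v in values]


def normalize(items):
    """Normalize every attribute column across all items."""
    names = list(items)
    attributes = list(next(iter(items.values())))
    normalized = {name: {} for name in names}
    for attr in attributes:
        scores = modified_standard_scores([items[n][attr] for n in names])
        for name, score in zip(names, scores):
            normalized[name][attr] = score
    return normalized


def manhattan(a, b):
    """Sum of absolute differences over the attributes of a."""
    return sum(abs(a[k] - b[k]) for k in a)


def nearest_neighbor(target, items):
    """Name of the item closest to target (excluding target itself)."""
    others = (n for n in items if n != target)
    return min(others, key=lambda n: manhattan(items[target], items[n]))


if __name__ == "__main__":
    print(nearest_neighbor("Track A", normalize(tracks)))
```

Normalizing first keeps an attribute with a wide numeric range from dominating the distance; with every attribute already on the same 1-5 scale, as here, it changes little.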

The PDF of chapter 4

Python Code

The very short example filteringdata4b.py.

4 Comments to Chapter 4

  1. by Zwe Maung

    On September 22, 2010 at 3:45 pm

    Chap. 4
    Page 1, Paragraph 1, Line 9 – You capitalized “The” to describe the Strokes, but you did not capitalize it anywhere else. You should keep a consistent format.

    Page 16, Paragraph 2, Line 1 – You should capitalize “manhattan” because it is a proper noun.

    Page 17, Paragraph 4, Line 1 – Misspelling “classifer” should be “classifier”

    Page 19 Line 2 – You wrote down “Yuyyan” when the picture says “Yuyuan”. Google says “Yuyuan” is the correct spelling.

  2. by Amy Sams

    On September 29, 2010 at 10:37 pm

    I was thinking that you may want to add a header to all the pages of your book that has the title (A Programmer’s Guide to Data Mining) and what chapter it is (it would be above a horizontal line or some kind of division). That way if someone printed out just a chapter of your book and showed it to someone, then that person would know what book it was from. Plus, it would kinda give all your pages the same structure for a more unified look. Also, you may want to change the font to something “more fun”….I really like the ‘Note’ sections and the font you use…it always seems to catch my eye.

    --------

    Pg. 1: Add a comma to the 2nd sentence --> “In social filtering, …”
    Italicize Wolfgang Amadeus Phoenix (album title)
    Italicize Contra (album title)
    Add a comma to the 6th sentence --> “In this chapter, …”
    Combine the 12th & 13th sentences (with other changes) --> “Pandora doesn’t do this with social filtering, instead it uses an algorithm that believes the Strokes are musically similar to Phoenix.”

    Add a question mark after What Ever Happened
    Change “you” to “you’ve” — in your quotation from Pandora
    Instead of “that” in the 2nd sentence of 3rd para., use “to” --> “...as analysts to determine”
    Instead of “Once trained”, say “Once they have completed their training, they spend….”

    Pg. 2: Use a colon instead of “–” --> “Many of these genes are technical: ”
    2nd sentence after the list, change “Its” to “It’s”
    The picture of the paper bag should be on the same page as the sentence “In 2D space….”

    Pg. 3: Italicize You’re Beautiful

    Pg. 4: Before the last sentence, you might want to add something like “Does that make sense to you?”

    Pg. 5: Just to be more uniform, you might want to fill in the rest of the table (such as for Blues Influence, 1 indicates no blues influence, 5 indicates a strong blues influence). Also, what does the driving beat scale mean?

    Pg. 6: Should the code be colored like the other users dictionaries are in other chapters?

    Pg. 7: Can you explain why it is a pretty good recommendation? Is there a specified distance that is too far for a neighbor? Like on the next page, you say that the Lady Gaga recommendation is particularly bad….why is that?

    Pg. 8: Add an apostrophe (4th sentence) --> Black Keys’
    Italicize Just Got to Be

    Pg. 9: Add a comma to the last sentence of the 3rd para. --> “Depending on the dataset, this …”
    Add a colon to the last sentence of the 4th para. --> “…compute the standard error: ”

    Pg. 10: Move the last sentence to the next page and add a colon after it

    Pg. 11: Under “Modified Standard Score” heading, you mention the “above formula” but I believe you are talking about the formula on the previous page.

    Pg. 13: Change 2nd sentence under “To normalize or not” heading --> “I’ve previously noted several examples of this.” (b/c it isn’t really above but earlier in the chapter)

    Can you explain more about the computational cost involved with normalizing?

    Pg. 17: Make the footnote a smaller size font so it doesn’t distract from your writing

    Pg. 18: The picture of Jayne Appel is kinda blurry. Could you use a different one? I found a couple links (http://andersonswbphotos.blogspot.com/2008/10/jayne-appel-recent-photos.html or http://nba.msg.com/photo/0doiaibaaP5q8)

    Pg. 20: 2nd sentence under “With My Own 2 Hands” heading, add a comma --> “…time consuming, but the results…”

  3. by Gary

    On October 10, 2010 at 4:38 pm

    Minor typo on P4-19, in the “Your task” box, #2, last line: “explanation is [as] to why…”

  4. by Patrick

    On October 11, 2010 at 6:33 pm

    Pg 1. The sentences “We know that many of our customers who bought that album also bought Contra by Vampire Weekend. So we recommend that album to you.” should be joined into one sentence with a comma.
    Pg 2. “how much twangy guitars does it have?” -> should change either “much” to “many” or “guitars” to “guitar”
    Pg 5. “They all can be on a 1-5 scale—how ‘country’ is the sound of this track—one means no hint of country to ‘5’ means this is a solid country sound.” To improve readability, I would consider changing it to something more like this:
    “They all can be on a 1-5 scale—how ‘country’ is the sound of this track—‘1’ means ‘no hint of country’ to ‘5’ means ‘this is a solid country sound.’”
