Chapter 3

Implicit Ratings and Item Based Filtering

This chapter starts with a discussion of the types of user ratings we can use. Users can give ratings explicitly (thumbs up, thumbs down, 5 stars, or whatever) or they can rate products implicitly: if they buy an MP3 from Amazon, we can view that purchase as a ‘like’ rating.

Contents

  • Explicit Ratings
  • Implicit Ratings
  • Which is more accurate: Explicit or implicit?
  • User-based filtering
  • Item-based filtering
  • Adjusted Cosine Similarity
  • Slope One Algorithm
  • Python code for Slope One
  • MovieLens data

The PDF of chapter 3

Python Code

There is only one Python file for this chapter: recommender3.py

Data

In addition to the data set introduced in chapter 2, this chapter uses the MovieLens dataset available from www.grouplens.org. The dataset used in this chapter is the smallest one on that site, the one with 100,000 ratings.
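
As a rough illustration of getting MovieLens-style ratings into the nested-dictionary shape used for recommenders, here is a minimal sketch. It assumes the 100,000-rating file is tab-separated with the columns user id, item id, rating, timestamp; the sample rows and the function name load_ratings are made up for illustration and are not taken from recommender3.py.

```python
import csv
import io

# Hypothetical sample rows in the assumed layout:
# user id <TAB> item id <TAB> rating <TAB> timestamp
SAMPLE = (
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "196\t302\t4\t881250000\n"
)


def load_ratings(lines):
    """Build {user_id: {item_id: rating}} from tab-separated rating lines."""
    ratings = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        user, item, rating = row[0], row[1], float(row[2])
        ratings.setdefault(user, {})[item] = rating
    return ratings


ratings = load_ratings(io.StringIO(SAMPLE))
print(ratings)
```

To read a real file instead of the in-memory sample, pass an open file object to load_ratings in place of the io.StringIO wrapper.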

13 Comments to Chapter 3

  1. by Christopher Randles

    On September 9, 2010 at 2:11 am

    Spelling error: pg 1
    “Here the user eplicitly (intentionally) rates…”
    explicitly*

  2. by Gary

    On September 11, 2010 at 11:40 pm

    Just some feedback on Ch 3. The intro seemed choppy; my totally unprofessional rendition is below.

    In what I like to call the awesome chapter 2 we learned the basics of collaborative filtering and recommendation systems. The algorithms are general purpose and could be used with a variety of data. Users rated different items on a 5 or 10 point scale and found other users who had similar ratings. As was mentioned, there is some evidence to suggest users typically don’t use this fine-grain distinction for many applications and instead tend to give either the top rating or the lowest rating. This all or nothing rating strategy combined with general purpose algorithms can lead to (absurd, unusable, things that make you go hmmmm, incorrect, stoopid <–pick one) recommendations. In this chapter we will examine ways to fine tune collaborative filtering to produce a more accurate recommendation in an efficient manner.

    Also,
    Chapter 3 pages are numbered 2-#, should be 3-# I'm guessing?

    In addition,
    Regarding the use of the boxes to highlight stuff. I like the boxes and font change, it does draw my eye but,
    In Ch 2 you have a Q&A box on 2-6 and 2-7 with only the ANSWER: labeled as such.
    On page 2-11 you have a calisthenic exercise, on page 2-18 you have a puzzler, 2-24 a project.
    CH 3 – 2-4 a bold typed question, 2-13 and 2-14 a Q&A but only the Q is boxed in (I think). 2-24 a project
    Ch 4 – 4-19 has a "Your Task"
    I would suggest standardizing the terms and usage of the boxes throughout the book. Maybe a Q&A is something to read and get you to think, a "puzzler" is a short task to get you to think and do, a "project" could be a homework type question reinforcing the day's learning, a "Calisthenic" is a larger "project", and a "Your Task" is the ultimate evil, ruin your weekend type of project!
    :o
    Just my 2 cents, which in this economy, might not be worth diddly.

  3. by Jessica White

    On September 12, 2010 at 11:55 pm

    Here are some corrections I found for chapter 3:

    Page 1:
    “Here the user eplicitly (intentionally) rates an item…” -explicitly is missing an ‘x’

    “Another example would be the thumb up/thumbs down…” -thumbs up is missing the ‘s’

    Page 5:
    After “Fool Moon: The Dresden Files Book 2)” there are 2 “and”s.

    Page 7:
    “Using purchase history as an implicit rating of what a
    person likes, might lead you to believe that people who like kettlebells, like stuffed animals, like
    microHelicopters, books on anticancer, and the Dresden File books.” -The comma before “might lead you” is unnecessary. Seems like the comma before “like stuffed animals” as well as the “like” before “microHelicopters” are unnecessary as well.

    In Baker’s quote: “last time we say her” should probably be “last time we saw her”.

  4. by Erin Wuepper

    On September 13, 2010 at 7:17 pm

    Here are some grammar, wording and spelling mistakes I found. I have given the page number first in (), then the correction, followed by the sentence where the mistake was found. The mistake is surrounded by [].

    (Pg 1) Spelling “explicitly”: “Here the user [eplicitly] (intentionally) rates an item using number of stars.”

    (Pg 2) Take out the “Or”: “[Or] consider what information we can gain from recording what products a user clicks on in Amazon.”

    (Pg 4) Spelling “Bereilles’s”: “In the screen shot above, I’ve listened to Sara [Bereilles'] Between the Lines 16 times.”

    (Pg 4) Need quotes around “Between the Lines”: “In the screen shot above, I’ve listened to Sara Bereilles’ [Between the Lines] 16 times.”

    (Pg 5) Get rid of “have”: “I [have] imagine most of you have bought a substantial amount of stuff on Amazon.”

    (Pg 5) Duplicate “and”: “(Murder City: Ciudad Juarez and the Global Economy’s New Killing Fields and Fool Moon: The Dresden Files Book 2) and [and] the physical books No Place to Hide, Dr. Weil’s 8 Weeks to Optimum Health, Anticancer: A new way of life, and Rework.”

    (Pg 5) Underline all book titles: “([Murder City: Ciudad Juarez and the Global Economy's New Killing Fields] and [Fool Moon: The Dresden Files Book 2]) and and the physical books [No Place to Hide], [Dr. Weil's 8 Weeks to Optimum Health], [Anticancer: A new way of life, and Rework].”

    (Pg 7) Underline “Enter the Kettlebell! Secret of the Soviet Supermen” instead of italicize: “I bought some kettlebells and the book [Enter the Kettlebell! Secret of the Soviet Supermen] as a gift for my son and a Plush Chase Border Collie stuffed animal for my wife because our 14 year old border collie died.”

    (Pg 7) Put hyphens in “14-year-old”: I bought some kettlebells and the book Enter the Kettlebell! Secret of theSoviet Supermen as a gift for my son and a Plush Chase Border Collie stuffed animal for my wife because our [14 year old border] collie died.”

    (Pg 7) “Say” is supposed to be “saw”: “Last time we [say] her, this girl was wearing black clothing with a lot of writing on it, most of it angry.”

    (Pg 7) Put a comma after “Finally”: “Finally[] consider a couple sharing a Netflix account.”

    (Pg 7) Instead of a comma, put a period after the first full sentence: “He likes action flicks with lots of explosions and helicopters[,] she likes intellectual movies and romantic comedies.”

    (Pg 7) Underline book titles instead of italicize: “Recall that I said my purchase of the book [Anticancer: A New Way of Life] was as a gift to my cousin.”

    (Pg 7) Add comma after “In fact” and remove comma after “year”: “In fact[] in the last
    year[,] I purchased multiple copies of three books.”

    (Pg 8) “Every time” is two words: “[Everytime] you want to make a recommendation for someone you need to calculate one million distances (comparing that person to the one million other people).”

    (Pg 8) Spelling “Sparsity”: “2. [Sparcity]. Most recommendation systems have many users and many products but the average user rates a small fraction of the total products.”

    (Pg 9) Spelling “Wolfgang” :“If a user rates [Wolgang] Amadeus Phoenix highly we could recommend the similar album Manners.”

    (Pg 9) Add comma after “As before”: “As before[] the rows represent the users and the columns represent bands.”

    (Pg 15) Change period to question mark: “How can we use that collection to make predictions[.]”

    (Pg 15) Reword question to “How might Ben rate Phoenix?”: “How Ben will like Phoenix?”

    (Pg 15) Spelling “dissect”: “Let’s [disect] the numerator.”

    (Pg 15) Spelling “dissecting”: “[Disecting] the demoninator we get something like for every band that Ben has rated sum the cardinalities of those bands to Phoenix.”

    (Pg 15) Spelling “denominator”: “Disecting the [demoninator] we get something like for every band that Ben has rated sum the cardinalities of those bands to Phoenix.”

    (Pg 15) Spelling “denominator”: “Lady Gaga and her cardinality is also 2. So the [demoninator] is 3.”

    (Pg 16) Spelling “computing”: “Again, the formula for [compuing] deviations is”

    (Pg 16) Spelling “pseudocode”: “That [psuedocode] looks pretty nice but as you can see, there is a disconnect between the data format expected by the psuedo code and the format the data is really in (see users2 above as an example).”

    (Pg 16) Spelling “pseudocode”: “That psuedocode looks pretty nice but as you can see, there is a disconnect between the data format expected by the [psuedo code] and the format the data is really in (see users2 above as an example).”

    (Pg 17) Spelling “pseudocode”: “code warriors we have two possibilities, either alter the format of the data, or revise the [psuedocode].”

    (Pg 17) Spelling “pseudocode” (again): “This revised [psuedocode] looks like”
    Spelling “pseudocode” (again,again): “First, let’s parse that formula and put it into English and/or [psuedo code].”

    (Pg 20) Spelling “pseudo-English”: “The formula in [psuedo English]:”

  5. by raz

    On September 17, 2010 at 7:18 pm

    Thanks so much. Man, I am pretty consistent at spelling pseudo as psuedo.

  6. by raz

    On September 17, 2010 at 11:01 pm

    Gary, thanks for the suggestions. I am reworking the intro to the chapter. You have a good point about the boxes. I’ll rework those as well.

  7. by Amy Sams

    On September 22, 2010 at 9:48 pm

    Pg. 1: Under ‘Explicit ratings’ heading, instead of herself…I would use “his or herself”, it just makes it more gender neutral.

    Pg. 2: Second sentence, don’t need “of this”. Should be “An example is keeping…”

    Pg. 3: Just a general question, does skipping over a song count as a negative rating? I thought that you have the option to skip and if you really don’t like it then you can choose to have it not play for a certain amount of time. I have skipped songs in the past (like when I’m tired of hearing it) but they’ll still play later on. When I give a song a thumbs down, however, it hasn’t ever been played again on that Pandora station.

    Pg. 3: For structuring, you could put the last line on page 3 on the next page so it doesn’t seem like it is just randomly there.

    Pg. 5: You’re repeating the same heading of “Which is more accurate…”. Also, you might want to make “What are the problems with explicit ratings?” a heading, such as “Problems with Explicit Ratings” (like you did on pg. 6).

    Pg. 6: 4th sentence under “Problem 3:…”, should be: “It is easy to fly, great fun, and has survived multiple crashes.”

    Pg. 7: 5th sentence beginning, “Using purchase history as an implicit rating of what a person likes, ….”. This sentence is a bit confusing. Are you trying to say that someone who likes kettlebells will like all this other stuff or someone who likes kettlebells and stuffed animals will like …? Instead of using commas, I would use “and” to show what goes together.

    Pg. 11: 2nd sentence, remove “hence”

    Pg. 12: 2nd to last sentence, add a comma: “Using all the bands he did rate along with our database of deviations, we…”

    Pg. 12: Last sentence needs a period at the end. Also, what is the picture of? Does it represent the “broad brush” concept?

    Pg. 13: So when you use “with respect to”, this implies that the rating of the work that follows that statement comes first in the numerator when you do the subtraction?

    Pg. 13: Add a “-” to the last sentence of the paragraph between the u’s when you are describing the numerator before the deviation calculation.

  8. by Cardigan

    On September 23, 2010 at 3:06 am

    The brackets indicate what needs to be added.
    Pg 1. “User rated items [were] on a 5 or 10 point scale.”
    Pg 2. “If the user clicks on the article Fastest Way to Lose Weight Discovered by Professional Trainers and the article Slow and Steady: How to lose weight and keep it off[,] perhaps he wishes to lose weight.”
    Pg. 3 “It knows, for example, that users who viewed the Wolfgang Amadeus Phoenix product page[remove comma] also viewed the XX product page.”
    Pg. 3 “You would think that there could be the potential for some weird recommendations[,] but this works surprisingly well.”
    Pg. 10 “To compensate for this grade inflation[,] we will subtract the user’s average rating from each rating.”

  9. by Matt Martin

    On September 26, 2010 at 7:54 pm

    “Part 1: Computing deviation” begins on page 13, but you don’t seem to indicate what part 2 is or where it begins, which is a little confusing.

  10. by Matt Martin

    On September 26, 2010 at 9:26 pm

    Also, on page 24 “Does the recommend recommend movies you might like?” should probably say “Does the recommender recommend movies you might like?”

  11. by Amy Sams

    On September 28, 2010 at 2:35 am

    Page 17: Under Step 2, you need to indent the last line of code (otherwise it isn’t in the for loop).

  12. by Amy Sams

    On September 28, 2010 at 2:44 am

    Page 18 (above Step 4): Need a quotation mark —> needs to be ["Dr. Dog"] ["Lady Gaga"]

  13. by Patrick

    On October 4, 2010 at 9:28 pm

    Pg 6. You may have misquoted Baker. “…16-year-old niece. The last time we say her, this girl was wearing…” the word “say” is probably “saw”.
    Pg 15. Step #11: the wording for the first sentence could sound a little better; it comes across as confusing at first. Perhaps you could say
    “To calculate the denominator, iterate through the bands Ben has rated, summing the cardinalities (the number of people who rated both) between those bands and Phoenix. So Ben has rated Dr. Dog, and the cardinality between Dr. Dog and Phoenix is 1. Ben has rated Lady Gaga, her cardinality to Phoenix is 2. So the denominator is 1 + 2 = 3.”
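
The denominator calculation described in comment 13 can be sketched in a few lines of Python. This is only an illustration of that one step, not the book's Slope One code: the cardinalities (1 and 2) come from the comment, while the dictionary layout and the function name slope_one_denominator are assumptions.

```python
# Cardinality = number of people who rated both bands (values from comment 13).
cardinality = {
    ("Dr. Dog", "Phoenix"): 1,
    ("Lady Gaga", "Phoenix"): 2,
}

# The bands Ben has rated, per the comment.
bens_bands = ["Dr. Dog", "Lady Gaga"]


def slope_one_denominator(rated_items, target, cardinality):
    """Sum the cardinalities between each rated item and the target item."""
    return sum(
        cardinality.get((item, target), 0)
        for item in rated_items
        if item != target
    )


denominator = slope_one_denominator(bens_bands, "Phoenix", cardinality)
print(denominator)  # 1 + 2 = 3
```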

By submitting a comment here you grant Ron Zacharski a perpetual license to reproduce your words and name/web site in attribution. Inappropriate or irrelevant comments will be removed at an admin's discretion.