Big Data and What Reviews Mean

One of the interesting effects of big data has been the gradual, and probably unintentional, replacement of reviews and critiques by experts with aggregated data from people who may or may not have any particular expertise in whatever field they’re reviewing.  This is problematic if we treat these reviews as if they’re telling us the same sort of thing that the experts are saying.  Not only are they not the same thing, but the goals are entirely different.

People who make their living by critiquing anything – quotes from Teddy Roosevelt notwithstanding – provide a valuable service.  I’ve run across reviews of books and movies that have explained why I liked something (or didn’t like it) – reasons that resonated with me, but that I was unable to explain as well as the reviewer did.  A part of good criticism is explaining why something makes for good or bad books, movies, food, whatever.  Another part assumes that it is the job of a reviewer to point people to the best art in a given field, and explain why it’s the best art.  Those are, as far as I can tell, the basic responsibilities that criticism attempts to fulfill.  A key assumption here is that what is popular may not be the best art, and criticism provides a different yardstick than, say, the number of albums sold.  Good criticism could inform us that Jascha Heifetz is a better artist than Justin Bieber; album sales cannot give us this information.

This has become especially obvious to me as I’ve been using goodreads.com to keep track of the books that I read.  The idea behind goodreads, as far as I’ve been able to tell, is to help you find your next favorite book; presumably they’ve been able to figure out a way for this to generate revenue.  Although I’m sure that there are some fairly complicated algorithms making all this work, the basic idea is fundamentally simple:  let’s look at what you’ve enjoyed, and then look for people who have also enjoyed that book, and then let’s recommend whatever they also found enjoyable.
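The basic idea described above can be sketched in a few lines of Python.  This is only a toy illustration of the “people who liked what you liked” approach, not goodreads’ actual algorithm, which I have no inside knowledge of.  The function name recommend, the four-star cutoff for “enjoyed,” the scoring by number of shared favorites, and the sample readers and their ratings are all assumptions made up for the example.

```python
from collections import defaultdict


def recommend(ratings, user, liked_threshold=4, top_n=3):
    """Suggest unread books that like-minded readers enjoyed.

    ratings maps each reader to a dict of {book: stars}.
    A book counts as "enjoyed" at liked_threshold stars or more
    (an assumed cutoff, chosen only for illustration).
    """
    already_read = set(ratings[user])
    liked = {b for b, s in ratings[user].items() if s >= liked_threshold}

    scores = defaultdict(int)
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        their_liked = {b for b, s in their_ratings.items() if s >= liked_threshold}
        overlap = len(liked & their_liked)
        if overlap == 0:
            continue  # no shared favorites, so this reader tells us nothing
        # Each book they enjoyed that we have not read gets a vote,
        # weighted by how many favorites we have in common.
        for book in their_liked - already_read:
            scores[book] += overlap

    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda b: (-scores[b], b))[:top_n]


# Hypothetical readers and ratings, invented for this sketch.
ratings = {
    "me": {"Dune": 5, "Neuromancer": 4, "Twilight": 1},
    "ann": {"Dune": 5, "Neuromancer": 5, "Foundation": 5, "Hyperion": 4},
    "bob": {"Dune": 4, "Foundation": 4, "Twilight": 2},
    "cat": {"Twilight": 5, "Breaking Dawn": 5},
}

print(recommend(ratings, "me"))
```

Notice that the reader who loved Twilight shares no favorites with “me,” so nothing she enjoyed is ever suggested.  The recommender never asks whether a book is good, only whether people with similar ratings enjoyed it.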

An important part of this is the ratings that each user assigns to each book – what’s important isn’t just what you’ve read, but what you’ve read and rated highly.  I realized this after logging the past 40 or 50 books that I’ve read while on goodreads and sorting them by the rating I gave them.  If the books that I was reading were chosen randomly, I would expect the distribution to look something like a bell curve:  some terrible books (outliers like Twilight, for example) receiving a 1-star rating, the majority of books (such as The Hunt for Red October) receiving three stars, and a smaller number of truly great books (such as The Brothers Karamazov) receiving 5 stars.

This is not what I found.  The idea behind goodreads, of course, is that it wants to recommend books to me that it has statistical reasons to believe that I will rate highly, i.e., the ideal book for it to recommend is the one to which I will give the highest rating – not the one that’s the best, or the one that will be difficult but will cause me to grow as a person.  This means that the rating system there is biased enough to be useless – goodreads is only going to recommend Twilight to people who are probably going to like it, and it will – as a result – get good reviews.  It’s perhaps more helpful to think of a book’s rating on goodreads not as how good the book is, but as how enjoyable the book was for people who goodreads thought would like it.  It’s not a rating of the book.  It’s a rating of their algorithm.
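A small simulation shows how strong this effect can be.  It is a sketch under invented assumptions, not a model of goodreads itself: the function name simulate, the distribution of readers’ tastes, the noise in the recommender’s guess, and the four-star cutoff for showing someone the book are all made up for illustration.  The point is only that when a book is shown mainly to people predicted to like it, the average rating it collects says more about who was selected than about the book.

```python
import random
from statistics import mean


def simulate(n_readers=10_000, noise=0.75, seed=0):
    """Compare a book's average rating across all readers with its
    average among only the readers a recommender chose to show it to.

    All numbers here are assumptions for illustration.
    """
    rng = random.Random(seed)
    everyone, recommended = [], []
    for _ in range(n_readers):
        # How many stars this reader would really give the book:
        # most people find it mediocre (centered on 2 stars).
        true_stars = min(5.0, max(1.0, rng.gauss(2.0, 1.0)))
        # The recommender's imperfect guess at that rating.
        predicted = true_stars + rng.gauss(0.0, noise)
        everyone.append(true_stars)
        # Only readers predicted to enjoy the book ever see it,
        # so only they end up rating it.
        if predicted >= 4.0:
            recommended.append(true_stars)
    return mean(everyone), mean(recommended)


overall, shown = simulate()
print(f"average if everyone rated it:       {overall:.2f}")
print(f"average among recommended readers:  {shown:.2f}")
```

Under these assumptions the second average comes out well above the first, even though the book itself never changed.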

Once you keep this in mind, some of the ratings on goodreads start to make more sense – it’s kind of a fun game to see what books of classic literature are rated lower than, say, Breaking Dawn.  The list is depressingly long.  1

Some of this is inevitable:  if you’re not paying money for a service, it’s because you’re the product, not the customer.  Goodreads has a vested interest in making their service valuable, and the easiest way to do that is by making you read books that you enjoy.  Steering someone who just read Mein Kampf to Between the World and Me may be admirable, but it’s not likely to generate repeat business, and thus doesn’t help goodreads turn a profit.  We all know what media companies do when they have to pick only one of those options.

The problem, of course, is not just goodreads:  any service that recommends any product by amalgamating data from non-experts has this same issue.  And while listening to this sort of data may help us find what we’re likely to enjoy, this isn’t the same thing as actually refining our tastes and working to improve the quality of art (or books, or food, or music) that we love.  Alas, the pursuit of virtue may not be best for the stockholder.  Giving people what they want, on the other hand, always turns a profit.

 

Notes:

  1. In the first three minutes of searching, I found that The Red Badge of Courage, The Scarlet Letter, and All Quiet on the Western Front all fit into this category.  Not the best books in the western literary canon, to be sure, but I think there’s a compelling case that they’re all better than anything from the Twilight series.
