Getting started with Apache Mahout

This article gives you a first idea of what the Apache Mahout library can do and how it can be used in real-life projects. You can see it as a “Hello World” project for Mahout. After a short introduction to Apache Mahout, we will see what a recommender is, then we will create a simple recommender using the library. As this is a Java-oriented article, you will need basic Java programming skills.

Apache Mahout short description

Mahout is an open source, scalable machine learning library from Apache, written in Java. The machine learning algorithms implemented by Mahout focus on clustering, classification and recommendations. For scalability, the algorithms were built on Apache Hadoop and the map/reduce paradigm. In April 2014 the project decided to move to Apache Spark. Mahout is well suited to machine learning problems that involve very large collections of data. For small amounts of data, other libraries or products can be faster and therefore a better fit. More details about the project: https://mahout.apache.org

Recommender explained

A recommender is an application that suggests products or services to a user, based on the preferences expressed by other users with similar tastes.

Let’s see an example. We have three users who have expressed their preferences for four books. The preferences are given as ratings from one to five, five being the highest. Our goal is to recommend a new book to User 1. As we can see in the following table, User 1 and User 2 have similar preferences, since they both gave Book 1 a rating of five. User 3, however, gave Book 1 a rating of one, which means User 1 and User 3 don’t have similar preferences. Looking at the table, we expect the recommender to output Book 3 for User 1, as this user has not rated this book yet and Book 3 is appreciated by User 2.

User    Item    Item ID  Preference
User 1  Book 1  1        5.0
User 1  Book 2  2        2.0
User 2  Book 1  1        5.0
User 2  Book 2  2        1.0
User 2  Book 3  3        4.0
User 2  Book 4  4        1.0
User 3  Book 1  1        1.0
User 3  Book 2  2        5.0
User 3  Book 3  3        2.0
User 3  Book 4  4        4.0
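
The notion of “similar preferences” can be made concrete with a small calculation. The following plain-Java sketch does not use Mahout. It assumes a Euclidean-distance-based similarity of the form 1 / (1 + d / sqrt(n)), where d is the distance between two users' ratings over the n books they both rated. The class and method names are hypothetical, and the ratings are the ones from the sample data file input/data.csv used later in this article.

```java
import java.util.Map;

final class SimilaritySketch {

    // Returns a value in (0, 1]; 1 means identical ratings on the shared books.
    // Each map goes from book ID to rating.
    static double similarity(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sumSquaredDiff = 0.0;
        int shared = 0;
        for (Map.Entry<Integer, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                double diff = entry.getValue() - other;
                sumSquaredDiff += diff * diff;
                shared++;
            }
        }
        if (shared == 0) {
            return Double.NaN; // no book in common, similarity is undefined
        }
        return 1.0 / (1.0 + Math.sqrt(sumSquaredDiff) / Math.sqrt(shared));
    }

    public static void main(String[] args) {
        Map<Integer, Double> user1 = Map.of(1, 5.0, 2, 2.0);
        Map<Integer, Double> user2 = Map.of(1, 5.0, 2, 1.0, 3, 4.0, 4, 1.0);
        Map<Integer, Double> user3 = Map.of(1, 1.0, 2, 5.0, 3, 2.0, 4, 4.0);

        System.out.println("User 1 vs User 2: " + similarity(user1, user2));
        System.out.println("User 1 vs User 3: " + similarity(user1, user3));
    }
}
```

Under these assumptions, User 2 comes out at roughly 0.59 and User 3 at roughly 0.22, which matches the intuition that User 2 is the closer match for User 1.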

Creating the recommender using Apache Mahout

We will start by creating a Maven project, then add the Mahout libraries to it, and finally write a basic recommender. Prerequisites: a Java Development Kit (JDK) and Apache Maven installed.

Create a Maven project

From the command line, create a Maven project named “recommender”:

mvn archetype:generate \
    -DarchetypeGroupId=org.apache.maven.archetypes \
    -DgroupId=com.technobium \
    -DartifactId=recommender \
    -DinteractiveMode=false

This will create a “recommender” project with a default class named “App”. Rename the default class to “BasicRecommender”:

mv recommender/src/main/java/com/technobium/App.java \
   recommender/src/main/java/com/technobium/BasicRecommender.java

Navigate to the “recommender” project and edit the pom.xml file:

cd recommender
nano pom.xml

Add the dependencies for Apache Mahout 0.9 to the <dependencies> section. I also added slf4j-simple, a binding for the SLF4J logging facade used by Mahout, so that log messages are printed to the console.

<dependencies>
	...
	<dependency>
		<groupId>org.apache.mahout</groupId>
		<artifactId>mahout-core</artifactId>
		<version>0.9</version>
	</dependency>
	<dependency>
		<groupId>org.slf4j</groupId>
		<artifactId>slf4j-simple</artifactId>
		<version>1.7.7</version>
	</dependency>
</dependencies>

Optionally you can generate the Eclipse project:

mvn eclipse:eclipse

Create the recommender

The input for the recommender is a comma-separated values (CSV) file of user preferences, one preference per line, in the form userID,itemID,value:

  • userID – user identification
  • itemID – item identification
  • value – the affinity or preference of the current user to the item

Create an input file with this preference data:

mkdir input
nano input/data.csv

Add the following sample content, which reflects the preferences table explained before:

1,1,5.0
1,2,2.0
2,1,5.0
2,2,1.0
2,3,4.0
2,4,1.0
3,1,1.0
3,2,5.0
3,3,2.0
3,4,4.0
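
Each line of this file follows the userID,itemID,value layout described above. Mahout's FileDataModel reads the file for you, so the following sketch is only meant to make the format concrete. It parses such lines by hand in plain Java, and its class, field and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PreferenceLineSketch {

    // One parsed line: which user rated which item, and how highly.
    static final class Preference {
        final long userId;
        final long itemId;
        final float value;

        Preference(long userId, long itemId, float value) {
            this.userId = userId;
            this.itemId = itemId;
            this.value = value;
        }
    }

    // Splits a "userID,itemID,value" line into its three fields.
    static Preference parse(String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "Expected userID,itemID,value but got: " + line);
        }
        return new Preference(
                Long.parseLong(parts[0].trim()),
                Long.parseLong(parts[1].trim()),
                Float.parseFloat(parts[2].trim()));
    }

    public static void main(String[] args) {
        List<Preference> preferences = new ArrayList<>();
        for (String line : new String[] {"1,1,5.0", "1,2,2.0", "2,3,4.0"}) {
            preferences.add(parse(line));
        }
        for (Preference p : preferences) {
            System.out.println("user " + p.userId + " rated item "
                    + p.itemId + " with " + p.value);
        }
    }
}
```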

Edit the default class and replace its content with the following:

package com.technobium;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.ThresholdUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.UserBasedRecommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Hello Mahout world!
 * 
 */
public class BasicRecommender {

    public static void main(String[] args) throws IOException, TasteException {

        Logger log = LoggerFactory.getLogger(BasicRecommender.class);

        // Load historical data about user preferences
        DataModel model = new FileDataModel(new File("input/data.csv"));

        // Compute the similarity between users, according to their preferences
        UserSimilarity similarity = new EuclideanDistanceSimilarity(model);

        // Group the users with similar preferences
        UserNeighborhood neighborhood = new ThresholdUserNeighborhood(0.1,
                similarity, model);

        // Create a recommender
        UserBasedRecommender recommender = new GenericUserBasedRecommender(
                model, neighborhood, similarity);

        // For the user with the id 1 get two recommendations
        List<RecommendedItem> recommendations = recommender.recommend(1, 2);
        for (RecommendedItem recommendation : recommendations) {
            log.info("User 1 might like the book with ID: "
                    + recommendation.getItemID() + " (predicted preference :"
                    + recommendation.getValue() + ")");
        }
    }
}

Run the recommender:

mvn compile
mvn exec:java -Dexec.mainClass="com.technobium.BasicRecommender"

The result should be the following:

...
User 1 might like the book with ID: 3 (predicted preference :3.4530818)
User 1 might like the book with ID: 4 (predicted preference :1.8203772)

As we can see, we can recommend Book 3 and Book 4 to User 1, but Book 3 is more likely to be appreciated by this user, since its predicted preference is higher.
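
To get a feeling for where these predicted preferences come from, here is a plain-Java sketch that does not use Mahout. It rests on two assumptions of mine that the article does not spell out: the similarity between two users is 1 / (1 + d / sqrt(n)), with d the Euclidean distance over the n books both users rated, and the predicted preference is the average of the neighbours' ratings weighted by that similarity. The sketch skips the neighborhood step, since both other users pass the 0.1 threshold used above. All names are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class PredictionSketch {

    // Assumed similarity: 1 / (1 + distance / sqrt(number of shared books)).
    static double similarity(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sumSquaredDiff = 0.0;
        int shared = 0;
        for (Map.Entry<Integer, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                double diff = entry.getValue() - other;
                sumSquaredDiff += diff * diff;
                shared++;
            }
        }
        if (shared == 0) {
            return Double.NaN;
        }
        return 1.0 / (1.0 + Math.sqrt(sumSquaredDiff) / Math.sqrt(shared));
    }

    // Similarity-weighted average of the neighbours' ratings for one book.
    static double predict(Map<Integer, Double> target,
                          List<Map<Integer, Double>> neighbours, int bookId) {
        double weightedSum = 0.0;
        double totalWeight = 0.0;
        for (Map<Integer, Double> neighbour : neighbours) {
            Double rating = neighbour.get(bookId);
            if (rating == null) {
                continue; // this neighbour has not rated the book
            }
            double weight = similarity(target, neighbour);
            if (Double.isNaN(weight)) {
                continue; // nothing in common with the target user
            }
            weightedSum += weight * rating;
            totalWeight += weight;
        }
        return totalWeight == 0.0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        Map<Integer, Double> user1 = Map.of(1, 5.0, 2, 2.0);
        Map<Integer, Double> user2 = Map.of(1, 5.0, 2, 1.0, 3, 4.0, 4, 1.0);
        Map<Integer, Double> user3 = Map.of(1, 1.0, 2, 5.0, 3, 2.0, 4, 4.0);
        List<Map<Integer, Double>> neighbours = List.of(user2, user3);

        System.out.println("Book 3: " + predict(user1, neighbours, 3));
        System.out.println("Book 4: " + predict(user1, neighbours, 4));
    }
}
```

With the sample data, this sketch yields about 3.45 for Book 3 and about 1.82 for Book 4, in line with the output above. User 2's rating of 4.0 for Book 3 weighs more than User 3's rating of 2.0 because User 2 is more similar to User 1.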

GitHub repository for this project: https://github.com/technobium/recommender

Conclusion

As you can see, Apache Mahout is a Java machine learning library that can easily be used to build a recommendation engine. In real life, the Apache Mahout recommendation engine is used by companies like LinkedIn, Yahoo, Twitter, Intel or Foursquare.

References

http://mahout.apache.org/users/recommender/userbased-5-minutes.html

“Mahout in Action”, Owen et al., Manning Publications, 2011 – http://manning.com/owen/
