Simple linear regression using JFreeChart

Linear regression is one of the simplest supervised learning technique. It is a method for modelling the relationship between one or more input variables X and one output variable Y. The model obtained after the training phase can  be in the case of simple linear regression, a mathematical function of the form f(X) = a  + b * X = Y. When the output variable depends on more than one input variable, then we have multiple linear regression.

In this tutorial we will see how we can use the simple linear regression to solve a day-to-day problem. We will start by defining the problem, we will then explain how the problem will be solved, then we will get to the practical part and write a Java application. Basic Java programming knowledge is required for this tutorial.

Estimate the price of a house using simple linear regression

The problem we will solve using this machine learning method is the estimation of the price of a house, giving its living area. For this tutorial I gathered the living area and the rent price for  47 houses in Berlin city center. The data is available in a simple text file. Our Java program should tell us the estimated price giving a living area.

To solve this problem we will use the JFreeChart library. This is a GNU Lesser General Public Licence (LGPL) Java chart library which offers also linear regression tools. We will start by analysing the input data by feeding it to a scatter chart. The chart will show us that the values are distributed in a form of a prolonged, symmetrical shape. This means the data can be fitted to a linear function. Using the JFreeChart library we will find out the a and b parameters of the function f(X) = a + b * X which best fits the input data. Here the is also called the intercept and b the slope. The key function here is Regression.getOLSRegression. OSL stands for ordinary least squares. This method minimizes the sum of squared vertical distances between the observed responses in the input data and the responses predicted by the linear approximation. The resulted function represents the “model” which our machine learning application will use to estimate new values, not present in the input dataset.

Visualising the input data

Let’s start with a Java application which shows us scatter chart of the input data.

Prerequisites:

  • Mac, Linux or Windows
  • Java 1.7
  • Eclipse with Maven support

Create a new Maven project

sr_create_maven_project

On the next step select “Create simple project (skip archetype creation)”.

Type in the project details.

sr_name_project

Edit pom.xml file and add the following dependecy in the dependecies section.

Add to this project a Java class named PriceEstimator.

sr_create_class

Paste the following code in the new class file:

In the src/main/resources create a text file named prices.txt which should contain the sample data available here.

You can now run the code. You should now see the blue points representing the living area on the x axis and the price on the y axis.

sr_scatter_graph

Add regression line to the graph

We can obtain the line which best matches the given points. To do that, add the following method at the end of the class.

At the end of the main function add the call to the new function.

You should now see a yellow regression line like the one in the following image.

sr_regression_line

Estimate the price

Add the following code to the end of main function:

Add the following method at the end of the class:

Now you should create a new Run Configuration, from Eclipse menu Run > Run Configurations. In the Arguments tab, for PriceEstimator Java Application add a number representing the desired living area, 200 for instance. If you run the app from here, you should see a graph like the following one:

sr_price_estimation

The red triangle represents the point with the living area 200. In the legend section you can see the rent price estimation for this area: 3104.

GitHub repository for this project: https://github.com/technobium/jfreechart-regression

Conclusion

We saw how we can use simple linear regression to estimate the rent price for a living area given a set of existing data. This is a basic usage of this machine learning technique. Which other use cases can you see for simple linear regression?

References

http://www.jfree.org/jfreechart/

http://en.wikipedia.org/wiki/Linear_regression

http://en.wikipedia.org/wiki/Ordinary_least_squares

4 Comments

Add a Comment

Your email address will not be published. Required fields are marked *