
Linear Regression in Dynamo


Today we are collecting more data than ever, but making sense of it is challenging. Machine learning helps us analyze the data and build an analytical model for future prediction. To create a machine learning prediction model, we usually develop a hypothesis based on our observation of the collected data and then fine-tune the model by training it to reduce the cost function. One of the simplest hypotheses we can develop assumes a linear relationship between the input and output parameters.

Suppose we have data on housing prices, and we assume that prices are linearly related to the floor area of the house; then we can use linear regression to predict the price of a house with a specific floor area. One of the most important steps towards building the hypothesis is being able to visualize the data and understand the trend, so let's first draw a scatter plot of our data as shown in the figure below. I am using the Grapher package to draw the scatter plot.

The scatter plot shows that the relationship between floor area and price is not quite linear, but it can be approximated by a line. In other words, linear regression is a curve-fitting problem. Let's say that the desired curve is y = mx + c.
Hence, to build our model we need to find the values of m and c. We can use linear algebra and represent this equation as y = A*X^{T}, where A is the coefficient vector A = \begin{bmatrix}c & m\end{bmatrix}, X = \begin{bmatrix}1 & x\end{bmatrix}, and X^{T} is the transpose of X. Now, for a given vector of observations y, we can find the coefficient vector A using the normal equation A = (X^{T} * X)^{-1} * X^{T} * y.
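The post builds this in a Dynamo graph, but the same normal-equation math can be sketched in a few lines of Python with NumPy. The floor areas and prices below are made-up sample data for illustration only:

```python
import numpy as np

# Hypothetical floor areas (sq ft) and prices; replace with real data.
areas  = np.array([1000.0, 1500.0, 2000.0, 2500.0, 3000.0])
prices = np.array([200000.0, 280000.0, 370000.0, 450000.0, 540000.0])

# Build X = [1, x] for each sample, then solve A = (X^T X)^{-1} X^T y.
X = np.column_stack([np.ones_like(areas), areas])
A = np.linalg.inv(X.T @ X) @ X.T @ prices

c, m = A  # intercept c and slope m of the fitted line y = mx + c
print(f"price ≈ {m:.1f} * area + {c:.1f}")
```

The first column of ones lets the intercept c ride along as just another coefficient, which is what makes the matrix form so compact.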

Let's say our hypothesis is that the house price depends not just on floor area but also on the number of bedrooms. Parameters such as floor area or the number of bedrooms are called features. To generalize, we can express y as y = a_{0} + a_{1}*x_{1} + a_{2}*x_{2} + a_{3}*x_{3} + ..., where the feature vector is X = \begin{bmatrix} 1 & x_{1} & x_{2} & x_{3} & ...\end{bmatrix} and the coefficient vector is A = \begin{bmatrix} a_{0} & a_{1} & a_{2} & a_{3} & ...\end{bmatrix}. Suppose we have 'm' samples of data to fit the curve and y depends on 'n' features; then the size of A is '1 x (n+1)', whereas the size of X is 'm x (n+1)', where the first column is all 1s. To fit the optimal curve, we want to find the coefficient vector A so that we can predict the value of y for any new feature vector X.
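The multi-feature case changes only the shape of the matrices. A minimal NumPy sketch, again with made-up [floor area, bedrooms] samples, showing the 'm x (n+1)' design matrix with its column of 1s:

```python
import numpy as np

# Hypothetical samples: [floor area, bedrooms] -> price.
features = np.array([[1000.0, 2], [1500.0, 3], [2000.0, 3], [2500.0, 4]])
prices   = np.array([210000.0, 300000.0, 360000.0, 460000.0])

m, n = features.shape                          # m samples, n features
X = np.column_stack([np.ones(m), features])    # shape m x (n+1); first column all 1s
A = np.linalg.inv(X.T @ X) @ X.T @ prices      # coefficients a0, a1, ..., an
```

Each row of X is one sample's feature vector [1, x1, x2], and the normal equation is unchanged from the single-feature case.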

So now let's look at the Dynamo graph that computes the coefficient vector A.
To predict a new price from an input feature vector, we take its dot product with the coefficient vector. This gives a simple linear regression model that works well as long as the inverse of (X^{T} * X) can be computed efficiently. For a very large number of features, computing the inverse can be expensive, and if two or more features are linearly dependent, the inverse cannot be computed at all because the matrix is singular. If we take care of these two aspects, it's a pretty effective model.
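As a side note, a least-squares solver sidesteps the explicit inverse entirely, which helps with the singularity concern above. A sketch with made-up, exactly-linear data, ending with the dot-product prediction step:

```python
import numpy as np

X = np.array([[1.0, 1000.0], [1.0, 2000.0], [1.0, 3000.0]])
y = np.array([230000.0, 400000.0, 570000.0])

# lstsq solves the least-squares problem without explicitly inverting
# X^T X, which can fail when features are linearly dependent (singular).
A, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predicting the price for a new feature vector is a dot product with A.
new_x = np.array([1.0, 2500.0])
predicted = new_x @ A
```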

What if linearity is not a good hypothesis and we think there is a polynomial relationship between the input and output? We can still use the same computational model by treating each polynomial term as a new feature in the model.
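For instance, to fit y = a0 + a1*x + a2*x^2, we can add x^2 as a second feature column and reuse the same normal equation. A sketch with synthetic, noiseless data generated from a known curve:

```python
import numpy as np

# Treat the polynomial term as an extra feature: y = a0 + a1*x + a2*x^2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x + 0.5 * x**2       # synthetic data with known coefficients

X = np.column_stack([np.ones_like(x), x, x**2])  # feature matrix [1, x, x^2]
A = np.linalg.inv(X.T @ X) @ X.T @ y             # recovers [2.0, 3.0, 0.5]
```

The regression is still linear in the coefficients; only the features are nonlinear in x.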

Note: The Dynamo graph demonstrated here uses the DynamoAI package and is shared on GitHub. I hope you find it useful; please do send me your feedback on GitHub or on this page.
