I was going through the Coursera "Machine Learning" course, and in the section on multivariate linear regression something caught my eye. Andrew Ng presented the Normal Equation as an analytical solution to the linear regression problem with a least-squares cost function. He mentioned that in some cases (such as for small feature sets) using it is more effective than applying gradient descent; unfortunately, he left its derivation out.

Here I want to show how the normal equation is derived.

First, some terminology. The following symbols are compatible with the machine learning course, not with the exposition of the normal equation on Wikipedia and other sites - semantically it's all the same, just the symbols are different.

Given the hypothesis function:

We'd like to minimize the least-squares cost:

Where is the i-th sample (from a set of m samples) and is the i-th expected result.

To proceed, we'll represent the problem in matrix notation; this is natural, since we essentially have a system of linear equations here. The regression coefficients we're looking for are the vector:

Each of the m input samples is similarly a column vector with n+1 rows, being 1 for convenience. So we can now rewrite the hypothesis function as:

When this is summed over all samples, we can dip further into matrix notation. We'll define the "design matrix" X (uppercase X) as a matrix of m rows, in which each row is the i-th sample (the vector ). With this, we can rewrite the least-squares cost as following, replacing the explicit sum by matrix multiplication:

Now, using some matrix transpose identities, we can simplify this a bit. I'll throw the part away since we're going to compare a derivative to zero anyway:

Note that is a vector, and so is y. So when we multiply one by another, it doesn't matter what the order is (as long as the dimensions work out). So we can further simplify:

Recall that here is our unknown. To find where the above function has a minimum, we will derive by and compare to 0. Deriving by a vector may feel uncomfortable, but there's nothing to worry about. Recall that here we only use matrix notation to conveniently represent a system of linear formulae. So we derive by each component of the vector, and then combine the resulting derivatives into a vector again. The result is:

Or:

[Update 27-May-2015: I've written another post that explains in more detail how these derivatives are computed.]

Now, assuming that the matrix is invertible, we can multiply both sides by and get:

Which is the normal equation.

## 【转】Derivation of the Normal Equation for linear regression的更多相关文章

1. （三）用Normal Equation拟合Liner Regression模型

继续考虑Liner Regression的问题,把它写成如下的矩阵形式,然后即可得到θ的Normal Equation. Normal Equation: θ=(XTX)-1XTy 当X可逆时,(XT ...

2. CS229 3.用Normal Equation拟合Liner Regression模型

继续考虑Liner Regression的问题,把它写成如下的矩阵形式,然后即可得到θ的Normal Equation. Normal Equation: θ=(XTX)-1XTy 当X可逆时,(XT ...

3. Linear regression with multiple variables(多特征的线型回归)算法实例_梯度下降解法(Gradient DesentMulti)以及正规方程解法(Normal Equation)

,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, ,, , ...

4. machine learning (7)---normal equation相对于gradient descent而言求解linear regression问题的另一种方式

Normal equation: 一种用来linear regression问题的求解Θ的方法,另一种可以是gradient descent 仅适用于linear regression问题的求解,对其 ...

5. 机器学习入门：Linear Regression与Normal Equation -2017年8月23日22:11:50

6. 5种方法推导Normal Equation

引言: Normal Equation 是最基础的最小二乘方法.在Andrew Ng的课程中给出了矩阵推到形式,本文将重点提供几种推导方式以便于全方位帮助Machine Learning用户学习. N ...

7. Normal&#160;Equation&#160;Algorithm

和梯度下降法一样,Normal Equation(正规方程法)算法也是一种线性回归算法(Linear Regression Algorithm).与梯度下降法通过一步步计算来逐步靠近最佳θ值不同,No ...

8. coursera机器学习笔记-多元线性回归，normal equation

#对coursera上Andrew Ng老师开的机器学习课程的笔记和心得: #注:此笔记是我自己认为本节课里比较重要.难理解或容易忘记的内容并做了些补充,并非是课堂详细笔记和要点: #标记为<补 ...

9. Normal Equation

一.Normal Equation 我们知道梯度下降在求解最优参数$$\theta$$过程中需要合适的$$\alpha$$,并且需要进行多次迭代,那么有没有经过简单的数学计算就得到参数\(\theta ...

## 随机推荐

1. 【Arduino】使用C#实现Arduino与电脑进行串行通讯

在给Arduino编程的时候,因为没有调试工具,经常要通过使用串口通讯的方式调用Serial.print和Serial.println输出Arduino运行过程中的相关信息,然后在电脑上用Arduin ...

2. PLSQL_性能优化系列15_Oracle Explain Plan解析计划解读

2014-12-19 Created By BaoXinjian

3. 原始启动log&amp;新log

root@Taiyear:/# U-Boot 1.1.3 (Dec 27 2013 - 09:14:28) SoC:MediaTek MT7620 DRAM:  Memory Testing..655 ...

4. WordPress BulletProof Security插件多个HTML注入漏洞

漏洞名称: WordPress BulletProof Security插件多个HTML注入漏洞 CNNVD编号: CNNVD-201308-023 发布时间: 2013-08-06 更新时间: 20 ...

5. javascript获取页面各种高度

网页可见区域宽: document.body.clientWidth网页可见区域高: document.body.clientHeight网页可见区域宽: document.body.offsetWi ...

6. css一些特别效果设定

7. MyBatis中多对多关系的映射和查询

先说一下需求: 在页面上显示数据库中的所有图书,显示图书的同时,显示出该图书所属的类别(这里一本书可能同时属于多个类别) 创建表: 笔者这里使用 中间表 连接 图书表 和 图书类别表,图书表中 没有使 ...

8. E - Coin Change UVA - 674 &amp;&amp;（一些记录路径的方法）

这一道题并不难,我们只需要将dp数组先清空,再给dp[0]=1,之后就按照完全背包的模板写 主要是我们要证明着一种方法不会出现把(1+3+4)(1+4+3)当作两种方法,这一点如果自己写过背包的那个表 ...

9. 爬虫_电影天堂 热映电影（xpath）

写了一天才写了不到100行.不过总归是按自己的思路完成了 import requests from lxml import etree import time BASE = 'http://www.d ...

10. [PHP] 算法-删除链表中重复的结点的PHP实现

删除链表中重复的结点: 1.定义两个指针pre和current 2.两个指针同时往后移动,current指针如果与后一个结点值相同,就独自往前走直到没有相等的 3.pre指针next直接指向curre ...