# 如何选择分类器？LR、SVM、Ensemble、Deep learning

转自：https://www.quora.com/What-are-the-advantages-of-different-classification-algorithms

There are a number of dimensions you can look at to give you a sense of what will be a reasonable algorithm to start with, namely:

- Number of training examples
- Dimensionality of the feature space
- Do I expect the problem to be linearly separable?
- Are features independent?
- Are features expected to linearly dependent with the target variable? *EDIT: see mycomment on what I mean by this
- Is overfitting expected to be a problem?
- What are the system's requirement in terms of speed/performance/memory usage...?
- ...

This list may seem a bit daunting because there are many issues that are not straightforward to answer. The good news though is, that as many problems in life, you can address this question by following the Occam's Razor principle: use the least complicated algorithm that can address your needs and only go for something more complicated if strictly necessary. __Logistic Regression__

As a general rule of thumb, I would recommend to start with Logistic Regression. Logistic regression is a pretty well-behaved classification algorithm that can be trained as long as you expect your features to be roughly linear and the problem to be linearly separable. You can do some feature engineering to turn most non-linear features into linear pretty easily. It is also pretty robust to noise and you can avoid overfitting and even do feature selection by using l2 or l1 regularization. Logistic regression can also be used in Big Data scenarios since it is pretty efficient and can be distributed using, for example, ADMM (see logreg). A final advantage of LR is that the output can be interpreted as a probability. This is something that comes as a nice side effect since you can use it, for example, for ranking instead of classification.

Even in a case where you would not expect Logistic Regression to work 100%, do yourself a favor and run a simple l2-regularized LR to come up with a baseline before you go into using "fancier" approaches.

Ok, so now that you have set your baseline with Logistic Regression, what should be your next step. I would basically recommend two possible directions: (1) SVM's, or (2) Tree Ensembles. If I knew nothing about your problem, I would definitely go for (2), but I will start with describing why SVM's might be something worth considering. __Support Vector Machines__

Support Vector Machines (SVMs) use a different loss function (Hinge) from LR. They are also interpreted differently (maximum-margin). However, in practice, an SVM with a linear kernel is not very different from a Logistic Regression (If you are curious, you can see how Andrew Ng derives SVMs from Logistic Regression in his Coursera Machine Learning Course). The main reason you would want to use an SVM instead of a Logistic Regression is because your problem might not be linearly separable. In that case, you will have to use an SVM with a non linear kernel (e.g. RBF). The truth is that a Logistic Regression can also be used with a different kernel, but at that point you might be better off going for SVMs for practical reasons. Another related reason to use SVMs is if you are in a highly dimensional space. For example, SVMs have been reported to work better for text classification.

Unfortunately, the major downside of SVMs is that they can be painfully inefficient to train. So, I would not recommend them for any problem where you have many training examples. I would actually go even further and say that I would not recommend SVMs for most "industry scale" applications. Anything beyond a toy/lab problem might be better approached with a different algorithm. __Tree Ensembles__

This gets me to the third family of algorithms: Tree Ensembles. This basically covers two distinct algorithms: Random Forests and Gradient Boosted Trees. I will talk about the differences later, but for now let me treat them as one for the purpose of comparing them to Logistic Regression.

Tree Ensembles have different advantages over LR. One main advantage is that they do not expect linear features or even features that interact linearly. Something I did not mention in LR is that it can hardly handle categorical (binary) features. Tree Ensembles, because they are nothing more than a bunch of Decision Trees combined, can handle this very well. The other main advantage is that, because of how they are constructed (using bagging or boosting) these algorithms handle very well high dimensional spaces as well as large number of training examples.

As for the difference between Random Forests (RF) and Gradient Boosted Decision Trees (GBDT), I won't go into many details, but one easy way to understand it is that GBDTs will usually perform better, but they are harder to get right. More concretely, GBDTs have more hyper-parameters to tune and are also more prone to overfitting. RFs can almost work "out of the box" and that is one reason why they are very popular. __Deep Learning__

Last but not least, this answer would not be complete without at least a minor reference toDeep Learning. I would definitely not recommend this approach as a general-purpose technique for classification. But, you might probably have heard how well these methods perform in some cases such as image classification. If you have gone through the previous steps and still feel you can squeeze something out of your problem, you might want to use a Deep Learning approach. The truth is that if you use an open source implementation such as Theano, you can get an idea of how some of these approaches perform in your dataset pretty quickly. __Summary__

So, recapping, start with something simple like Logistic Regression to set a baseline and only make it more complicated if you need to. At that point, tree ensembles, and in particular Random Forests since they are easy to tune, might be the right way to go. If you feel there is still room for improvement, try GBDT or get even fancier and go for Deep Learning.

You can also take a look at the Kaggle Competitions. If you search for the keyword "classification" and select those that are completed, you will get a good sense of what people used to win competitions that might be similar to your problem at hand. At that point you will probably realize that using an ensemble is always likely to make things better. The only problem with ensembles, of course, is that they require to maintain all the independent methods working in parallel. That might be your final step to get as fancy as it gets.

## 如何选择分类器？LR、SVM、Ensemble、Deep learning的更多相关文章

- Deep Learning（深度学习）学习笔记整理
申明:本文非笔者原创,原文转载自:http://www.sigvc.org/bbs/thread-2187-1-3.html 4.2.初级(浅层)特征表示 既然像素级的特征表示方法没有作用,那怎样的表 ...

- 【转载】Deep Learning（深度学习）学习笔记整理
http://blog.csdn.net/zouxy09/article/details/8775360 一.概述 Artificial Intelligence,也就是人工智能,就像长生不老和星际漫 ...

- Deep Learning速成教程
引言 深度学习,即Deep Learning,是一种学习算法(Learning algorithm),亦是人工智能领域的一个重要分支.从快速发展到实际应用,短短几年时间里, ...

- 大牛deep learning入门教程
雷锋网(搜索"雷锋网"公众号关注)按:本文由Zouxy责编,全面介绍了深度学习的发展历史及其在各个领域的应用,并解释了深度学习的基本思想,深度与浅度学习的区别和深度学习与神经网络之 ...

- Deep Learning（深度学习）学习系列
目录: 一.概述 二.背景 三.人脑视觉机理 四.关于特征 4.1.特征表示的粒度 4.2.初级(浅层)特征表示 4.3.结构性特征表示 4.4 ...

- 深度学习概述教程--Deep Learning Overview
引言 深度学习,即Deep Learning,是一种学习算法(Learning algorithm),亦是人工智能领域的一个重要分支.从快速发展到实际应用,短短几年时间里, ...

- 深度学习（deep learning）
最近deep learning大火,不仅仅受到学术界的关注,更在工业界受到大家的追捧.在很多重要的评测中,DL都取得了state of the art的效果.尤其是在语音识别方面,DL使得错误率下降了 ...

- [转载]Deep Learning（深度学习）学习笔记整理
转载自:http://blog.csdn.net/zouxy09/article/details/8775360 感谢原作者:zouxy09@qq.com 八.Deep learning训练过程 8. ...

- Deep Learning（深度学习）学习笔记整理系列之（八）
Deep Learning(深度学习)学习笔记整理系列 zouxy09@qq.com http://blog.csdn.net/zouxy09 作者:Zouxy version 1.0 2013-04 ...

- 深度学习笔记之关于总结、展望、参考文献和Deep Learning学习资源（五）
不多说,直接上干货! 十.总结与展望 1)Deep learning总结 深度学习是关于自动学习要建模的数据的潜在(隐含)分布的多层(复杂)表达的算法.换句话来说,深度学习算法自动的提取分类需要的低层 ...

## 随机推荐

- jQuery学习之路（6）- 简单的表格应用
▓▓▓▓▓▓ 大致介绍 在CSS技术之前,网页的布局基本都是依靠表格制作,当有了CSS之后,表格就被很多设计师所抛弃,但是表格也有他的用武之地,比如数据列表,下面以表格中常见的几个应用来加深对jQue ...

- TCP协议疑难杂症全景解析
说明: 1).本文以TCP的发展历程解析容易引起混淆,误会的方方面面2).本文不会贴大量的源码,大多数是以文字形式描述,我相信文字看起来是要比代码更轻松的3).针对对象:对TCP已经有了全面了解的人. ...

- Chrome Crx 插件下载
扯蛋的GFW屏蔽了google域导致下载Chrome插件加载失败,本人想收集以些chrome的Crx插件,可供直接下载 XMarks - 在不同电脑不同浏览器之间同步书签 下载地址: http:/ ...

- java多线程学习-同步之线程通信
这个示例是网上烂大街的,子线程循环100次,主线程循环50次,但是我试了很多次,而且从网上找了很多示例,其实多运行几次,看输出结果并不正确.不知道是我转牛角尖了,还是怎么了.也没有大神问,好痛苦.现在 ...

- android 5.0 (lollipop)源码编译环境搭建（Mac OS X)
硬件环境:MacBook Pro Retina, 13-inch, Late 2013 处理器 2.4 GHz Intel Core i5 内存 8 GB 1600 MHz DDR3 硬盘60G以 ...

- ssh-copy-id帮你建立信任
一.ssh-keygen -t rsa [nameA@machineA]$ ssh-keygen -t rsa Generating public/private rsa key pair. Ente ...

- 初探appium之元素定位（1）
无论是selenium还是appium,元素定位都是我们开始实现自动化面临的第一个问题.selenium还好,我们可以在浏览器的调试页面进行元素定位还是蛮方便的.那么appium怎么做呢? 我看到很多 ...

- Docs list
http://www.deansys.com/doc/ldd3/index.html Github中文文档: http://www.worldhello.net/gotgithub/03-projec ...

- OA办公系统，一个沉淀企业文化的容器
资源是会枯竭的,唯有文化才会生生不息.一切工业产品都是人类智慧创造的.随着公司规模的扩大,企业中两大根本"人和规则"面临诸多挑战,OA办公系统是一个全员使用的办公软件产品,员工可通 ...

- Unity 打包总结和资源的优化和处理
1. Texture,都去掉alpha通道,作为背景展示的图片,基本都没有透明要求,有特殊要求的则放到atlas里面 a. Loading图这类需要比较精细的,则把图片设置为Automatic Tru ...