This article is a translation of: What are advantages of Artificial Neural Networks over Support Vector Machines? [closed]

ANN (Artificial Neural Networks) and SVM (Support Vector Machines) are two popular strategies for supervised machine learning and classification. It's not often clear which method is better for a particular project, and I'm certain the answer is always "it depends." Often, a combination of both along with Bayesian classification is used.

These questions regarding ANN vs SVM have already been asked on Stackoverflow:

ANN and SVM classification

what the difference among ANN, SVM and KNN in my classification question

Support Vector Machine or Artificial Neural Network for text processing?

In this question, I'd like to know specifically what aspects of an ANN (specifically, a Multilayer Perceptron) might make it desirable to use over an SVM? The reason I ask is because it's easy to answer the opposite question: Support Vector Machines are often superior to ANNs because they avoid two major weaknesses of ANNs:

(1) ANNs often converge on local minima rather than global minima, meaning that they are essentially "missing the big picture" sometimes (or missing the forest for the trees)

(2) ANNs often overfit if training goes on too long, meaning that for any given pattern, an ANN might start to consider the noise as part of the pattern.

SVMs don't suffer from either of these two problems. However, it's not readily apparent that SVMs are meant to be a total replacement for ANNs. So what specific advantage(s) does an ANN have over an SVM that might make it applicable for certain situations? I've listed specific advantages of an SVM over an ANN, now I'd like to see a list of ANN advantages (if any).


Answer #1

Reference: https://stackoom.com/question/mo9E/人工神经网络相对于支持向量机有什么优势-关闭


Answer #2

Judging from the examples you provide, I'm assuming that by ANNs, you mean multilayer feed-forward networks (FF nets for short), such as multilayer perceptrons, because those are in direct competition with SVMs.

One specific benefit that these models have over SVMs is that their size is fixed: they are parametric models, while SVMs are non-parametric. That is, in an ANN you have a bunch of hidden layers with sizes h1 through hn depending on the number of features, plus bias parameters, and those make up your model. By contrast, an SVM (at least a kernelized one) consists of a set of support vectors, selected from the training set, with a weight for each. In the worst case, the number of support vectors is exactly the number of training samples (though that mainly occurs with small training sets or in degenerate cases), and in general its model size scales linearly with the size of the training set. In natural language processing, SVM classifiers with tens of thousands of support vectors, each having hundreds of thousands of features, are not unheard of.
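To make the parametric / non-parametric contrast concrete, here is a minimal sketch (assuming scikit-learn; the toy problem and layer sizes are illustrative choices, not part of the original answer). The MLP's parameter count is fixed by its architecture, while the kernel SVM stores a subset of the training points whose number can grow with the training set:

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    for n in (100, 1000):
        X = rng.randn(n, 10)
        y = (X[:, 0] * X[:, 1] > 0).astype(int)  # nonlinear toy labels

        svc = SVC(kernel="rbf").fit(X, y)  # keeps support vectors drawn from the data
        mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000).fit(X, y)

        n_params = sum(w.size for w in mlp.coefs_) + sum(b.size for b in mlp.intercepts_)
        print(f"n={n}: SVC stores {len(svc.support_)} support vectors; "
              f"the MLP always has {n_params} parameters")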

Also, online training of FF nets is very simple compared to online SVM fitting, and predicting can be quite a bit faster.

EDIT: all of the above pertains to the general case of kernelized SVMs. Linear SVMs are a special case in that they are parametric and allow online learning with simple algorithms such as stochastic gradient descent.
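As a sketch of that last point (assuming scikit-learn; the streaming mini-batches below are synthetic stand-ins for real data), a linear SVM can be trained online with stochastic gradient descent via SGDClassifier's partial_fit:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.RandomState(0)
    clf = SGDClassifier(loss="hinge")  # hinge loss gives the linear SVM objective
    classes = np.array([0, 1])

    for _ in range(10):  # each iteration stands in for a newly arriving mini-batch
        X_batch = rng.randn(32, 5)
        y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)
        clf.partial_fit(X_batch, y_batch, classes=classes)  # incremental update

    print(clf.predict(rng.randn(3, 5)))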


Answer #3

We should also consider that the SVM system can be applied directly to non-metric spaces, such as the set of labeled graphs or strings. In fact, the internal kernel function can be generalized properly to virtually any kind of input, provided that the positive definiteness requirement of the kernel is satisfied. On the other hand, to be able to use an ANN on a set of labeled graphs, explicit embedding procedures must be considered.
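For instance, here is a minimal sketch of running an SVM directly on strings through a precomputed kernel matrix (assuming scikit-learn; the bigram "spectrum" kernel and the toy strings are illustrative). The kernel is positive semi-definite because it is an inner product of explicit bigram-count feature vectors:

    from collections import Counter
    import numpy as np
    from sklearn.svm import SVC

    def spectrum_kernel(s, t, n=2):
        """Inner product of character n-gram count vectors."""
        cs = Counter(s[i:i + n] for i in range(len(s) - n + 1))
        ct = Counter(t[i:i + n] for i in range(len(t) - n + 1))
        return sum(cs[g] * ct[g] for g in cs)

    train = ["abba", "baba", "abab", "xyxy", "yxyx", "xyyx"]
    labels = [0, 0, 0, 1, 1, 1]

    K = np.array([[spectrum_kernel(s, t) for t in train] for s in train])
    clf = SVC(kernel="precomputed").fit(K, labels)

    test = ["abab", "yxxy"]
    K_test = np.array([[spectrum_kernel(s, t) for t in train] for s in test])
    print(clf.predict(K_test))  # each test row holds kernel values against the training set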


Answer #4

One thing to note is that the two are actually very related. Linear SVMs are equivalent to single-layer NN's (i.e., perceptrons), and multi-layer NNs can be expressed in terms of SVMs. See here for some details.
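A minimal numpy sketch of that equivalence (the data, learning rate, and regularization strength are illustrative assumptions): a single linear unit trained by SGD on the hinge loss with L2 weight decay is optimizing exactly the soft-margin linear SVM objective.

    import numpy as np

    rng = np.random.RandomState(0)
    X = rng.randn(200, 2)
    y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)  # labels in {-1, +1}

    w, b, lr, lam = np.zeros(2), 0.0, 0.1, 0.01
    for epoch in range(50):
        for i in rng.permutation(len(X)):
            if y[i] * (X[i] @ w + b) < 1:   # hinge loss is active for this point
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                            # only the L2 regularizer contributes
                w -= lr * lam * w

    print("training accuracy:", np.mean(np.sign(X @ w + b) == y))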


Answer #5

One obvious advantage of artificial neural networks over support vector machines is that artificial neural networks may have any number of outputs, while support vector machines have only one. The most direct way to create an n-ary classifier with support vector machines is to create n support vector machines and train each of them one by one. On the other hand, an n-ary classifier with neural networks can be trained in one go. Additionally, the neural network will make more sense because it is one whole, whereas the support vector machines are isolated systems. This is especially useful if the outputs are inter-related.

For example, if the goal was to classify hand-written digits, ten support vector machines would do. Each support vector machine would recognize exactly one digit, and fail to recognize all others. Since each handwritten digit cannot be meant to hold more information than just its class, it makes no sense to try to solve this with an artificial neural network.
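A minimal sketch of that scheme (assuming scikit-learn and its bundled digits dataset): OneVsRestClassifier fits one binary LinearSVC per digit, ten in total, each trained to separate its digit from all the rest.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.svm import LinearSVC

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # One binary SVM per digit class, trained independently.
    ovr = OneVsRestClassifier(LinearSVC(max_iter=5000)).fit(X_tr, y_tr)
    print("ten-SVM accuracy:", ovr.score(X_te, y_te))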

However, suppose the goal was to model a person's hormone balance (for several hormones) as a function of easily measured physiological factors such as time since last meal, heart rate, etc. Since these factors are all inter-related, artificial neural network regression makes more sense than support vector machine regression.
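A minimal sketch of the contrast (the "hormone" targets below are synthetic, inter-related signals, not real data): one neural network handles all outputs in one go, while SVR is single-output and must be wrapped once per target.

    import numpy as np
    from sklearn.multioutput import MultiOutputRegressor
    from sklearn.neural_network import MLPRegressor
    from sklearn.svm import SVR

    rng = np.random.RandomState(0)
    X = rng.rand(300, 3)                 # e.g. time since last meal, heart rate, ...
    Y = np.column_stack([X.sum(axis=1),  # several correlated targets
                         X[:, 0] - X[:, 1],
                         X[:, 1] * X[:, 2]])

    # One network, trained once, shares hidden features across all outputs.
    ann = MLPRegressor(hidden_layer_sizes=(20,), max_iter=5000).fit(X, Y)

    # SVR is inherently single-output: one independent model per target column.
    svr = MultiOutputRegressor(SVR()).fit(X, Y)

    print(ann.predict(X[:2]).shape, svr.predict(X[:2]).shape)  # both (2, 3)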


Answer #6

If you want to use a kernel SVM you have to guess the kernel. However, ANNs are universal approximators; the only guessing to be done is the width (approximation accuracy) and depth (approximation efficiency). If you design the optimization problem correctly you do not over-fit (please see the bibliography on over-fitting). Whether this works also depends on the training examples, i.e., on whether they cover the search space correctly and uniformly. Width and depth discovery is the subject of integer programming.
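As a sketch of the model selection each family entails (assuming scikit-learn; the dataset and parameter grids are illustrative choices, not recommendations): for the SVM the guess is the kernel, for the network it is the hidden-layer widths and depths.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import GridSearchCV
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    # For the SVM, the "guess" is the kernel family.
    svm_search = GridSearchCV(make_pipeline(StandardScaler(), SVC()),
                              {"svc__kernel": ["linear", "rbf", "poly"]}, cv=3)

    # For the network, the "guess" is the width/depth of the hidden layers.
    ann_search = GridSearchCV(
        make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000)),
        {"mlpclassifier__hidden_layer_sizes": [(10,), (50,), (50, 50)]}, cv=3)

    print(svm_search.fit(X, y).best_params_)
    print(ann_search.fit(X, y).best_params_)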

Suppose you have bounded functions f(.) on I = [0,1] and bounded universal approximators U(., a) on I, with range again in I, parametrized by a real sequence a of compact support, with the property that there exists a sequence of sequences a(k) with

lim sup { |f(x) - U(x, a(k))| : x } = 0

and you draw examples and tests (x,y) with a distribution D on IxI.

For a prescribed support, what you do is to find the best a such that

sum { (y(l) - U(x(l), a))^2 : 1 <= l <= N } is minimal

Let this minimizer be a = aa (which is a random variable!); the over-fitting is then

the average, using D and D^{N}, of (y - U(x, aa))^2

Let me explain why: if you select aa such that the error is minimized, then for a rare set of values you have a perfect fit. However, since they are rare, the average is never 0. You want to minimize the second quantity, even though you only have a discrete approximation to D. And keep in mind that the support length is free.
