ON THE ORIGIN OF DEEP LEARNING
Table 1: Major milestones that will be covered in this paper

Year    | Contributor              | Contribution
300 BC  | Aristotle                | introduced Associationism, started the history of humans' attempts to understand the brain
1873    | Alexander Bain           | introduced Neural Groupings as the earliest models of neural networks, inspired the Hebbian Learning Rule
1943    | McCulloch & Pitts        | introduced the MCP Model, which is considered the ancestor of the Artificial Neural Model
1949    | Donald Hebb              | considered the father of neural networks, introduced the Hebbian Learning Rule, which lays the foundation of modern neural networks
1958    | Frank Rosenblatt         | introduced the first Perceptron, which highly resembles the modern perceptron
1974    | Paul Werbos              | introduced Backpropagation
1980    | Teuvo Kohonen            | introduced the Self Organizing Map
1980    | Kunihiko Fukushima       | introduced the Neocognitron, which inspired the Convolutional Neural Network
1982    | John Hopfield            | introduced the Hopfield Network
1985    | Hinton & Sejnowski       | introduced the Boltzmann Machine
1986    | Paul Smolensky           | introduced Harmonium, which is later known as the Restricted Boltzmann Machine
1986    | Michael I. Jordan        | defined and introduced the Recurrent Neural Network
1990    | Yann LeCun               | introduced LeNet, showed the possibility of deep neural networks in practice
1997    | Schuster & Paliwal       | introduced the Bidirectional Recurrent Neural Network
1997    | Hochreiter & Schmidhuber | introduced LSTM, solved the problem of vanishing gradients in recurrent neural networks
2006    | Geoffrey Hinton          | introduced Deep Belief Networks and the layer-wise pretraining technique, opened the current deep learning era
2009    | Salakhutdinov & Hinton   | introduced Deep Boltzmann Machines
2012    | Geoffrey Hinton          | introduced Dropout, an efficient way of training neural networks
orate well enough on each of them. On the other hand, our paper is aimed at providing the background for readers to understand how these models are developed. Therefore, we emphasize the milestones and elaborate on those ideas to help build associations between these ideas. In addition to the paths of classical deep learning models in (Schmidhuber, 2015), we also discuss recent deep learning work that builds from classical linear models. Another article that readers could read as a complement is (Anderson and Rosenfeld, 2000), where the authors conducted extensive interviews with well-known scientific leaders of the 90s on the topic of the history of neural networks.
2. From Aristotle to Modern Artificial Neural Networks
The study of deep learning and artificial neural networks originates from our ambition to build a computer system simulating the human brain. Building such a system requires an understanding of the functionality of our cognitive system. Therefore, this paper traces all the way back to the origins of the attempts to understand the brain and starts the discussion with Aristotle's Associationism around 300 BC.
2.1 Associationism
When, therefore, we accomplish an act of reminiscence, we pass through a certain series of precursive movements, until we arrive at a movement on which the one we are in quest of is habitually consequent. Hence, too, it is that we hunt through the mental train, excogitating from the present or some other, and from similar or contrary or coadjacent. Through this process reminiscence takes place. For the movements are, in these cases, sometimes at the same time, sometimes parts of the same whole, so that the subsequent movement is already more than half accomplished.
This remarkable paragraph of Aristotle is seen as the starting point of Associationism (Burnham, 1888). Associationism is a theory which states that the mind is a set of conceptual elements organized as associations between these elements. Inspired by Plato, Aristotle examined the processes of remembrance and recall and brought up four laws of association (Boeree, 2000):
Contiguity: Things or events with spatial or temporal proximity tend to be associated in the mind.
Frequency: The number of occurrences of two events is proportional to the strength of the association between these two events.
Similarity: The thought of one event tends to trigger the thought of a similar event.
Contrast: The thought of one event tends to trigger the thought of an opposite event.
Back then, Aristotle described the implementation of these laws in our mind as common sense. For example, the feel, the smell, or the taste of an apple should naturally lead to the concept of an apple, as common sense. Nowadays, it is surprising to see that these laws, proposed more than 2000 years ago, still serve as the fundamental assumptions of machine learning methods. For example, samples that are near each other (under a defined distance) are clustered into one group; explanatory variables that frequently occur with response variables draw more attention from the model; similar/dissimilar data are usually represented with more similar/dissimilar embeddings in latent space.
Contemporaneously, similar laws were also proposed by Zeno of Citium, Epicurus, and St. Augustine of Hippo. The theory of associationism was later strengthened by a variety of philosophers and psychologists. Thomas Hobbes (1588-1679) stated that complex experiences were associations of simple experiences, which were associations of sensations. He also believed that association exists by means of coherence and frequency as its strength factor.
Figure 1: Illustration of neural groupings in (Bain, 1873)
Meanwhile, John Locke (1632-1704) introduced the concept of "association of ideas". He still separated the concepts of ideas of sensation and ideas of reflection, and he stated that complex ideas could be derived from a combination of these two simple ideas. David Hume (1711-1776) later reduced Aristotle's four laws to three: resemblance (similarity), contiguity, and cause and effect. He believed that whatever coherence the world seemed to have was a matter of these three laws. Dugald Stewart (1753-1828) extended these three laws with several other principles, among them an obvious one: accidental coincidence in the sounds of words. Thomas Reid (1710-1796) believed that no original quality of mind was required to explain the spontaneous recurrence of thinking, rather than habits. James Mill (1773-1836) emphasized the law of frequency as the key to learning, which is very similar to later stages of research.
David Hartley (1705-1757), as a physician, was remarkably regarded as the one who made associationism popular (Hartley, 2013). In addition to the existing laws, he proposed the argument that memory could be conceived as smaller-scale vibrations in the same regions of the brain as the original sensory experience. These vibrations can link up to represent complex ideas and therefore act as a material basis for the stream of consciousness. This idea potentially inspired the Hebbian Learning Rule, which will be discussed later in this paper as laying the foundation of neural networks.
2.2 Bain and Neural Groupings
Besides David Hartley, Alexander Bain (1818-1903) also contributed to the fundamental ideas behind the Hebbian Learning Rule (Wilkes and Wade, 1997). In his book, Bain (1873) related the processes of associative memory to the distribution of activity of neural groupings (a term he used to denote neural networks back then). He proposed a constructive mode of storage capable of assembling what was required, in contrast to the alternative traditional mode of storage with prestored memories.
To further illustrate his ideas, Bain first described the computational flexibility that allows a neural grouping to function when multiple associations are to be stored. With a few hypotheses, Bain managed to describe a structure that highly resembled the neural
networks of today: an individual cell summarizes the stimulation from other selected linked cells within a grouping, as shown in Figure 1. The joint stimulation from a and b triggers X, stimulation from b and c triggers Y, and stimulation from a and c triggers Z. In his original illustration, a, b, and c stand for stimulations, and X, Y, and Z are the outcomes of cells.
With the establishment of how this associative structure of neural groupings can function as memory, Bain proceeded to describe the construction of these structures. He followed the directions of associationism and stated that relevant impressions of neural groupings must be made in temporal contiguity for a period, either on one occasion or on repeated occasions. Further, Bain described the computational properties of neural groupings: connections are strengthened or weakened through experience via changes of the intervening cell-substance. Therefore, the induction of these circuits would be selected as comparatively strong or weak.
As we will see in the following section, Hebb's postulate highly resembles Bain's description, although nowadays we usually label this postulate as Hebb's rather than Bain's, according to (Wilkes and Wade, 1997). This omission of Bain's contribution may also be due to Bain's lack of confidence in his own theory: eventually, Bain was not convinced himself and doubted the practical value of neural groupings.
2.3 Hebbian Learning Rule
The Hebbian Learning Rule is named after Donald O. Hebb (1904-1985), since it was introduced in his work The Organization of Behavior (Hebb, 1949). Hebb is also seen as the father of Neural Networks because of this work (Didier and Bigand, 2011).
In 1949, Hebb stated the famous rule "Cells that fire together, wire together", which emphasized the activation behavior of co-fired cells. More specifically, in his book, he stated that:
"When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
This archaic paragraph can be re-written in modern machine learning language as follows:

\Delta w_i = \eta x_i y    (1)

where \Delta w_i stands for the change of the synaptic weight w_i of Neuron i, whose input signal is x_i; y denotes the postsynaptic response and \eta denotes the learning rate. In other words, the Hebbian Learning Rule states that the connection between two units should be strengthened as the frequency of co-occurrences of these two units increases.
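As an illustrative sketch (ours, not the paper's), Equation 1 can be written directly in NumPy; here we additionally assume the common choice of a linear postsynaptic response y = w . x:

```python
import numpy as np

def hebbian_update(w, x, eta=0.01):
    """One Hebbian step (Equation 1): delta_w_i = eta * x_i * y.
    The postsynaptic response y is assumed to be the linear output w . x."""
    y = np.dot(w, x)          # postsynaptic response
    return w + eta * x * y    # strengthen weights of co-active inputs

# Toy usage: repeated presentations of inputs with one dominant direction
rng = np.random.default_rng(0)
w = rng.normal(size=3)
for _ in range(200):
    x = np.array([1.0, 0.2, 0.1]) + 0.05 * rng.normal(size=3)
    w = hebbian_update(w, x)
print(np.linalg.norm(w))  # the norm keeps growing -- the instability discussed next
```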
Although the Hebbian Learning Rule is seen as laying the foundation of neural networks, its drawbacks are obvious when seen today: as co-occurrences appear more often, the weights of the connections keep increasing, and the weights of a dominant signal will increase exponentially. This is known as the instability of the Hebbian Learning Rule (Principe et al., 1999). Fortunately, these problems did not influence Hebb's standing as the father of neural networks.
2.4 Oja's Rule and Principal Component Analyzer
Erkki Oja extended the Hebbian Learning Rule to avoid the instability property, and he also showed that a neuron following this updating rule approximates the behavior of a Principal Component Analyzer (PCA) (Oja, 1982). Long story short, Oja introduced a normalization term to rescue the Hebbian Learning Rule, and he further showed that his learning rule is simply an online update of a Principal Component Analyzer. We present the details of this argument in the following paragraphs.
Starting from Equation 1 and following the same notation, Oja showed that

w_i^{t+1} = w_i^t + \eta x_i y

where t denotes the iteration. A straightforward way to avoid the explosion of the weights is to apply normalization at the end of each iteration, yielding

w_i^{t+1} = \frac{w_i^t + \eta x_i y}{\big(\sum_{j=1}^{n} (w_j^t + \eta x_j y)^2\big)^{1/2}}

where n denotes the number of neurons. The above equation can be further expanded into the following form:

w_i^{t+1} = \frac{w_i^t}{Z} + \eta y \Big(\frac{x_i}{Z} - \frac{w_i^t \sum_{j=1}^{n} x_j w_j^t}{Z^3}\Big) + O(\eta^2)

where Z = \big(\sum_{j=1}^{n} (w_j^t)^2\big)^{1/2}. Further, two more assumptions are introduced: 1) \eta is small, therefore O(\eta^2) is approximately 0; 2) the weights are normalized, therefore Z = \big(\sum_{j=1}^{n} (w_j^t)^2\big)^{1/2} = 1. When these two assumptions are introduced back into the previous equation, noting that the linear neuron's response is y = \sum_{j} w_j^t x_j, Oja's rule follows:

w_i^{t+1} = w_i^t + \eta y (x_i - y w_i^t)    (2)
Oja took a step further to show that a neuron updated with this rule is effectively performing Principal Component Analysis on the data. To show this, Oja first re-wrote Equation 2 in the following form, with two additional assumptions (Oja, 1982):

\frac{d w(t)}{dt} = C w(t) - \big(w(t)^{\top} C w(t)\big) w(t)

where C is the covariance matrix of the input X. He then proceeded to show this property using many conclusions from another of his works (Oja and Karhunen, 1985), and linked back to PCA with the fact that the components from PCA are eigenvectors and that the first component is the eigenvector corresponding to the largest eigenvalue of the covariance matrix. Intuitively, we could interpret this property with a simpler explanation: the eigenvectors of C are the solutions when we maximize the rule's updating function. Since w is driven toward an eigenvector of the covariance matrix of X, the learned weights are the principal components of X.
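As a sketch of this claim (our illustration, not Oja's original code), the snippet below runs the update in Equation 2 on synthetic two-dimensional data and compares the resulting weight vector with the leading eigenvector of the sample covariance matrix:

```python
import numpy as np

def oja_update(w, x, eta=0.005):
    """One step of Oja's rule (Equation 2): w <- w + eta * y * (x - y * w)."""
    y = np.dot(w, x)                  # linear neuron response
    return w + eta * y * (x - y * w)

rng = np.random.default_rng(0)
# Zero-mean synthetic data with one dominant direction of variance
A = np.array([[3.0, 1.0], [1.0, 0.5]])
X = rng.normal(size=(5000, 2)) @ A.T

w = rng.normal(size=2)
for x in X:
    w = oja_update(w, x)

C = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
v1 = eigvecs[:, np.argmax(eigvals)]   # eigenvector of the largest eigenvalue

# Up to sign, the learned weight vector aligns with the first principal component
print(w / np.linalg.norm(w))
print(v1)
```

Up to a sign flip, the two printed vectors should match closely, which is exactly the PCA property discussed above.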
Oja's learning rule concludes our story of the learning rules of early-stage neural networks. Now we proceed to visit the ideas on neural models.
2.5 MCP Neural Model
While Donald Hebb is seen as the father of neural networks, the first model of the neuron can be traced back to six years before the publication of the Hebbian Learning Rule, when the neurophysiologist Warren McCulloch and the mathematician Walter Pitts speculated about the inner workings of neurons and modeled a primitive neural network with electrical circuits based on their findings (McCulloch and Pitts, 1943). Their model, known as the MCP Neural Model, was a linear step function upon weighted, linearly interpolated data that can be described as

y = \begin{cases} 1, & \text{if } \sum_i w_i x_i \geq \theta \text{ and } z_j = 0, \forall j \\ 0, & \text{otherwise} \end{cases}

where y stands for the output, x_i stands for the input signals, w_i stands for the corresponding weights, and z_j stands for the inhibitory inputs; \theta stands for the threshold. The function is designed in such a way that the activity of any inhibitory input completely prevents excitation of the neuron at any time.
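A minimal sketch of such an MCP unit (our illustration; the weights and threshold are hand-assigned, as the model requires):

```python
def mcp_neuron(x, w, theta, z=()):
    """MCP unit: fires (returns 1) only if the weighted sum of excitatory inputs
    reaches the threshold theta and no inhibitory input z_j is active."""
    if any(z):                                   # absolute inhibition: any active z vetoes firing
        return 0
    s = sum(wi * xi for wi, xi in zip(w, x))     # weighted sum of excitatory inputs
    return 1 if s >= theta else 0

# Hand-assigned weights realizing a logical AND of two excitatory inputs
print(mcp_neuron([1, 1], w=[1, 1], theta=2))          # 1
print(mcp_neuron([1, 0], w=[1, 1], theta=2))          # 0
print(mcp_neuron([1, 1], w=[1, 1], theta=2, z=[1]))   # 0: the inhibitory input vetoes the neuron
```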
Despite the resemblance between the MCP Neural Model and the modern perceptron, they are still distinctly different in many aspects:
The MCP Neural Model was initially built as electrical circuits. Later we will see that the study of neural networks has borrowed many ideas from the field of electrical circuits.
The weights of the MCP Neural Model, w_i, are fixed, in contrast to the adjustable weights of the modern perceptron. All the weights must be assigned by manual calculation.
The idea of an inhibitory input is quite unconventional even today. It might be an idea worth further study in modern deep learning research.
2.6 Perceptron
With the success of the MCP Neural Model, Frank Rosenblatt further substantiated the Hebbian Learning Rule with the introduction of perceptrons (Rosenblatt, 1958). While theorists like Hebb were focusing on the biological system in the natural environment, Rosenblatt constructed an electronic device named the Perceptron, which was shown to have the ability to learn in accordance with associationism.
Rosenblatt (1958) introduced the perceptron in the context of the vision system, as shown in Figure 2(a). He introduced the rules of the organization of a perceptron as follows:
Stimuli impact on a retina of the sensory units, which respond in such a manner that the pulse amplitude or frequency is proportional to the stimulus intensity.
Impulses are transmitted to the Projection Area (A_I). This projection area is optional.
Impulses are then transmitted to the Association Area through random connections. If the sum of the impulse intensities is equal to or greater than the threshold (\theta) of this unit, then this unit fires.
Figure 2: Perceptrons. (a) A new figure illustrating the organization of a perceptron as in (Rosenblatt, 1958). (b) A typical perceptron in modern machine learning literature, where A_I (the Projection Area) is omitted.
Response units work in the same fashion as those intermediate units.
Figure 2(a) illustrates his explanation of the perceptron. From left to right, the four units are the sensory unit, projection unit, association unit, and response unit, respectively. The projection unit receives the information from the sensory unit and passes it on to the association unit. This unit is often omitted in other descriptions of similar models. With the omission of the projection unit, the structure resembles the structure of today's perceptron in a neural network (as shown in Figure 2(b)): sensory units collect data, association units linearly add these data with different weights and apply a non-linear transform onto the thresholded sum, then pass the results to the response units.
One distinction between the early-stage neuron models and modern perceptrons is the introduction of non-linear activation functions (we use the sigmoid function as an example in Figure 2(b)). This originates from the argument that the linear threshold function should be softened to simulate biological neural networks (Bose et al., 1996), as well as from considerations of computational feasibility when replacing the step function with a continuous one (Mitchell et al., 1997).
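To make the structure of Figure 2(b) concrete, here is a minimal sketch of such a modern perceptron's forward pass (our illustration with made-up weights; the sigmoid stands in for the softened threshold discussed above):

```python
import numpy as np

def sigmoid(s):
    """Continuous, softened replacement for the linear threshold (step) function."""
    return 1.0 / (1.0 + np.exp(-s))

def perceptron_forward(x, w, b):
    """Association-unit computation of Figure 2(b): a weighted sum of the sensory
    inputs followed by a non-linear transform, passed on to the response unit."""
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 3.0])   # data collected by the sensory units
w = np.array([0.8, 0.1, -0.4])   # illustrative (made-up) weights
b = 0.2                          # bias playing the role of the threshold
print(perceptron_forward(x, w, b))
```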
After Rosenblatt's introduction of the Perceptron, Widrow et al. (1960) introduced a follow-up model called ADALINE. However, the difference between Rosenblatt's Perceptron and ADALINE lies mainly in the learning algorithm. As the primary focus of this paper is neural network models, we skip the discussion of ADALINE.
2.7 Perceptron's Linear Representation Power
A perceptron is fundamentally a linear function of its input signals; therefore, it is limited to representing linear decision boundaries, such as the logical operations NOT, AND, and OR, but not XOR, where a more sophisticated decision boundary is required. This limitation was highlighted by Minsky and Papert (1969), when they attacked the limitations of perceptrons by emphasizing that perceptrons cannot solve functions like XOR or NXOR. As a result, very little research was done in this area until about the 1980s.
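The XOR limitation is easy to check numerically. The sketch below (an illustrative brute-force search, not Minsky and Papert's formal argument) scans a grid of weights and thresholds for a single threshold unit and finds settings realizing AND and OR but none realizing XOR:

```python
import itertools

def threshold_unit(x, w, theta):
    """A single linear threshold unit: a perceptron without learning."""
    return int(w[0] * x[0] + w[1] * x[1] >= theta)

def representable(truth_table, grid=tuple(i / 10 - 2 for i in range(41))):
    """Return True if some (w1, w2, theta) on the grid reproduces the truth table
    over the four Boolean inputs."""
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    for w1, w2, theta in itertools.product(grid, grid, grid):
        if all(threshold_unit(x, (w1, w2), theta) == t
               for x, t in zip(inputs, truth_table)):
            return True
    return False

print(representable([0, 0, 0, 1]))  # AND -> True
print(representable([0, 1, 1, 1]))  # OR  -> True
print(representable([0, 1, 1, 0]))  # XOR -> False: no single linear boundary separates it
```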