3B1B's videos are addictive! So well made.
But what is a Neural Network?
website: https://www.3blue1brown.com/lessons/neural-networks
video: https://www.youtube.com/watch?v=aircAruvnKk
-
Plain vanilla form: the multilayer perceptron (MLP)
-
classic example: recognize handwritten digits
-
neurons: a thing that holds a number (its activation)
-
layers:
- input layer; output layer
- hidden layers
- why 2 hidden layers, with 16 neurons each?
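A minimal NumPy sketch of this architecture; the layer sizes are the ones from the video, and the random initialization is just for illustration:

```python
import numpy as np

# 28x28 = 784 input pixels, two hidden layers of 16 neurons, 10 outputs (digits 0-9)
layer_sizes = [784, 16, 16, 10]

# one weight matrix and one bias vector per transition between layers
rng = np.random.default_rng(0)
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
biases = [rng.standard_normal(n_out) for n_out in layer_sizes[1:]]

print([w.shape for w in weights])  # [(16, 784), (16, 16), (10, 16)]
```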
-
core: how do the activations of one layer determine the activations of the next layer?
- e.g. recognizing the loop on the top: 8, 9
- how to recognize these edges, loops, and patterns? break them down into little pieces?
-
edge detection example
-
what parameters?
- weights
- calculate the weighted sum of the activations from the input layer
- e.g. assign positive weights to the pixels where the edge is, negative weights to the surrounding pixels, and 0 everywhere else.
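A rough sketch of that idea for a single neuron, assuming a 28x28 input; the exact pixel positions of the "edge" are made up for illustration:

```python
import numpy as np

# hypothetical weights for an "edge detector" neuron:
# positive on the edge pixels, negative just around them, zero elsewhere
w = np.zeros((28, 28))
w[10, 8:20] = 1.0     # the edge itself
w[9, 8:20] = -1.0     # the row above
w[11, 8:20] = -1.0    # the row below

a = np.random.rand(28, 28)      # input activations: pixel brightness in [0, 1]
weighted_sum = np.sum(w * a)    # large only when that edge pattern is lit up
```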
-
requirement: all activations should stay in a fixed range (here, squished into [0, 1])
- squishing functions:
- sigmoid, written σ: σ(x) = 1/(1 + e^(-x))
- but sigmoid is hardly used anymore; ReLU and the like are more common now (for deep NNs)
- ReLU(a)=max(0,a) (inactive below 0)
- bias: a threshold so the neuron isn't activated too easily, written b
- putting it together, one neuron computes σ(w_1 a_1 + w_2 a_2 + … + w_n a_n + b)
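A small sketch of these pieces; the function names are mine, the formulas are the standard ones:

```python
import numpy as np

def sigmoid(z):
    # squishes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # ReLU(a) = max(0, a): inactive below 0, identity above
    return np.maximum(0.0, z)

def neuron_activation(w, a_prev, b):
    # one neuron: squish the weighted sum of the previous layer, shifted by the bias
    return sigmoid(np.dot(w, a_prev) + b)
```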
-
too many weights and biases!
- learning: finding the right weights and biases
The first layer then simplifies to matrix form: a^(1) = σ(W a^(0) + b)
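As a sketch; the sizes match the video's first hidden layer, the variable names are mine:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# the whole layer at once: each row of W holds the weights of one neuron in layer 1
def layer_forward(W, a0, b):
    return sigmoid(W @ a0 + b)

W = np.random.randn(16, 784)   # 16 neurons, each looking at all 784 input pixels
b = np.random.randn(16)
a0 = np.random.rand(784)       # input activations
a1 = layer_forward(W, a0, b)   # shape (16,)
```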
-
Re-understand each neuron as a function: it takes all the outputs of the previous layer and outputs one number.
deep learning
gradient descent
-
cost function → average cost
- input: the weights & biases
- output: 1 number (the cost)
- parameters: the training examples
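A sketch of that cost, assuming `forward(x)` stands in for running the network and `examples` is a list of (input, desired-output) pairs:

```python
import numpy as np

def example_cost(output_activations, desired):
    # one training example: sum of squared differences between the
    # network's outputs and the desired outputs (1 for the true digit, 0 elsewhere)
    return np.sum((output_activations - desired) ** 2)

def average_cost(forward, examples):
    # the network's overall cost: average over all training examples
    return np.mean([example_cost(forward(x), y) for x, y in examples])
```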
-
C(w), a single input: how to find its minimum?
- slope detection & rolling downhill: you end up in a local minimum (which one depends on the random start)
- local vs. global minimum, the same old story
-
C(x, y), two inputs:
- ∇C, the gradient, is the direction of steepest increase, so step along -∇C to decrease the cost fastest (see the sketch after this list)
- backpropagation
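A toy sketch of one descent step; the cost C(x, y) = x² + y² is made up just so the gradient is easy to write down:

```python
import numpy as np

def gradient_descent_step(params, grad, learning_rate=0.1):
    # step against the gradient, i.e. in the direction of steepest decrease
    return params - learning_rate * grad

p = np.array([3.0, -2.0])
for _ in range(100):
    p = gradient_descent_step(p, 2 * p)   # gradient of x^2 + y^2 is (2x, 2y)
# p is now very close to the minimum at (0, 0)
```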
-
network learning = minimizing the cost function
- this requires a smooth cost output (hence continuously ranging activations rather than binary ones)
-
How does the gradient of a cost function over ~13,000 dimensions have its effect? Another way to think about it:
- magnitude: tells you which dimensions matter most
- sign: tells you which direction to move along that dimension
- (the dimensions are all the weights and biases)
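A tiny illustration with made-up gradient values, just to show how to read them:

```python
import numpy as np

grad = np.array([3.2, -0.1, 0.002])         # hypothetical components of the gradient

step_direction = -np.sign(grad)             # sign: move each parameter against its gradient
most_important = np.argsort(-np.abs(grad))  # magnitude: dimension 0 matters most here
```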
analyze this network
it latches onto loose patterns, not intelligence…
it isn't really picking up edges and loops
learn more
To learn more, I highly recommend Michael Nielsen's book
http://neuralnetworksanddeeplearning…
The book walks step by step through the code behind the examples in these videos, which you can find here:
https://github.com/mnielsen/neural-ne…
MNIST database:
http://yann.lecun.com/exdb/mnist/
Also check out Chris Olah's blog:
His post on neural networks and topology is particularly beautiful, but honestly everything there is great. And if you like that, you will _love_ the Distill publications:
research corner
Two papers:
"A Closer Look at Memorization in Deep Networks"
"The Loss Surfaces of Multilayer Networks"
backpropagation
intuitive walkthrough
without notation!
-
bad network, silly outcome
-
we can only adjust the weights & biases, not the activations directly
For example, take an image of a “2”.
-
you want to nudge the activation of the “2” output neuron up from 0.2 toward 1, since that is the value it should have.
there are 3 ways:
-
increase b
-
increase the weights w_i
- in proportion to a_i (increasing the weight attached to a larger activation a_i gets more bang for the buck)
- “Neurons that fire together wire together”
-
change the previous-layer activations a_i
- in proportion to w_i (increase the activations behind positive weights, decrease those behind negative weights)
-
-
and we also want to decrease the activations of the other output neurons.
So we add up all the last-layer neurons’ desired effects, and get the wanted nudges for the second-to-last layer.
THAT’S THE FIRST STEP OF PROPAGATING BACKWARDS!
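A sketch of those three nudges for a single output neuron; the sizes and the nudge size `eps` are arbitrary, this only checks that each nudge raises the activation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.random.randn(16)       # weights into the "2" output neuron
b = 0.0
a_prev = np.random.rand(16)   # activations of the previous layer
eps = 0.01

base = sigmoid(w @ a_prev + b)
higher_b = sigmoid(w @ a_prev + (b + eps))            # 1. increase the bias
higher_w = sigmoid((w + eps * a_prev) @ a_prev + b)   # 2. increase weights, in proportion to a_prev
higher_a = sigmoid(w @ (a_prev + eps * w) + b)        # 3. change a_prev, in proportion to w
assert higher_b > base and higher_w > base and higher_a > base
```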
Sum the nudges within each layer, then average the total nudges over all training examples: this gives (a multiple of) the gradient vector. (Not an exact quantification.)
How to be lazy about it?
Split the training examples into mini-batches, compute a step for each mini-batch, and combine.
This is called stochastic gradient descent.
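A minimal sketch, assuming a helper `grad_on_batch(params, batch)` that returns the gradient estimated from one mini-batch:

```python
import numpy as np

def sgd_epoch(params, examples, grad_on_batch, batch_size=32, lr=0.1):
    # shuffle the data, split it into mini-batches,
    # and take one cheap (noisy) gradient step per batch
    order = np.random.permutation(len(examples))
    for start in range(0, len(examples), batch_size):
        batch = [examples[i] for i in order[start:start + batch_size]]
        params = params - lr * grad_on_batch(params, batch)
    return params
```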
derivatives in computational graphs
dive a little bit into the calculus!
Building up from simple to deep!
-
one neuron per layer
- neuron indexing:
- a superscript gives the layer, e.g. a^(L) is the activation of the neuron in the last layer, layer L
- desired final-layer activation: written y (0 or 1)
- Cost of one training example: C_0 = (a^(L) - y)^2
- let z^(L) = w^(L) a^(L-1) + b^(L)
- then a^(L) = σ(z^(L))
- how sensitive is C_0 to a tiny change in w^(L)?
- chain rule: ∂C_0/∂w^(L) = ∂z^(L)/∂w^(L) · ∂a^(L)/∂z^(L) · ∂C_0/∂a^(L) = a^(L-1) · σ'(z^(L)) · 2(a^(L) - y)
- the same reasoning gives the derivatives with respect to b^(L) and a^(L-1); repeat for each layer's terms and for every earlier layer (see the sketch below)
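A sketch of those chain-rule pieces for the one-neuron-per-layer case; the function names are mine:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def last_layer_grads(w_L, b_L, a_prev, y):
    # z^(L) = w^(L) * a^(L-1) + b^(L),  a^(L) = sigmoid(z^(L)),  C0 = (a^(L) - y)^2
    z_L = w_L * a_prev + b_L
    a_L = sigmoid(z_L)
    dC_da = 2.0 * (a_L - y)            # dC0/da^(L)
    da_dz = sigmoid_prime(z_L)         # da^(L)/dz^(L)
    dC_dw = a_prev * da_dz * dC_da     # dz/dw = a^(L-1)
    dC_db = 1.0 * da_dz * dC_da        # dz/db = 1
    dC_da_prev = w_L * da_dz * dC_da   # dz/da^(L-1) = w^(L): pass this back one layer
    return dC_dw, dC_db, dC_da_prev
```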
- more than one neuron per layer: add subscripts, a_j^(L) for neuron j of layer L and w_jk^(L) for the weight from neuron k of layer L-1 to neuron j of layer L
- z_j^(L) & a_j^(L): z_j^(L) = Σ_k w_jk^(L) a_k^(L-1) + b_j^(L), and a_j^(L) = σ(z_j^(L))
- ∂C_0/∂a_k^(L-1) needs a sum over j, because a_k^(L-1) feeds into every neuron of layer L
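A sketch of that sum in the multi-neuron case, assuming `W[j, k]` is the weight from neuron k in layer L-1 to neuron j in layer L:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def dcost_da_prev(W, z, dC_da):
    # dC0/da_k^(L-1) = sum_j  w_jk^(L) * sigmoid'(z_j^(L)) * dC0/da_j^(L)
    # W.T performs the sum over j for every k at once
    return W.T @ (sigmoid_prime(z) * dC_da)
```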