Deep Learning Basics

Functions used in Andrew Ng's deep learning course

The sigmoid function

$$
\text{For } x \in \mathbb{R}^n \text{, } sigmoid(x) = sigmoid\begin{pmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n
\end{pmatrix} = \begin{pmatrix}
\frac{1}{1+e^{-x_1}} \\
\frac{1}{1+e^{-x_2}} \\
\vdots \\
\frac{1}{1+e^{-x_n}}
\end{pmatrix}\tag{1}
$$

Python implementation:

import numpy as np

def sigmoid(x):
    s = 1 / (1 + np.exp(-x))
    return s
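
For a quick check, the function can be applied to a small example vector; the values follow directly from formula (1):

x = np.array([1, 2, 3])
print(sigmoid(x))  # -> [0.73105858 0.88079708 0.95257413]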

Sigmoid gradient (derivative)

Carrying out the differentiation dS = ds/dx, we find that dS = s*(1-s):
$$
sigmoid\_derivative(x) = \sigma'(x) = \sigma(x)\,(1 - \sigma(x))\tag{2}
$$
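
The identity follows by differentiating sigmoid(x) = 1/(1 + e^{-x}) directly:

$$
\sigma'(x) = \frac{e^{-x}}{(1+e^{-x})^2} = \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} = \sigma(x)\,(1 - \sigma(x))
$$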

def sigmoid_derivative(x):
    s = sigmoid(x)
    ds = s * (1 - s)
    return ds
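
Applied to the same example vector, the gradient is largest near 0 and shrinks as the sigmoid saturates:

x = np.array([1, 2, 3])
print(sigmoid_derivative(x))  # -> [0.19661193 0.10499359 0.04517666]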

The image2vector function

For example, in computer science, an image is represented by a 3D array of shape (length,height,depth=3). However, when you read an image as the input of an algorithm you convert it to a vector of shape (length∗height∗3,1). In other words, you “unroll”, or reshape, the 3D array into a 1D vector.

Convert the image array into a single column vector.

def image2vector(image):
    v = image.reshape(image.shape[0] * image.shape[1] * image.shape[2], 1)
    return v

# A 3*3*2 array standing in for an image of shape (height, width, channels);
# in a real image each pixel has 3 channels (r, g, b), here 2 just for the demo
image = np.array([[[0.67826139, 0.29380381],
                   [0.90714982, 0.52835647],
                   [0.4215251, 0.45017551]],

                  [[0.92814219, 0.96677647],
                   [0.85304703, 0.52351845],
                   [0.19981397, 0.27417313]],

                  [[0.60659855, 0.00533165],
                   [0.10820313, 0.49978937],
                   [0.34144279, 0.94630077]]])
v = image2vector(image)

print('v.shape: ' + str(v.shape))
print('image2vector(image):')
print(v)
print('\n')
print(v.T)

Output:

v.shape: (18, 1)
image2vector(image):
[[0.67826139]
[0.29380381]
[0.90714982]
[0.52835647]
[0.4215251 ]
[0.45017551]
[0.92814219]
[0.96677647]
[0.85304703]
[0.52351845]
[0.19981397]
[0.27417313]
[0.60659855]
[0.00533165]
[0.10820313]
[0.49978937]
[0.34144279]
[0.94630077]]

[[0.67826139 0.29380381 0.90714982 0.52835647 0.4215251 0.45017551
0.92814219 0.96677647 0.85304703 0.52351845 0.19981397 0.27417313
0.60659855 0.00533165 0.10820313 0.49978937 0.34144279 0.94630077]]
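
As a side note, NumPy can also infer the first dimension from -1, so a minimal sketch of an equivalent reshape (image2vector_alt is a hypothetical name, not part of the original exercise) would be:

def image2vector_alt(image):
    # -1 lets NumPy infer length*height*depth automatically; same result as image2vector above
    return image.reshape(-1, 1)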

Normalizing rows (normalizing each row vector to unit length)

Another common technique we use in Machine Learning and Deep Learning is to normalize our data. It often leads to a better performance because gradient descent converges faster after normalization. Here, by normalization we mean changing x to x/∥x∥ (dividing each row vector of x by its norm).
$$
\frac{x}{\| x\|}
$$
For example, if
$$
x =
\begin{bmatrix}
0 & 3 & 4 \\
2 & 6 & 4
\end{bmatrix}\tag{3}
$$
then
$$
\| x\| = \text{np.linalg.norm(x, axis=1, keepdims=True)} = \begin{bmatrix}
5 \\
\sqrt{56}
\end{bmatrix}\tag{4}
$$
and
$$
x\_normalized = \frac{x}{\| x\|} = \begin{bmatrix}
0 & \frac{3}{5} & \frac{4}{5} \\
\frac{2}{\sqrt{56}} & \frac{6}{\sqrt{56}} & \frac{4}{\sqrt{56}}
\end{bmatrix}\tag{5}
$$
Note that you can divide matrices of different sizes and it works fine: this is called broadcasting and you’re going to learn about it in part 5.

Exercise: Implement normalizeRows() to normalize the rows of a matrix. After applying this function to an input matrix x, each row of x should be a vector of unit length (meaning length 1).

# Normalize each row vector of the matrix to unit length

def normalizeRows(x):
    # keepdims=True keeps the reduced axis so the result broadcasts against x;
    # axis=1 takes the norm over each row, axis=0 would take it over each column
    x_norm = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / x_norm
    return x

a = np.array([[1, 2, 3],
              [3, 4, 0],
              [2, 2, 2]])

print(normalizeRows(a))

Output:

[[0.26726124 0.53452248 0.80178373]
[0.6 0.8 0. ]
[0.57735027 0.57735027 0.57735027]]
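
A quick check that each row of the result really has unit length (reusing the array a defined above):

print(np.linalg.norm(normalizeRows(a), axis=1))  # -> [1. 1. 1.]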

Implement the L1 and L2 loss functions
$$
\begin{align*} & L_1(\hat{y}, y) = \sum_{i=0}^m|y^{(i)} - \hat{y}^{(i)}| \end{align*}\tag{6}
$$

$$
\begin{align*} & L_2(\hat{y},y) = \sum_{i=0}^m(y^{(i)} - \hat{y}^{(i)})^2 \end{align*}\tag{7}
$$

def L1(yhat, y):
    # sum of absolute differences
    loss = np.sum(abs(yhat - y))
    return loss


def L2(yhat, y):
    # sum of squared differences, written as a dot product
    loss = np.dot(yhat - y, (yhat - y).T)
    return loss
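
A small usage example with hand-picked values; the expected losses can be verified by hand from formulas (6) and (7):

yhat = np.array([0.9, 0.2, 0.1, 0.4, 0.9])
y = np.array([1, 0, 0, 1, 1])
print('L1 = ' + str(L1(yhat, y)))  # expected: 1.1
print('L2 = ' + str(L2(yhat, y)))  # expected: 0.43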

Some commonly used basic math

Basic mathematical knowledge for machine learning

LaTeX math formula syntax

Matrix product (dot)

This is just ordinary matrix multiplication, ab.

import numpy as np
a = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]])
b = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]])
c = np.dot(a, b)
print(c)

Output:

[[ 6 12 18]
[ 6 12 18]
[ 6 12 18]]
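
For 2-D arrays np.dot is exactly matrix multiplication, so with Python 3.5+ the same product can also be written with the @ operator:

print(a @ b)  # same result as np.dot(a, b) for 2-D arrays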

Matrix outer product (outer)
$$
x \otimes y =
\begin{bmatrix}
x_{11} & \dots & x_{1n} \\
x_{21} & \dots & x_{2n} \\
\vdots & & \vdots \\
x_{m1} & \dots & x_{mn}
\end{bmatrix}
\otimes
\begin{bmatrix}
y_{11} & \dots & y_{1q} \\
y_{21} & \dots & y_{2q} \\
\vdots & & \vdots \\
y_{p1} & \dots & y_{pq}
\end{bmatrix}
=
\begin{bmatrix}
x_{11}y_{11} & \dots & x_{11}y_{1q} & x_{11}y_{21} & \dots & x_{11}y_{pq} \\
\vdots & & & & & \vdots \\
x_{1n}y_{11} & \dots & x_{1n}y_{1q} & x_{1n}y_{21} & \dots & x_{1n}y_{pq} \\
x_{21}y_{11} & \dots & x_{21}y_{1q} & x_{21}y_{21} & \dots & x_{21}y_{pq} \\
\vdots & & & & & \vdots \\
x_{mn}y_{11} & \dots & x_{mn}y_{1q} & x_{mn}y_{21} & \dots & x_{mn}y_{pq}
\end{bmatrix}\tag{8}
$$

# Matrix outer product: np.outer (both inputs are flattened before the product is taken)
a = np.array([[1, 2],
              [3, 4],
              [5, 6]])
b = np.array([[1, 2],
              [1, 2],
              [1, 2]])
print(np.outer(a, b))

Output:

[[ 1 2 1 2 1 2]
[ 2 4 2 4 2 4]
[ 3 6 3 6 3 6]
[ 4 8 4 8 4 8]
[ 5 10 5 10 5 10]
[ 6 12 6 12 6 12]]
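
np.outer flattens both arguments before taking the product, so the call above is equivalent to an outer product of the flattened vectors:

print(np.outer(a.ravel(), b.ravel()))  # identical to np.outer(a, b)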

Element-wise product (elementwise)

This relies on NumPy's broadcasting rules.

import numpy as np
a = np.array([[1, 2, 3],
              [2, 2, 2],
              [3, 3, 3]])
b = np.array([[1],
              [2],
              [3]])
c = a * b  # or equivalently: c = np.multiply(a, b)
print(c)

Output:

[[1 2 3]
[4 4 4]
[9 9 9]]
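
Broadcasting stretches b of shape (3, 1) across the three columns of a, so the product is equivalent to explicitly tiling b first:

print(a * np.tile(b, (1, 3)))  # same result as a * b via broadcasting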