Some points in ML

KL divergence (KL 散度)

good explanation:
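For quick reference, the discrete-form definition (a standard formula, written out here for convenience):
$$
D_{KL}(P\|Q)=\sum_{x}P(x)\log\frac{P(x)}{Q(x)}
$$
It is always non-negative, equals zero only when P = Q, and is not symmetric in P and Q.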

tf.summary

https://medium.com/@lisulimowicz/tensorflow-summary-d4160304a6f1
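A minimal sketch of scalar logging with the TF2-style tf.summary API (the directory name "logs" and the fake loss are illustrative; the linked article may target the older TF1 summary/FileWriter API):

```python
import tensorflow as tf

# Create a writer that produces event files TensorBoard can read.
writer = tf.summary.create_file_writer("logs")  # "logs" is an arbitrary output dir

with writer.as_default():
    for step in range(100):
        loss = 1.0 / (step + 1)  # stand-in for a real training loss
        tf.summary.scalar("loss", loss, step=step)
```

Then run `tensorboard --logdir logs` to view the curve.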

tf.pad

https://blog.csdn.net/zhang_bei_qing/article/details/75090203
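A small sketch of how the paddings argument works, one [before, after] pair per dimension (values chosen for illustration):

```python
import tensorflow as tf

t = tf.constant([[1, 2],
                 [3, 4]])
# paddings[i] = [amount added before, amount added after] along dimension i
padded = tf.pad(t, paddings=[[1, 1], [2, 2]], mode="CONSTANT")
print(padded.numpy())
# [[0 0 0 0 0 0]
#  [0 0 1 2 0 0]
#  [0 0 3 4 0 0]
#  [0 0 0 0 0 0]]
```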

padding

‘same’ and ‘valid’: https://stackoverflow.com/questions/37674306/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-t

If you like ascii art:

  • "VALID" = without padding:

    1
    2
    3
    inputs: 1 2 3 4 5 6 7 8 9 10 11 (12 13)
    |________________| dropped
    |_________________|
  • "SAME" = with zero padding:

    1
    2
    3
    4
    5
    pad| |pad
    inputs: 0 |1 2 3 4 5 6 7 8 9 10 11 12 13|0 0
    |________________|
    |_________________|
    |________________|

In this example:

  • Input width = 13
  • Filter width = 6
  • Stride = 5

Notes:

  • "VALID" only ever drops the right-most columns (or bottom-most rows).
  • "SAME" tries to pad evenly left and right, but if the amount of columns to be added is odd, it will add the extra column to the right, as is the case in this example (the same logic applies vertically: there may be an extra row of zeros at the bottom).

Posterior & prior & likelihood

Posterior:
$$
p(\theta|x)
$$
Prior:
$$
p(\theta)
$$
Likelihood:
$$
p(x|\theta)
$$
Evidence:
$$
p(x)
$$
Bayes' theorem:
$$
p(\theta|x)=\frac{p(x|\theta)p(\theta)}{p(x)}
$$
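A quick worked example with made-up numbers: let θ mean "has the disease" with prior p(θ) = 0.01, and let x mean "test positive" with p(x|θ) = 0.9 and p(x|¬θ) = 0.05. Expanding the evidence p(x) by the law of total probability:
$$
p(\theta|x)=\frac{0.9\times 0.01}{0.9\times 0.01+0.05\times 0.99}=\frac{0.009}{0.0585}\approx 0.15
$$
Even a fairly accurate test leaves a small posterior when the prior is small.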
Naive Bayes:

http://www.ruanyifeng.com/blog/2013/12/naive_bayes_classifier.html

Suppose an instance has n features F1, F2, …, Fn, and that there are m categories C1, C2, …, Cm. A Bayesian classifier picks the category with the highest probability, i.e. it finds the maximum of
$$
P(C|F_1F_2\cdots F_n)=\frac{P(F_1F_2\cdots F_n|C)P(C)}{P(F_1F_2\cdots F_n)}
$$
Since P(F1F2…Fn) is the same for every category, it can be dropped, and the problem reduces to maximizing
$$
P(F_1F_2\cdots F_n|C)P(C)
$$
over the categories C.

A naive Bayes classifier goes one step further and assumes that all features are independent of each other, so
$$
P(F_1F_2\cdots F_n|C)P(C)=P(F_1|C)P(F_2|C)\cdots P(F_n|C)P(C)
$$
Although the assumption that all features are independent of each other rarely holds in reality, it greatly simplifies the computation, and studies have shown that it has little impact on classification accuracy.
$$
\begin{align}
P(C|F_1F_2\cdots F_n)&=\frac{P(F_1F_2\cdots F_n|C)P(C)}{P(F_1F_2\cdots F_n)}\\
&=\frac{P(F_1|C)P(F_2|C)\cdots P(F_n|C)P(C)}{P(F_1)P(F_2)\cdots P(F_n)}
\end{align}
$$
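A minimal from-scratch sketch of this counting argument (toy data, no smoothing; all names and values are illustrative):

```python
from collections import Counter, defaultdict

# Toy training data: each sample is a tuple of feature values (F1, F2).
train_X = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
train_y = ["no", "yes", "yes", "no"]

prior = Counter(train_y)     # counts of each category C
cond = defaultdict(Counter)  # cond[(i, c)][v] = count of feature i == v given class c
for x, c in zip(train_X, train_y):
    for i, v in enumerate(x):
        cond[(i, c)][v] += 1

def predict(x):
    """Return argmax over C of P(F1|C)...P(Fn|C)P(C); the evidence term is dropped."""
    best, best_score = None, -1.0
    for c, n_c in prior.items():
        score = n_c / len(train_y)            # P(C)
        for i, v in enumerate(x):
            score *= cond[(i, c)][v] / n_c    # P(Fi | C), zero if unseen (no smoothing)
        if score > best_score:
            best, best_score = c, score
    return best

print(predict(("sunny", "hot")))  # -> "no" on this toy data
```

In practice one would add Laplace smoothing for unseen feature values and sum log-probabilities instead of multiplying, to avoid underflow when n is large.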