KL divergence
Good explanations:
- English: http://www.thushv.com/machine-learning/light-on-math-machine-learning-intuitive-guide-to-understanding-kl-divergence/
- Chinese translation: https://zhuanlan.zhihu.com/p/37452654
tf.summary
https://medium.com/@lisulimowicz/tensorflow-summary-d4160304a6f1
tf.pad
https://blog.csdn.net/zhang_bei_qing/article/details/75090203
padding
‘same’ and ‘valid’: https://stackoverflow.com/questions/37674306/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-t
If you like ASCII art:
- "VALID" = without padding:

        inputs:         1  2  3  4  5  6  7  8  9  10 11 (12 13)
                       |________________|                dropped
                                      |_________________|

- "SAME" = with zero padding:

                    pad|                                      |pad
        inputs:      0 |1  2  3  4  5  6  7  8  9  10 11 12 13|0  0
                    |________________|
                                   |_________________|
                                                  |________________|

In this example:
- Input width = 13
- Filter width = 6
- Stride = 5
Notes:
- "VALID" only ever drops the right-most columns (or bottom-most rows).
- "SAME" tries to pad evenly left and right, but if the number of columns to be added is odd, it adds the extra column on the right, as is the case in this example (the same logic applies vertically: there may be an extra row of zeros at the bottom).
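The arithmetic behind the example can be checked with a small sketch. This is a plain-Python illustration of one dimension only, not TensorFlow code; the function name `padding_1d` and its return layout are made up for these notes.

```python
import math


def padding_1d(input_width, filter_width, stride, padding):
    """Return (output_width, pad_left, pad_right, unused_right) for one dimension.

    Hypothetical helper for illustration; it is not part of TensorFlow.
    """
    if padding == "VALID":
        # No padding: count only the windows that fit entirely inside the input.
        out = max((input_width - filter_width) // stride + 1, 0)
        total_pad = 0
    elif padding == "SAME":
        # Output width depends only on the stride; pad as needed to reach it.
        out = math.ceil(input_width / stride)
        needed = (out - 1) * stride + filter_width
        total_pad = max(needed - input_width, 0)
    else:
        raise ValueError("padding must be 'VALID' or 'SAME'")

    pad_left = total_pad // 2          # the extra column, if any, goes on the right
    pad_right = total_pad - pad_left
    # Width actually covered by the windows, then whatever is left over on the right.
    covered = (out - 1) * stride + filter_width if out > 0 else 0
    unused_right = max(input_width + total_pad - covered, 0)
    return out, pad_left, pad_right, unused_right


# The example from the notes: input width 13, filter width 6, stride 5.
for mode in ("VALID", "SAME"):
    print(mode, padding_1d(13, 6, 5, mode))
```

With these numbers, "VALID" gives 2 windows and leaves 2 columns unused, while "SAME" gives 3 windows with 1 column of padding on the left and 2 on the right.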
Posterior, prior, and likelihood
Posterior:
$$
p(\theta|x)
$$
Prior:
$$
p(\theta)
$$
Likelihood:
$$
p(x|\theta)
$$
Evidence:
$$
p(x)
$$
Bayes' theorem:
$$
p(\theta|x)=\frac{p(x|\theta)p(\theta)}{p(x)}
$$
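A tiny numeric sketch of the formula for a discrete $\theta$. The two hypotheses ("fair" and "biased" coin) and their probabilities are invented for illustration; `bayes_posterior` is a hypothetical helper name.

```python
from fractions import Fraction


def bayes_posterior(prior, likelihood):
    """Return (evidence, posterior) for a discrete parameter.

    prior:      dict mapping theta -> p(theta)
    likelihood: dict mapping theta -> p(x | theta) for the observed x
    """
    # Evidence p(x) = sum over theta of p(x | theta) * p(theta)
    evidence = sum(likelihood[t] * prior[t] for t in prior)
    # Posterior p(theta | x) = p(x | theta) * p(theta) / p(x)
    posterior = {t: likelihood[t] * prior[t] / evidence for t in prior}
    return evidence, posterior


# Made-up example: a coin is either fair or biased towards heads, and we observe heads.
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
likelihood = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}  # p(heads | theta)

evidence, posterior = bayes_posterior(prior, likelihood)
print("p(x) =", evidence)
for theta, p in posterior.items():
    print(f"p({theta} | x) =", p)
```

Dividing by the evidence is what makes the posterior sum to 1 over all values of $\theta$.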
Naive Bayes:
http://www.ruanyifeng.com/blog/2013/12/naive_bayes_classifier.html
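Suppose an individual has n features, F1, F2, …, Fn, and there are m categories, C1, C2, …, Cm. A Bayes classifier picks the category with the highest probability, that is, the category C that maximizes
$$
P(C|F1F2…Fn)=\frac{P(F1F2…Fn|C)P(C)}{P(F1F2…Fn)}
$$
Since P(F1F2…Fn) is the same for every category, it can be dropped, and the problem becomes maximizing
$$
P(F1F2…Fn|C)P(C)
$$
The naive Bayes classifier goes one step further and assumes that all features are independent of one another (more precisely, conditionally independent given the category), so
$$
P(F1F2…Fn|C)P(C)=P(F1|C)P(F2|C)\cdots P(Fn|C)P(C)
$$
Although the assumption that all features are independent is unlikely to hold in practice, it greatly simplifies the computation, and studies suggest that it has little effect on classification accuracy.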
$$
\begin{aligned}
P(C|F1F2…Fn)&=\frac{P(F1F2…Fn|C)P(C)}{P(F1F2…Fn)}\\
&=\frac{P(F1|C)P(F2|C)\cdots P(Fn|C)P(C)}{P(F1F2…Fn)}
\end{aligned}
$$
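A minimal sketch of the classifier described above, for categorical features. It estimates P(C) and P(Fi|C) from raw counts and maximizes P(F1|C)P(F2|C)…P(Fn|C)P(C). The function names and the toy weather data are invented for illustration, and there is no smoothing, so a feature value never seen with a category gives that category a score of 0.

```python
from collections import Counter, defaultdict


def train(samples):
    """samples: list of (features, category) pairs, where features is a tuple."""
    class_counts = Counter(category for _, category in samples)
    # (category, feature index) -> Counter of the values seen at that index
    feature_counts = defaultdict(Counter)
    for features, category in samples:
        for i, value in enumerate(features):
            feature_counts[(category, i)][value] += 1
    return class_counts, feature_counts


def score(features, category, class_counts, feature_counts):
    """Unnormalized posterior: P(C) times the product of P(Fi | C)."""
    total = sum(class_counts.values())
    result = class_counts[category] / total                      # P(C)
    for i, value in enumerate(features):
        seen = feature_counts[(category, i)][value]
        result *= seen / class_counts[category]                  # P(Fi | C)
    return result


def classify(features, class_counts, feature_counts):
    """Return the category with the highest unnormalized posterior."""
    return max(
        class_counts,
        key=lambda c: score(features, c, class_counts, feature_counts),
    )


# Made-up toy data: (outlook, temperature) -> play outside?
samples = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("rainy", "cool"), "yes"),
    (("sunny", "cool"), "yes"),
]
class_counts, feature_counts = train(samples)
print(classify(("rainy", "mild"), class_counts, feature_counts))
print(classify(("sunny", "hot"), class_counts, feature_counts))
```

The denominator P(F1F2…Fn) is never computed, because it is the same for every category and does not change which one wins.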