Some points in TensorFlow

Tensor

A tensor consists of a set of primitive values shaped into an array with any number of dimensions.

TensorFlow Core Walkthrough

  1. Building the computational graph – tf.Graph
  2. Running the computational graph – tf.Session

Graph

A graph contains two types of objects:

  • Operations: the nodes of the graph
  • Tensors: the edges of the graph
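
A tiny sketch of this: constant and add operations are nodes, and the tensors they produce are the edges between them (the variable names below are just for illustration):

import tensorflow as tf

a = tf.constant(3.0)   # an operation that outputs a tensor
b = tf.constant(4.0)
total = a + b          # the add operation consumes the two tensors
print(total)           # prints a Tensor, not 7.0 -- building the graph does not run it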

Session

sess = tf.Session()
sess.run(fetches)  # 'fetches' is the tensor/op (or a list of them) to evaluate

Feeding

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
z = x + y
print(sess.run(z, feed_dict={x: 3, y: 4.5}))
print(sess.run(z, feed_dict={x: [1, 3], y: [2, 4]}))

Layers

Creating layers

x = tf.placeholder(tf.float32, shape=[None, 3])
linear_model = tf.layers.Dense(units=1)
y = linear_model(x)

Initializing layers

init = tf.global_variables_initializer()
sess.run(init)

Layer Function shortcuts

x = tf.placeholder(tf.float32, shape=[None, 3])
y = tf.layers.dense(x, units=1)
init = tf.global_variables_initializer()
sess.run(init)
print(sess.run(y, {x: [[1, 2, 3], [4, 5, 6]]}))

Training

Define the data
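
For example, the toy regression data used in the complete program below can be defined directly as constants:

x = tf.constant([[1], [2], [3], [4]], dtype=tf.float32)
y_true = tf.constant([[0], [-1], [-2], [-3]], dtype=tf.float32)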

Define the model

fully_connected layer
tf.contrib.layers.fully_connected(
    inputs,
    num_outputs,
    activation_fn=tf.nn.relu
)
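
For example, applied to a placeholder like the one in the layer examples above (a sketch; the single output unit is an arbitrary choice):

x = tf.placeholder(tf.float32, shape=[None, 3])
y = tf.contrib.layers.fully_connected(x, num_outputs=1, activation_fn=tf.nn.relu)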

loss

mean_squared_error
loss = tf.losses.mean_squared_error(labels=y_true, predictions=y_pred)
print(sess.run(loss))  # assumes y_true, y_pred, and a running session already exist

Optimize

Optimizers: tf.train.Optimizer

GradientDescent
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)

for i in range(100):
    _, loss_value = sess.run((train, loss))
    print(loss_value)

Adam

train = tf.train.AdamOptimizer().minimize(loss)  # default learning rate; pass one explicitly if needed

Complete Program

x = tf.constant([[1], [2], [3], [4]], dtype=tf.float32)
y_true = tf.constant([[0], [-1], [-2], [-3]], dtype=tf.float32)
linear_model = tf.layers.Dense(units=1)
y_pred = linear_model(x)
loss = tf.losses.mean_squared_error(labels=y_true, predictions=y_pred)
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
for i in range(100):
    _, loss_value = sess.run((train, loss))
    print(loss_value)
print(sess.run(y_pred))
sess.close()

You could also use a with statement, so there is no need to call close() explicitly:

with tf.Session() as sess:
    with tf.device("/gpu:1"):  # optionally pin operations to a device
        result = sess.run([product])  # 'product' is some op defined earlier in the graph
        print(result)

Batch

There are a few ways to produce mini-batches:

  • get_batch(): write your own batching helper (see the sketch after this list)
  • sklearn: gen_batches()
  • shuffle_batch_x, shuffle_batch_y = tf.train.shuffle_batch([X_train, y_train], batch_size=Config.batch_size, capacity=10000, min_after_dequeue=5000, enqueue_many=True)
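
A minimal hand-rolled get_batch() sketch (the name and the use of NumPy shuffling are my assumptions, not a TensorFlow API):

import numpy as np

def get_batch(X, y, batch_size, shuffle=True):
    """Yield (X_batch, y_batch) pairs of size at most batch_size."""
    indices = np.arange(len(X))
    if shuffle:
        np.random.shuffle(indices)
    for start in range(0, len(X), batch_size):
        batch_idx = indices[start:start + batch_size]
        yield X[batch_idx], y[batch_idx]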

Variable

tf.Variable() & tf.get_variable()

With tf.Variable(), a name collision is handled automatically by the system; with tf.get_variable(), the collision is not handled and an error is raised instead.

import tensorflow as tf

w_1 = tf.Variable(3, name="w_1")
w_2 = tf.Variable(1, name="w_1")
print(w_1.name)
print(w_2.name)
# Output:
# w_1:0
# w_1_1:0
import tensorflow as tf

w_1 = tf.get_variable(name="w_1", initializer=1)
w_2 = tf.get_variable(name="w_1", initializer=2)
# Error:
# ValueError: Variable w_1 already exists, disallowed. Did
# you mean to set reuse=True in VarScope?

When we need to share variables, we have to use tf.get_variable(); in all other cases the two are used the same way.

variable_scope & name_scope

tf.variable_scope prefixes variable names with the scope name; this applies both to variables obtained with tf.get_variable and to those created with tf.Variable.

tf.name_scope does the same, but only for variables created with tf.Variable (tf.get_variable ignores it).

If an existing variable has not been marked as shared, TensorFlow raises an error as soon as it reaches a second variable with the same name. To solve this, TensorFlow provides tf.variable_scope: its main purpose is to share variables within a scope; put simply, it prepends a scope name to the variable name.
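
A minimal sketch of sharing a variable through tf.variable_scope (the scope name "model" and variable name "w" are chosen here for illustration):

import tensorflow as tf

with tf.variable_scope("model"):
    v1 = tf.get_variable("w", shape=[1])
with tf.variable_scope("model", reuse=True):
    v2 = tf.get_variable("w", shape=[1])  # reuses the existing variable instead of raising

print(v1.name)   # model/w:0
print(v2 is v1)  # True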

tf.multinomial

tf.multinomial(logits, num_samples, seed=None, name=None)
Draws samples from a multinomial distribution: num_samples samples per row, with the (unnormalized log) probability of each class given by logits.

Arguments:
logits: 2-D Tensor with shape [batch_size, num_classes]. Each slice [i, :] represents the unnormalized log probabilities for all classes.
num_samples: 0-D. The number of independent samples to draw for each row slice.

Returns:
The drawn samples, of shape [batch_size, num_samples]. Each element is a class index, so the values range over {0, 1, ..., num_classes - 1}.

import tensorflow as tf

samples = tf.multinomial(tf.log([[10., 10., 10.]]), 5)
with tf.Session() as sess:
    print(sess.run(samples))
# Example output: array([[2, 1, 2, 2, 0]])

tf.reshape & tf.shape

# tensor 't' is [1, 2, 3, 4, 5, 6, 7, 8, 9]
# tensor 't' has shape [9]
reshape(t, [3, 3]) ==> [[1, 2, 3],
                        [4, 5, 6],
                        [7, 8, 9]]

# tensor 't' is [[[1, 1], [2, 2]],
#                [[3, 3], [4, 4]]]
# tensor 't' has shape [2, 2, 2]
reshape(t, [2, 4]) ==> [[1, 1, 2, 2],
                        [3, 3, 4, 4]]

# tensor 't' is [[[1, 1, 1],
#                 [2, 2, 2]],
#                [[3, 3, 3],
#                 [4, 4, 4]],
#                [[5, 5, 5],
#                 [6, 6, 6]]]
# tensor 't' has shape [3, 2, 3]
# pass '[-1]' to flatten 't'
reshape(t, [-1]) ==> [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6]

# -1 can also be used to infer the shape
# -1 is inferred to be 9:
reshape(t, [2, -1]) ==> [[1, 1, 1, 2, 2, 2, 3, 3, 3],
                         [4, 4, 4, 5, 5, 5, 6, 6, 6]]
# -1 is inferred to be 2:
reshape(t, [-1, 9]) ==> [[1, 1, 1, 2, 2, 2, 3, 3, 3],
                         [4, 4, 4, 5, 5, 5, 6, 6, 6]]
# -1 is inferred to be 3:
reshape(t, [2, -1, 3]) ==> [[[1, 1, 1],
                             [2, 2, 2],
                             [3, 3, 3]],
                            [[4, 4, 4],
                             [5, 5, 5],
                             [6, 6, 6]]]

# tensor 't' is [7]
# shape `[]` reshapes to a scalar
reshape(t, []) ==> 7

tf.nn.softmax_cross_entropy_with_logits() & tf.nn.sparse_softmax_cross_entropy_with_logits()

tf.nn.sparse_softmax_cross_entropy_with_logits(
    _sentinel=None,
    labels=None,
    logits=None,
    name=None
)

Having two different functions is a convenience, as they produce the same result.

The difference is simple:

  • For sparse_softmax_cross_entropy_with_logits, labels must have the shape [batch_size] and the dtype int32 or int64. Each label is an int in range [0, num_classes-1].
  • For softmax_cross_entropy_with_logits, labels must have the shape [batch_size, num_classes] and dtype float32 or float64.

Labels used in softmax_cross_entropy_with_logits are the one hot version of labels used in sparse_softmax_cross_entropy_with_logits.

Another tiny difference is that with sparse_softmax_cross_entropy_with_logits, you can give -1 as a label to have loss 0 on this label.
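
A minimal sketch that checks the equivalence described above (the logits and labels are made-up values):

import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])
sparse_labels = tf.constant([0, 1])                  # shape [batch_size], int class indices
onehot_labels = tf.one_hot(sparse_labels, depth=3)   # shape [batch_size, num_classes]

sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=sparse_labels, logits=logits)
dense_loss = tf.nn.softmax_cross_entropy_with_logits(labels=onehot_labels, logits=logits)

with tf.Session() as sess:
    print(sess.run([sparse_loss, dense_loss]))  # the two per-example losses match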

A nice write-up on Zhihu: https://zhuanlan.zhihu.com/p/33560183

Loss functions: https://sthsf.github.io/wiki/Algorithm/DeepLearning/Tensorflow%E5%AD%A6%E4%B9%A0%E7%AC%94%E8%AE%B0/Tensorflow%E5%9F%BA%E7%A1%80%E7%9F%A5%E8%AF%86---%E6%8D%9F%E5%A4%B1%E5%87%BD%E6%95%B0%E8%AF%A6%E8%A7%A3.html#tfnnlog_softmaxlogits-namenone

cross_entropy (cross entropy)

softmax()

sigmoid()
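
These headings name the corresponding TensorFlow ops; a quick sketch of tf.nn.softmax and tf.sigmoid on made-up values:

import tensorflow as tf

logits = tf.constant([1.0, 2.0, 3.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.softmax(logits)))  # normalized probabilities that sum to 1
    print(sess.run(tf.sigmoid(logits)))     # element-wise values in (0, 1)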

tf.reduce_max()

see in https://medium.com/@aerinykim/tensorflow-101-what-does-it-mean-to-reduce-axis-9f39e5c6dea2

tf.reduce_max(x, 0) kills the 0-th dimension. So ‘3’ in (3,2,5) will be gone.
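
A small sketch of this axis-reduction behaviour (the values are made up):

import tensorflow as tf

x = tf.constant([[1, 5], [3, 2], [4, 0]])    # shape (3, 2)
with tf.Session() as sess:
    print(sess.run(tf.reduce_max(x, 0)))     # [4 5] -- shape (2,), the 0-th dimension is gone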

tf.get_collection()

A collection is nothing but a named set of values.

Every value is a node of the computational graph.

Every node has a name, and the name is composed by concatenating scopes with /, like: preceding/scopes/in/that/way/value

Calling get_collection without a scope fetches every value in the collection without applying any filter.

When the scope parameter is present, every element of the collection is filtered and returned only if the name of the node starts with the specified scope.

see in https://stackoverflow.com/questions/44691406/how-to-understand-tf-get-collection-in-tensorflow

Return: The list of values in the collection with the given name, or an empty list if no value has been added to that collection. The list contains the values in the order under which they were collected.
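
A minimal sketch (the collection name "my_collection" and the scope names are chosen here for illustration):

import tensorflow as tf

with tf.name_scope("layer1"):
    w = tf.Variable(1.0, name="w")            # node name: layer1/w:0
    tf.add_to_collection("my_collection", w)
with tf.name_scope("layer2"):
    b = tf.Variable(0.0, name="b")            # node name: layer2/b:0
    tf.add_to_collection("my_collection", b)

print(tf.get_collection("my_collection"))                   # both variables
print(tf.get_collection("my_collection", scope="layer1"))   # only layer1/w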

Gradients Computation

Processing gradients before applying them.

Calling minimize() takes care of both computing the gradients and applying them to the variables. If you want to process the gradients before applying them you can instead use the optimizer in three steps:

  1. Compute the gradients with compute_gradients().
  2. Process the gradients as you wish.
  3. Apply the processed gradients with apply_gradients().

Example:

# Create an optimizer.
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# Compute the gradients for a list of variables.
grads_and_vars = opt.compute_gradients(loss, <list of variables>)

# grads_and_vars is a list of tuples (gradient, variable). Do whatever you
# need to the 'gradient' part, for example cap them, etc.
capped_grads_and_vars = [(MyCapper(gv[0]), gv[1]) for gv in grads_and_vars]

# Ask the optimizer to apply the capped gradients.
opt.apply_gradients(capped_grads_and_vars)
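
For instance, the MyCapper placeholder above could be a norm clip; a minimal sketch assuming loss and the default trainable variables already exist in the graph:

opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
grads_and_vars = opt.compute_gradients(loss)   # defaults to all trainable variables
capped = [(tf.clip_by_norm(g, 5.0), v) for g, v in grads_and_vars if g is not None]
train_op = opt.apply_gradients(capped)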