Introduction to Reinforcement Learning

Posted on 2018-08-04 | In Machine Learning

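Reinforcement Learning

What Is Reinforcement Learning

It sits between supervised learning (which has labels) and unsupervised learning.

There is no supervision signal, only a "reward" feedback signal
Feedback is usually delayed
Samples usually do not satisfy the i.i.d. (independent and identically distributed) assumption
The agent's actions affect the distribution of the data it observes afterwards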
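
The properties listed above can be illustrated with a minimal agent-environment loop. This is only a sketch: the `LineWorld` environment, its reward values, and the `run_episode` helper are hypothetical names invented for this illustration, not part of any library. The reward arrives only when the goal is reached (delayed feedback), and each observation depends on the actions taken before it.

```python
class LineWorld:
    """Hypothetical toy environment: positions 0..4, with the goal at 4."""

    def __init__(self):
        self.pos = 0

    def step(self, action):
        # action is -1 (move left) or +1 (move right); position is clamped to 0..4
        self.pos = max(0, min(4, self.pos + action))
        done = self.pos == 4
        reward = 1.0 if done else 0.0  # reward only at the goal: delayed feedback
        return self.pos, reward, done


def run_episode(policy, max_steps=100):
    """Run one episode; policy maps an observation to an action."""
    env = LineWorld()
    obs = env.pos
    total = 0.0
    trajectory = []
    for _ in range(max_steps):
        action = policy(obs)
        # The next observation depends on the action just taken.
        obs, reward, done = env.step(action)
        trajectory.append((action, obs, reward))
        total += reward
        if done:
            break
    return total, trajectory


total, trajectory = run_episode(lambda obs: 1)  # a policy that always moves right
```

With the always-move-right policy, the agent receives zero reward on every step except the last one, so no single step can be judged in isolation.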

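Examples:

Helicopter flight control
- +r following the planned route
- -r deviating from it or crashing
Portfolio management
- +r profit
- -r loss
Power station control
- +r stable power output
- -r exceeding safety thresholds
Atari game control
- +r the game score increases
- -r the game score decreases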

Sequential Decision Problems

Interview--Operating System

Posted on 2018-07-28 | In Interview

Interview--Computer Network

Posted on 2018-07-28 | In Interview

IMPORTANT POINT

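1. Computer Network Architecture

Definition: the set of a computer network's layers together with their protocols

Purpose: defines the functions that the computer network can perform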

Deep Learning-Summary

Posted on 2018-07-28 | In Machine Learning

Summary of Some Important Points

Softmax VS Sigmoid

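	Softmax	Sigmoid
Formula	$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$	$S(x) = \frac{1}{1+e^{-x}}$
Nature	Discrete probability distribution	Nonlinear mapping
Task	Multi-class classification	Binary classification
Domain	A one-dimensional vector	A single scalar
Range	(0,1) per component	(0,1)
Sum of outputs	Always 1	Some positive number (not necessarily 1)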

Sigmoid is the special case of Softmax in which the number of classes is 2.
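
To see why, apply Softmax to the two-element vector $(x, 0)$: the first component is $\frac{e^{x}}{e^{x}+e^{0}} = \frac{1}{1+e^{-x}} = S(x)$. The following minimal sketch checks this numerically; the function names `softmax` and `sigmoid` are chosen here for illustration.

```python
import math


def sigmoid(x):
    """S(x) = 1 / (1 + e^(-x)) for a single scalar."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(z):
    """Softmax of a list of numbers; subtracting the max avoids overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Two-class Softmax over (x, 0) matches Sigmoid(x) up to rounding error.
for x in (-3.0, 0.0, 2.5):
    assert abs(softmax([x, 0.0])[0] - sigmoid(x)) < 1e-12
```

Fixing the second logit at 0 loses no generality, because Softmax is unchanged when the same constant is added to every input.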

Introduction to Algorithms--QuickSort and Its Randomization

Posted on 2018-07-26 | In Algorithms

Divide and conquer
Sorts “in place”
Very practical (with tuning)

Divide and conquer

Divide (Key): Partition the array into 2 subarrays around a pivot x, such that elements in the lower subarray $\leq x \leq$ elements in the upper subarray
Conquer: Recursively sort 2 subarrays
Combine: Trivial

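Time complexity of the partition step: $O(n)$. Quicksort as a whole runs in $O(n \log n)$ expected time with a random pivot and $O(n^2)$ in the worst case.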
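
The divide, conquer, and combine steps above can be sketched in Python as follows. This is one possible version, assuming a Lomuto-style partition with a uniformly random pivot; the names `partition` and `randomized_quicksort` are chosen for this illustration.

```python
import random


def partition(a, lo, hi):
    """Partition a[lo..hi] in place around the pivot a[hi]; return its final index."""
    pivot = a[hi]
    i = lo - 1  # a[lo..i] holds the elements <= pivot seen so far
    for j in range(lo, hi):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[hi] = a[hi], a[i + 1]  # put the pivot between the two subarrays
    return i + 1


def randomized_quicksort(a, lo=0, hi=None):
    """Sort the list a in place."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        # Randomization: swap a uniformly chosen element into the pivot slot.
        k = random.randint(lo, hi)
        a[k], a[hi] = a[hi], a[k]
        p = partition(a, lo, hi)            # divide: a single linear pass
        randomized_quicksort(a, lo, p - 1)  # conquer the lower subarray
        randomized_quicksort(a, p + 1, hi)  # conquer the upper subarray
        # combine: nothing to do, the array is already sorted in place


data = [5, 2, 9, 1, 5, 6]
randomized_quicksort(data)
```

Choosing the pivot at random means that no fixed input can reliably trigger the worst case.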

MathJax Escaping Issues

Posted on 2018-07-26 | In Coding

Issue

The subscript _ does not work
matrix does not render

Markdown's own special characters conflict with the symbols used in LaTeX

Introduction to Algorithms--Sort problems

Posted on 2018-07-07 | In Algorithms

(Pseudocode)

Insertion Sort

// scan the sorted prefix from right to left
INSERTION(A)
for j=2 to A.length
	key = A[j]
	i=j-1
	while i>0 and A[i]>key
		A[i+1] = A[i]
		i = i-1
	A[i+1] = key

Worst-case time complexity: $O(n^2)$
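
The pseudocode above uses 1-based indexing. A minimal Python sketch of the same procedure with 0-based indexing might look like this (the function name `insertion_sort` is chosen here for illustration):

```python
def insertion_sort(a):
    """Sort the list a in place and return it."""
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        # Shift elements of the sorted prefix a[0..j-1] that are larger than key
        # one slot to the right, scanning from right to left.
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]
            i -= 1
        a[i + 1] = key
    return a


result = insertion_sort([5, 2, 4, 6, 1, 3])
```

The loop condition `i >= 0` replaces `i > 0` from the pseudocode because Python lists start at index 0.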

Jobdu-27-Simple Calculator

Posted on 2018-06-26 | In Jobdu

Stack

Question

Expression evaluation
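
One way a stack can be used for expression evaluation is sketched below. The excerpt does not give the input format, so this sketch assumes that tokens are separated by spaces, that operands are non-negative numbers, and that the only operators are `+`, `-`, `*` and `/` without parentheses. The function name `evaluate` is hypothetical.

```python
def evaluate(expr):
    """Evaluate e.g. "4 + 2 * 5 - 7 / 11" with the usual operator precedence."""
    tokens = expr.split()
    # The stack holds signed terms that are summed at the end.
    stack = [float(tokens[0])]
    # After the first operand, tokens come in (operator, operand) pairs.
    for op, num in zip(tokens[1::2], tokens[2::2]):
        value = float(num)
        if op == "+":
            stack.append(value)
        elif op == "-":
            stack.append(-value)
        elif op == "*":
            # Higher precedence: combine with the top of the stack right away.
            stack.append(stack.pop() * value)
        elif op == "/":
            stack.append(stack.pop() / value)
        else:
            raise ValueError("unsupported operator: " + op)
    return sum(stack)


value = evaluate("4 + 2 * 5 - 7 / 11")
```

Multiplication and division are applied to the top of the stack immediately, while addition and subtraction are deferred until the final sum, which gives the usual precedence without a second stack.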

Jobdu-22-No AC This Summer

Posted on 2018-06-25 | In Jobdu

Greedy

Question

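Problem Description
“No AC this summer?”
“That's right.”
“Then what will you do?”
“Watch the World Cup, silly!”
“@#$%^&*%…”

Indeed, the World Cup is here, and so is the fans' festival. Presumably many ACMers will abandon their computers and rush to the TV.
As a fan, you surely want to watch as many complete matches as possible. Of course, as a good young person of the new era, you will also watch some other programs, such as Xinwen Lianbo (never forget to care about national affairs), Feichang 6+7, Super Girl, and Wang Xiaoya's Happy Dictionary. Suppose you already know the broadcast schedule of every TV program you like. Can you arrange your viewing sensibly? (The goal is to watch as many complete programs as possible.)

Input
The input contains multiple test cases. The first line of each test case contains a single integer n (n<=100), the total number of programs you like. It is followed by n lines, each containing two values Ti_s, Ti_e (1<=i<=n), the start and end times of the i-th program. To simplify the problem, each time is given as a positive integer. n=0 marks the end of the input and is not processed.

Output
For each test case, output the number of TV programs that can be watched in full. The output for each test case occupies one line.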

Sample Input

Sample Output
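
The greedy idea for this problem can be sketched as follows: sort the programs by end time, then repeatedly take the earliest-ending program that does not overlap the last one chosen. This is a minimal sketch, not a full solution with input parsing. The function name `max_complete_programs` is hypothetical, and the sketch assumes that a program starting exactly when the previous one ends can still be watched in full.

```python
def max_complete_programs(programs):
    """programs is a list of (start, end) pairs; return the maximum number
    of pairwise non-overlapping programs."""
    count = 0
    last_end = float("-inf")
    # Greedy choice: the program that ends earliest leaves the most room for the rest.
    for start, end in sorted(programs, key=lambda p: p[1]):
        if start >= last_end:  # assumption: touching endpoints do not overlap
            count += 1
            last_end = end
    return count


schedule = [(1, 3), (3, 4), (0, 7), (5, 10), (10, 15)]
watched = max_complete_programs(schedule)
```

Sorting dominates the running time, so the sketch runs in $O(n \log n)$ per test case.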

Jobdu-21-FatMouse's house

Posted on 2018-06-25 | In Jobdu

Question

Problem description:
FatMouse prepared M pounds of cat food, ready to trade with the cats guarding the warehouse containing his favorite food, JavaBean. The warehouse has N rooms. The i-th room contains J[i] pounds of JavaBeans and requires F[i] pounds of cat food. FatMouse does not have to trade for all the JavaBeans in the room; instead, he may get J[i] * a% pounds of JavaBeans if he pays F[i] * a% pounds of cat food. Here a is a real number. Now he is assigning this homework to you: tell him the maximum amount of JavaBeans he can obtain.

Input:
The input consists of multiple test cases. Each test case begins with a line containing two non-negative integers M and N. Then N lines follow, each contains two non-negative integers J[i] and F[i] respectively. The last test case is followed by two -1’s. All integers are not greater than 1000.

Output:
For each test case, print in a single line a real number accurate up to 3 decimal places, which is the maximum amount of JavaBeans that FatMouse can obtain.

Sample input:
5 3
7 2
4 3
5 2
20 3
25 18
24 15
15 10
-1 -1

Sample output:
13.333
31.500
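
Because rooms can be traded fractionally, one natural approach is a greedy by exchange rate: trade first with the rooms that give the most JavaBeans per pound of cat food. The sketch below takes that approach and leaves out input parsing. The function name `max_javabeans` is hypothetical, and treating rooms with F[i] = 0 as free is an assumption based on the statement allowing non-negative integers.

```python
def max_javabeans(m, rooms):
    """m is the amount of cat food; rooms is a list of (j, f) pairs.
    Return the maximum amount of JavaBeans obtainable."""

    def rate(room):
        j, f = room
        # Assumption: a room that costs nothing is always worth taking first.
        return float("inf") if f == 0 else j / f

    total = 0.0
    for j, f in sorted(rooms, key=rate, reverse=True):
        if f == 0:
            total += j          # free JavaBeans
        elif m >= f:
            total += j          # enough cat food to buy the whole room
            m -= f
        else:
            total += j * m / f  # buy only the affordable fraction, then stop
            break
    return total


first = format(max_javabeans(5, [(7, 2), (4, 3), (5, 2)]), ".3f")
second = format(max_javabeans(20, [(25, 18), (24, 15), (15, 10)]), ".3f")
```

On the two sample cases above, `first` and `second` come out as 13.333 and 31.500, matching the sample output.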