Grokking Algorithms

Grokking Algorithms (TURING)

Author: Aditya Bhargava

1.Binary_search

def binary_search(list, item):
    """
    list must be ordered
    time: O(logn)
    """
    low = 0
    high = len(list)-1
    while(low <= high):
        mid = (low + high)/2
        if list[mid] == item:
            return mid
        if list[mid] < item:
            low = mid+1
        else:
            high = mid-1
    return None
if __name__ == '__main__':
    print(binary_search([1,3,5,7,9],5))
    print(binary_search([1,3,4,7,8],9))

2.Selection_sort

def findSmallest(arr):
    """
    input: arr
    output: smallest_index
    """
    smallest = arr[0]
    smallest_index = 0
    for i in range(len(arr)):
        if arr[i] < smallest:
            smallest = arr[i]
            smallest_index = i
    return smallest_index
def selectionSort(arr):
    newArr = []
    for i in range(len(arr)):
        smallest_index = findSmallest(arr)
        newArr.append(arr.pop(smallest_index))
    return newArr
if __name__ == '__main__':
    print(selectionSort([5, 3, 6, 2, 10]))

3.Quick_sort

def quicksort(array):
    if(len(array) < 2):   # baseline
        return array
    else:
        pivot = array[0]
        less = [i for i in array[1:] if i < pivot]
        greater = [i for i in array[1:] if i > pivot]
        return quicksort(less) + [pivot] + quicksort(greater)
print(quicksort([10, 5, 2, 7]))
# O(nlogn)
# layers: O(logn)
# for every layer: O(n)

4.Breadth_first_search

用于图的查找算法

查找最短路径：

从A出发，有前往B的路径么？
从A出发，前往B的哪条路径最短？

在广度优先搜索中，搜索范围从起点开始逐渐向外延伸，即先检查一度关系，再检查二度关系

from collections import deque
graph = {}
graph["you"] = ["alice", "bob", "claire"]
graph["bob"] = ["anuj", "peggy"]
graph["alice"] = ["peggy"]
graph["claire"] = ["thom", "jonny"]
graph["anuj"] = []
graph["peggy"] = []
graph["thom"] = []
graph["jonny"] = []
def person_is_seller(name):
    # sample logic to judge if the person is a seller
    return name[-1] == 'm' 
def search(name):
    search_queue = deque()
    search_queue += graph[name]
    # This array is how you keep track of which people you've searched before.
    searched = []
    while search_queue:
        person = search_queue.popleft()
        # Only search this person if you haven't already searched them.
        if not person in searched:
            if person_is_seller(person):
                print person + " is a mango seller!"
                return True
            else:
                search_queue += graph[person]
                # Marks this person as searched
                searched.append(person)
    return False
search("you")
# Running time: O(V+E)
# O(E) 边数  沿着每条边
# O(V) vertice 顶点数 将每个元素添加到队列的时间为O(1) 所有的元素共需要O(V)

Points：

使用队列
散列表是无序的，添加键值对的顺序无关紧要
对于检查过的元素，务必不要再去检查，否则可能会导致无限循环

如果任务A依赖于任务B，在列表中任务A必须在B后面，被称为拓扑排序，创建一个有序列表

这种图被称为数，树是一种特殊的图，其中没有往后指的边

5.Dijkstra’s Algorithm

找出最快路径

Process：

可在最短时间内前往的节点
对于该节点的邻居，检查是否有前往他们的更短距离，如果有，就更新其开销
重复这个过程，直到对图中的每个节点都这样做过
计算最终路径

Dilkstra’s Algorithm只适用于有向无环图(directed acyclic graph) DAG

如果有负权边，就不能使用Dilkstra’s Algorithm，可以使用Bellman-Ford Algorithm

graph = {}
graph["start"] = {}
graph["start"]["a"] = 6
graph["start"]["b"] = 2
graph["a"] = {}
graph["a"]["fin"] = 1
graph["b"] = {}
graph["b"]["a"] = 3
graph["b"]["fin"] = 5
graph["fin"] = {}
infinity = float("inf")
costs = {}
costs["a"] = 6
costs["b"] = 2
costs["fin"] = infinity
parents = {}
parents["a"] = "start"
parents["b"] = "start"
parents["fin"] = None
processed = []
def find_lower_cost_node(costs):
    lower_cost = float("inf")
    lower_cost_node = None
    for node in costs:
        cost = costs[node]
        if cost < lower_cost and node not in processed:
            lower_cost = cost
            lower_cost_node = node
    return lower_cost_node
node = find_lower_cost_node(costs)
while node is not None:
    cost = costs[node]
    neighbors = graph[node]
    for n in neighbors.keys():
        new_cost = cost + neighbors[n]
        if costs[n] > new_cost:
            costs[n] = new_cost
            parents[n] = node
    processed.append(node)
    node = find_lower_cost_node(costs)
print(costs)
node = "fin"
while node is not "start":
    node = parents[node]
    print(node)

6.Greedy Algorithm

每步都采取最优的算法

每部都采取局部最优解，最终得到的就是全局最优解

近似算法：使用贪婪算法可以得到非常接近的解

在获得精确解需要的时间太长时，可使用近似算法

（1）集合覆盖问题（广播台）

states_needed = set(["mt", "wa", "or", "id", "nv", "ut", "ca", "az"])
stations = {}
stations["kone"] = set(["id", "nv", "ut"])
stations["ktwo"] = set(["wa", "id", "mt"])
stations["kthree"] = set(["or", "nv", "ca"])
stations["kfour"] = set(["nv", "ut"])
stations["kfive"] = set(["ca", "az"])
final_stations = set()
while states_needed:
    best_station = None
    states_covered = set()
    for station, states_for_station in stations.items():
        covered = states_for_station & states_needed
        if len(covered) > len(states_covered):
            states_covered = covered
            best_station = station
    states_needed -= states_covered
    final_stations.add(best_station)
print(final_stations)

（2）旅行商问题

旅行商前往5个不同的城市，找出前往这5个城市的最短路径

NP完全问题：

需要计算所有的解，并从中选出最小/短的

NP问题不用去找最完美的解，而是使用近似算法即可

没有办法完全判断是不是NP完全问题，但还是有蛛丝马迹可循

Points：

元素较少时算法的运行速度较快，但随着元素的数量的增加，速度会变得非常慢
涉及所有组合的问题，通常是NP问题
不能讲问题分成小问题，必须考虑各种可能的情况，可能是NP问题
如果问题涉及序列（如旅行商问题中的城市序列问题）且难以解决，可能是NP问题
如果问题涉及集合（如广播台集合）且难以解决，可能是NP问题
如果问题可以转换成旅行商问题或集合覆盖问题，那他就是NP完全问题

对于NP完全问题，没有找到快速解决的办法，最佳的做法是使用近似算法

states_needed = set(["mt", "wa", "or", "id", "nv", "ut", "ca", "az"])
stations = {}
stations["kone"] = set(["id", "nv", "ut"])
stations["ktwo"] = set(["wa", "id", "mt"])
stations["kthree"] = set(["or", "nv", "ca"])
stations["kfour"] = set(["nv", "ut"])
stations["kfive"] = set(["ca", "az"])
final_stations = set()
while states_needed:
    best_station = None
    states_covered = set()
    for station, states_for_station in stations.items():
        covered = states_for_station & states_needed
        if len(covered) > len(states_covered):
            states_covered = covered
            best_station = station
    states_needed -= states_covered
    final_stations.add(best_station)
print(final_stations)

7.Dynamic Programming

Keys:

最优子结构
边界
状态转移公式


// 有一座高度是10级台阶的楼梯，从下往上走，每跨一步只能
// 向上1级或者2级台阶。要求用程序来求出一共有多少种走法。
import java.util.HashMap;
public class Dynamic_programming{
    public int getClimbingWays_recursive(int n){
        if(n<1)
            return 0;
        if(n==1)
            return 1;
        if(n==2)
            return 2;
        return getClimbingWays_recursive(n-1)+getClimbingWays_recursive(n-2);
    }
    public int getClimbingWays_memo(int n, HashMap<Integer, Integer> map){
        if(n<1)
            return 0;
        if(n==1)
            return 1;
        if(n==2)
            return 2;
        if(map.containsKey(n))
            return map.get(n);
        else{
            int value=getClimbingWays_memo(n-1, map)+getClimbingWays_memo(n-2, map);
            map.put(n,value);
            return value;
        }
    }
    public int getClimbingWays_DP(int n){
        if(n<1)
            return 0;
        if(n==1)
            return 1;
        if(n==2)
            return 2;
        int a=1, b=2, temp=0;
        for(int i=3;i<=n;i++){
            temp=a+b;
            a=b;
            b=temp;
        }
        return temp;
    }
    
    public static void main(String args[]){
        Dynamic_programming test = new Dynamic_programming();
        HashMap<Integer, Integer> map = new HashMap<>();
        System.out.println(test.getClimbingWays_DP(3));
        System.out.println(test.getClimbingWays_memo(3,map));
        System.out.println(test.getClimbingWays_recursive(3));
    }
}

3 ways:

recursive
memo, for record
DP

The King and the Golden

10 workers + 5 Golden

Num	Golden	People
1	400	5
2	500	5
3	200	3
4	300	4
5	350	3

最优子结构：

5个金矿的最优选择：Max（前4座金矿10人的挖金数量，前4座金矿7工人的挖金数量+第5座金矿的挖金数量）

金矿数量设为N，工人数W，金矿的黄金量设为数组G[ ]，金矿的用工量设为数组P[ ]

F(5, 10) = Max( F(4, 10), F(4, 10-P[4])+G[4])

边界情况：

N=1, W>=P[0], F(N, W) = G[0]

N=1, W<P[0], F(N, W) = 0

Summary:

F(N, W) = 0 (N<=1, W<=P[0])
F(N, W) = G[0] (N==1, W>=P[0])
F(N, W) = F(N-1, W) (N>1, W<P[N-1])
F(N, W) = max(F(N-1,W), F(N-1, W-P[N-1])+G[N-1]) (N>1, W>=P[N-1])

N\W	3	4	5	6	7	8	9	10
1			400	400	400	400	400	400
2			500	500	500	500	500	900
3	200	200	500	500	500	700	700	900
4	200	300	500	500	500	700	800	900
5	200	350	500	550	650	850	850	900

除了第一行以外，每个格子都是前一行的一个或者两个格子推导而来的

并不需要存储整个表格，只需要存储前一行的结果，就可以推到出新的一行

8.Next

树
- 二叉树
  
  查找：O(logn)，插入：O(logn)，删除：O(logn)
- B树
- 红黑树
- 堆
- 伸展树
反向索引
傅里叶变换
并行算法
- Map，Reduce
布隆过滤器

概率型数据结构，提供的答案可能不对，但很可能是正确的

使用散列表时，答案绝对可靠，而使用布隆过滤器时，答案却很可能是正确的

可能出现报错的情况，不可能出现漏报的情况
- HyperLogLog 类似布隆过滤器的算法
面对海量数据且要求答案八九不离十时，可考虑使用概率性算法
SHA算法

安全散列函数（secure hash algorithm），生成一个散列值–一个较短的字符串

散列算法是单向的

局部敏感的散列函数：Simhash，需要检查两项内容相似程度时
Diffie-Hellman密钥交换

私钥、公钥

RSA