Introduction to Algorithms--QuickSort and its randomize

Divide and conquer
Sorts “in place”
Very practical (with tuning)

Divide and conquer

Divide (Key): Partition array into 2 subarrays around, pivot x, elements in the lower subarray $\leq$ x $\leq$ elements in the upper subarray
Conquer: Recursively sort 2 subarrays
Combine: Trivial

Time complexity: $O(n)$

Pseudocode

1 2	//initial call QUCIKSORT(A, 1, n)

QUICKSORT(A,p,r)
if p < r
	q = PARTITION(A, p, r)
	QUICKSORT(A, p, q-1)
	QUICKSORT(A, q+1, r)

PARTITION(A,p,r)
x = A[p]
i = p
for j = p+1 to q
	if A[j] <= x
		i = i+1
		exchange A[i] with A[j]
exchange A[p] with A[i]
return i

Analysis

Assume all input elements are distinct
In practice, there are better partitioning algorithms for when duplicate input elements may exist
T(n) = worst-case running time on an array of n elements

Worst-case of quicksort

Input sorted of reversed sorted
Partition around min or max element
One side of partition always has no elements
$$
\begin{equation}
\begin{split}
T(n) &= T(0) +T(n-1) + O(n) \\
&= O(1) + T(n-1) +O(n) \\
&= T(n-1) + O(n)\\
&= O(n^2) \\
\end{split}
\end{equation}
$$
arithmetic series 等差级数, like insertion sort

$O(c\sum_{k=1}^nk) = O((\frac{1+n}{2})n) = O(n^2)$

$T(n) = O(n) + O(n^2) = O(n^2)$

worst_case

Best-case analysis (intuition only)

If we’re really lucky, Partition splits the array evenly n/2: n/2
$$
\begin{equation}
\begin{split}
T(n) &= 2T(n/2) + O(n) \\
&= O(nlgn)
\end{split}
\end{equation}
$$
(same as merge sort)

Analysis of “almost-best” case (intuition only)

What if the split is always $\frac{1}{10}:\frac{9}{10}$ ?

$$
\begin{equation}
\begin{split}
T(n) &= T(\frac{1}{10}n) + T(\frac{9}{10}n) + O(n) \\
O(n) &\le cn\\
\end{split}
\end{equation}
$$

$$
cn\cdot log_{10} \leq T(n) \leq cn\cdot log_{\frac{9}{10}} + O(n)
\approx nlgn
$$

same as evenly

More intuition

Suppose we alternate lucky, unlucky, lucky, unlucky, …
$$
\begin{equation}
\begin{split}
L(n) &= 2U(\frac{n}{2}) + O(n)\quad lucky\\
U(n) &= L(n-1) + O(n) \quad unlucky
\end{split}
\end{equation}
$$
Solving:
$$
\begin{equation}
\begin{split}
L(n) &= 2(L(\frac{n}{2}-1) + O(\frac{n}{2}))+O(n)\
&= 2L(\frac{n}{2}-1) + O(n) \\
&= O(nlgn) \quad lucky
\end{split}
\end{equation}
$$
How can we make sure we are usually lucky?

Randomized quicksort

IDEA: Partition around a random element

Running time is independent of the input ordering
No assumptions need to be made about the input distribution
No specific input elicits the worst-case behavior
The worst-case in determined only by the output of a random-number generator

Analysis

Let $T(n)$ be the random variable for the running time of randomized quicksort on an input of size $n$, assuming random numbers are independent

For $k=0,1,…,n-1$, define the indicator random variable
$$
X_k =
\begin{cases}
1, \quad \text{for if partition generates a k: n-k-1 split}\\
0, \quad \text{otherwise}
\end{cases}
$$

$$
\begin{equation}
\begin{split}
E[X_k]&=0\cdot P{X_k=0} + 1\cdot P{X_k=1}\
&= P{X_k=1}\\
&= \frac{1}{n}\\
&\text{all splts are equally likely 所有划分都是相同概率, assuming elements are distinct}\\
&\text{each one has a }\frac{1}{n}\text{ chance of being picked as the pivot}\\
&\text{when you pick the pivot, that determines what is on the left and the right and so forth}
\end{split}
\end{equation}
$$

$$
\begin{equation}
\begin{split}
T_n &=
\begin{cases}
T(0)+T(n-1)+O(n)\quad \text{if 0:n-1 split}\\
T(1)+T(n-2)+O(n)\quad \text{if 1:n-2 split}\\
…\\
T(n-1)+T(0)+O(n)\quad \text{if n-1:0 split}
\end{cases}\\
&=\sum_{k=0}^{n-1}X_k(T(k)+T(n-k-1)+O(n))
\end{split}
\end{equation}
$$

Calculating expectation

Take expectations of both sides
$$
\begin{equation}
\begin{split}
E[T(n)]&=E[\sum_{k=0}^{n-1}X_k(T(k)+T(n-k-1)+O(n))]\
&= \sum_{k=0}^{n-1}E[X_k(T(k)+T(n-k-1)+O(n))]\text{ linearity of expectation}\\
&=\sum_{k=0}^{n-1}E[X_k]\cdot E[(T(k)+T(n-k-1)+O(n))]\text{ Independence of }X_k\text{from other random choices}\\
&=\frac{1}{n}\sum_{k=0}^{n-1}E[(T(k)]+\frac{1}{n}\sum_{k=0}^{n-1}E[T(n-k-1)]+\frac{1}{n}\sum_{k=0}^{n-1}O(n)\text{ summations have identical terms}\\
&=\frac{2}{n}\sum_{k=0}^{n-1}E[(T(k)]+O(n)
\end{split}
\end{equation}
$$
The $k=0,1$ terms can be absorbed in the $O(n)$

Prove:

$E[T(n)]\leq anlgn$ for constant $a \geq 0$

Choose $a$ large enough so that $anlgn$ dominates $E[T(n)]$ for sufficiently small $n\geq 2$

Use fact:

$\sum_{k=2}^{n-1}klgk \leq \frac{1}{2}n^2lgn-\frac{1}{8}n^2$

(exercise)

Substitution method:
$$
\begin{equation}
\begin{split}
E[T(n)]&\leq\frac{2}{n}\sum_{k=2}^{n-1}aklgk+O(n)\\
&\leq \frac{2a}{n}(\frac{1}{2}n^2lgn-\frac{1}{8}n^2)+O(n)\text{ use fact}\\
&=anlgn-(\frac{an}{4}-O(n))\text{ express as desired - residual}\\
&\leq anlgn\\
& \text{if a is chosen large enough so that }\frac{an}{4} \text{ dominates the }\geq O(n)
\end{split}
\end{equation}\\
$$

Quicksort in practice

Quicksort is a great general-purpose sorting algorithm
Quicksort is typically over twice as fast as merge sort
Quicksort can benefit substantially from code tuning
Quicksort behaves well even with caching and virtual memory