- Divide and conquer
- Sorts “in place”
- Very practical (with tuning)
Divide and conquer
- Divide (Key): Partition array into 2 subarrays around, pivot x, elements in the lower subarray $\leq$ x $\leq$ elements in the upper subarray
- Conquer: Recursively sort 2 subarrays
- Combine: Trivial
Time complexity: $O(n)$
Pseudocode
|
|
|
|
|
|
Analysis
- Assume all input elements are distinct
- In practice, there are better partitioning algorithms for when duplicate input elements may exist
T(n)
= worst-case running time on an array of n elements
Worst-case of quicksort
Input sorted of reversed sorted
Partition around min or max element
One side of partition always has no elements
$$
\begin{equation}
\begin{split}
T(n) &= T(0) +T(n-1) + O(n) \\
&= O(1) + T(n-1) +O(n) \\
&= T(n-1) + O(n)\\
&= O(n^2) \\
\end{split}
\end{equation}
$$
arithmetic series 等差级数, like insertion sort$O(c\sum_{k=1}^nk) = O((\frac{1+n}{2})n) = O(n^2)$
$T(n) = O(n) + O(n^2) = O(n^2)$
Best-case analysis (intuition only)
If we’re really lucky, Partition splits the array evenly n/2: n/2
$$
\begin{equation}
\begin{split}
T(n) &= 2T(n/2) + O(n) \\
&= O(nlgn)
\end{split}
\end{equation}
$$
(same as merge sort)
Analysis of “almost-best” case (intuition only)
What if the split is always $\frac{1}{10}:\frac{9}{10}$ ?
$$
\begin{equation}
\begin{split}
T(n) &= T(\frac{1}{10}n) + T(\frac{9}{10}n) + O(n) \\
O(n) &\le cn\\
\end{split}
\end{equation}
$$
$$
cn\cdot log_{10} \leq T(n) \leq cn\cdot log_{\frac{9}{10}} + O(n)
\approx nlgn
$$
same as evenly
More intuition
Suppose we alternate lucky, unlucky, lucky, unlucky, …
$$
\begin{equation}
\begin{split}
L(n) &= 2U(\frac{n}{2}) + O(n)\quad lucky\\
U(n) &= L(n-1) + O(n) \quad unlucky
\end{split}
\end{equation}
$$
Solving:
$$
\begin{equation}
\begin{split}
L(n) &= 2(L(\frac{n}{2}-1) + O(\frac{n}{2}))+O(n)\
&= 2L(\frac{n}{2}-1) + O(n) \\
&= O(nlgn) \quad lucky
\end{split}
\end{equation}
$$
How can we make sure we are usually lucky?
Randomized quicksort
IDEA: Partition around a random element
- Running time is independent of the input ordering
- No assumptions need to be made about the input distribution
- No specific input elicits the worst-case behavior
- The worst-case in determined only by the output of a random-number generator
Analysis
Let $T(n)$ be the random variable for the running time of randomized quicksort on an input of size $n$, assuming random numbers are independent
For $k=0,1,…,n-1$, define the indicator random variable
$$
X_k =
\begin{cases}
1, \quad \text{for if partition generates a k: n-k-1 split}\\
0, \quad \text{otherwise}
\end{cases}
$$
$$
\begin{equation}
\begin{split}
E[X_k]&=0\cdot P{X_k=0} + 1\cdot P{X_k=1}\
&= P{X_k=1}\\
&= \frac{1}{n}\\
&\text{all splts are equally likely 所有划分都是相同概率, assuming elements are distinct}\\
&\text{each one has a }\frac{1}{n}\text{ chance of being picked as the pivot}\\
&\text{when you pick the pivot, that determines what is on the left and the right and so forth}
\end{split}
\end{equation}
$$
$$
\begin{equation}
\begin{split}
T_n &=
\begin{cases}
T(0)+T(n-1)+O(n)\quad \text{if 0:n-1 split}\\
T(1)+T(n-2)+O(n)\quad \text{if 1:n-2 split}\\
…\\
T(n-1)+T(0)+O(n)\quad \text{if n-1:0 split}
\end{cases}\\
&=\sum_{k=0}^{n-1}X_k(T(k)+T(n-k-1)+O(n))
\end{split}
\end{equation}
$$
Calculating expectation
Take expectations of both sides
$$
\begin{equation}
\begin{split}
E[T(n)]&=E[\sum_{k=0}^{n-1}X_k(T(k)+T(n-k-1)+O(n))]\
&= \sum_{k=0}^{n-1}E[X_k(T(k)+T(n-k-1)+O(n))]\text{ linearity of expectation}\\
&=\sum_{k=0}^{n-1}E[X_k]\cdot E[(T(k)+T(n-k-1)+O(n))]\text{ Independence of }X_k\text{from other random choices}\\
&=\frac{1}{n}\sum_{k=0}^{n-1}E[(T(k)]+\frac{1}{n}\sum_{k=0}^{n-1}E[T(n-k-1)]+\frac{1}{n}\sum_{k=0}^{n-1}O(n)\text{ summations have identical terms}\\
&=\frac{2}{n}\sum_{k=0}^{n-1}E[(T(k)]+O(n)
\end{split}
\end{equation}
$$
The $k=0,1$ terms can be absorbed in the $O(n)$
Prove:
$E[T(n)]\leq anlgn$ for constant $a \geq 0$
Choose $a$ large enough so that $anlgn$ dominates $E[T(n)]$ for sufficiently small $n\geq 2$
Use fact:
$\sum_{k=2}^{n-1}klgk \leq \frac{1}{2}n^2lgn-\frac{1}{8}n^2$
(exercise)
Substitution method:
$$
\begin{equation}
\begin{split}
E[T(n)]&\leq\frac{2}{n}\sum_{k=2}^{n-1}aklgk+O(n)\\
&\leq \frac{2a}{n}(\frac{1}{2}n^2lgn-\frac{1}{8}n^2)+O(n)\text{ use fact}\\
&=anlgn-(\frac{an}{4}-O(n))\text{ express as desired - residual}\\
&\leq anlgn\\
& \text{if a is chosen large enough so that }\frac{an}{4} \text{ dominates the }\geq O(n)
\end{split}
\end{equation}\\
$$
Quicksort in practice
- Quicksort is a great general-purpose sorting algorithm
- Quicksort is typically over twice as fast as merge sort
- Quicksort can benefit substantially from code tuning
- Quicksort behaves well even with caching and virtual memory