Akshay Padte

Understanding Grover Search

An attempt to explain Grover search in a self-contained article

  ·   15 min read

This post aims to explain Grover Search and the required quantum computing in a single self-contained article. It is not meant as an introduction to quantum computing, and hence only briefly goes over some of the concepts that can help a reader appreciate Grover’s search algorithm.

Introduction

You might be aware that classically, unordered search is linear in the number of inputs. Mathematicaly, if we say we have a function $f$ (and we cannot know the internal working of $f$ ), such that $f(x) = 0$ for all $x$ except a specific $x^{\star}$ where $f(x^{\star}) = 1$, then to find $x^{\star}$, we would have no option other than to try for each $x$ one-by-one. Even if we aim for a 50% success rate, we still would expect to query half of all possible values of $x$.

Grover search can do this in $O(\sqrt{N})$, hence demonstrating a quantum speedup. How is this possible??

Some background

The Qubit

The basic unit of information in the quantum sense is the qubit. Like a regular bit can be 0 or 1, a qubit can be 0, 1, or a complex linear combination of 0 and 1 (called a superposition).

Qubits are neatly represented using vectors, with “0” and “1” forming our (default) basis, where “0” is $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and “1” is $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ We write “0” as $\ket{0}$ (pronounced as “ket 0”) and “1” as $\ket{1}$ (pronounced as “ket 1”) .This is called the Dirac notation, also called as bra-ket notation, used to highlight the fact that we are dealing with vectors.

In Dirac notation $\ket{\text{something}}$ (pronounced “ket something”) is a column vector, and $\bra{\text{something}}$ (pronounced “bra something”) is a row vector, which is the complex conjugate transpose of the ket form.
For example, if $\ket{\psi} = \begin{bmatrix} a + ib \\ c + id \end{bmatrix}$, then $\bra{\psi} = \begin{bmatrix} a - ib & c - id \end{bmatrix}$

By “complex linear combination”, we mean that just like a qubit could be $\ket{0}$ or $\ket{1}$, it could also be $\alpha\ket{0} + \beta\ket{1}$, where $\alpha$ and $\beta$ are complex numbers (eg $\alpha = 0.2 + 0.5i$).
$\alpha$ and $\beta$ are called the amplitudes for the basis states $\ket{0}$ and $\ket{1}$ respectively. In vector form, you could also write this as $\begin{bmatrix} \alpha \\ \beta \end{bmatrix}$

The $\ket{0}$ state is when $\alpha=1$ and $\beta=0$.
Similarly, $\ket{1}$ state is when $\alpha=0$ and $\beta=1$.
Also observe that $\ket{0}$ and $\ket{1}$ are orthogonal since their inner product is 0.

In the physical sense, a qubit could be something like a photon, and the state could be something like a property of that photon (for eg. spin)

Measurements

You may have also heard “observing collapses the superposition”. That means when we perform an observation of a qubit (also called a measurement), the superposition will instantly collapse to either $\ket{0}$ or $\ket{1}$. The probability of which basis state it chooses is determined by the amplitude, or rather the square of the magnitude of the amplitude. The probability of a superposition state $\ket{\psi_{1}} = a\ket{0} + b\ket{1}$ being $\ket{0}$ or $\ket{1}$ after measurement is $|a|^2$ and $|b|^2$ respectively. The measurement result has to be one of these two, so by the rule of total probability $|a|^2 + |b|^2 = 1$

Consider a qubit with state $\frac{1}{\sqrt{2}}(\ket{0} + \ket{1})$
On measurement, this will give $\ket{0}$ with 50% probability, and $\ket{1}$ with 50% probability.

Multiple qubits

When dealing with two or more qubits, we use tensor product to combine the state of independent qubits. In general, when we have qubit $\ket{\psi_{1}} = a\ket{0} + b\ket{1} = \begin{bmatrix} a \\ b \end{bmatrix}$, and another qubit $\ket{\psi_{2}} = p\ket{0} + q\ket{1} = \begin{bmatrix} p \\ q \end{bmatrix}$
Then the combined state is given by $\ket{\psi_{1}} \otimes \ket{\psi_{2}}$
$$ \begin{align*} = & (a\ket{0} + b\ket{1}) \otimes (p\ket{0} + q\ket{1})\\ = & ap\ket{0}\otimes\ket{0} + aq\ket{0} \otimes \ket{1} + bp\ket{1} \otimes \ket{0} + bq\ket{1} \otimes \ket{1}\\ = &\begin{bmatrix} ap \\ aq \\ bp \\ bq \end{bmatrix} \end{align*} $$

For just the sake of reducing verbosity, we may skip writing $\otimes$ between two basis states (i.e write $\ket{0} \otimes \ket{0}$ as $\ket{00}$). Thus the above form can be written as $ap\ket{00} + aq\ket{01} + bp\ket{10} + bq\ket{11}$.

Observe that when talking about 2 qubits, they are a linear combination of the states “00”, “01”, “10”, “11”, which should be pretty intuitive.

Entanglement

Notice how I said “independent qubits” in the section for combining the states. If two qubits are not independent (called as entanglement), their state can only be described together, and cannot be written as the tensor product two indepedent qubits. Consider the state $\frac{1}{\sqrt{2}}(\ket{01} + \ket{10})$
On measurement, this state gives $\ket{01}$ with 50% probability, and $\ket{10}$ with 50% probability. If we try to separate this state into individual qubits, by saying $ap = 0, aq = 1, bp = 1, bq = 0$ and then try to calculate $a$, $b$, $p$ and $q$, observe how it is impossible to assign values for them.
This is because as $aq = 1$, neither $a$ or $q$ can be 0. Futher as $bp = 1$, neither b or p can be 0.
This means none of the variables can be 0.
Yet, we also have $ap = 0$, which means either $a$ or $p$ has to be 0, which is not possible given the previous two statements.

Another example of an entangled state is $\frac{1}{\sqrt{2}}(\ket{00} + \ket{11})$
Entagled states cannot be described by their individual qubits, and can only be described together.

Unitary transformations

The only way we can operate on qubit(s) is to apply linear transformations. Remember that linear transformations can be represented using a matrix, say $M$. Here $M$ is also called as the operator. So we would write $M\ket{v} = \ket{w}$
For example $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ $\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$
Remember that in doing such a transformation say from $a\ket{0} + b\ket{1}$ to $c\ket{0} + d\ket{1}$, we must guarantee that $|c|^2 + |d|^2 = |a|^2 + |b|^2 = 1$, otherwise the probabilities would stop making sense. For this, $M$ needs to be unitary. These unitary operations are also called as quantum “gates” (anologous to how classical logic gates work on bits).
The example $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ $\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ shown above is nothing but the NOT gate being applied to $\ket{0}$ to yield $\ket{1}$

Unitaries can be attached one after the other, and this is denoted by regular product. For example we are applying operator $M _1$ after operator $M_2$, the resultant combined operator is $M_1M_2$

Unitaries can combined for multiple qubits using tensor product. If operator $M_{1}$ acting on first qubit and operator $M_{2}$ acting on the second qubits, the resultant combined operator is $M_1 \otimes M_2$

Any computable function $f$ can be evaluated using using unitaries, and if needed, some auxillary qubits.

Hadamard Gate

A very commonly used quantum gate is the Hadamard gate $H$, which for a single qubit is given by
$H = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$

Notice that
$H\ket{0} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{1}{\sqrt{2}}(\ket{0} + \ket{1})$.

As noted previously, the result is a uniform superpositions of $\ket{0}$ and $\ket{1}$, meaning both $\ket{0}$ and $\ket{1}$ have probability $ (\frac{1}{\sqrt{2}})^2 = \frac{1}{2}$ .

Now note, $H\ket{1} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \frac{1}{\sqrt{2}}(\ket{0} - \ket{1})$.
Observe that is is also a uniform superposition of $\ket{0}$ and $\ket{1}$. But it’s different from the previous result in one crucial way, the sign before $\ket{1}$ is negative.
Now lets see what happens if we apply $H$ to $\frac{1}{\sqrt{2}}(\ket{0} + \ket{1})$

$$\begin{align*} &H(\frac{1}{\sqrt{2}}(\ket{0} + \ket{1})) \\ =& \frac{1}{\sqrt{2}}(H\ket{0} + H\ket{1})\\ =& \frac{1}{\sqrt{2}}[\frac{1}{\sqrt{2}}(\ket{0} + \ket{1}) + \frac{1}{\sqrt{2}}(\ket{0} - \ket{1})]\\ =& \frac{1}{\sqrt{2}}[\frac{2}{\sqrt{2}}\ket{0} + \frac{0}{\sqrt{2}}\ket{1}]\\ =& \ket{0} \end{align*}$$

Notice that since we had two $\ket{0}$ of the same sign, they added up together. And the two $\ket{1}$ of opposite signs cancelled out.

This is called interference. Specifically, the $\ket{0}$ interfered contructively because they had the same signs, and the $\ket{1}$ interfered destructively because they had opposite signs.

If we have 2 qubits, then applying the 2-qubit Hadamard $H \otimes H $ to $\ket{00}$ yields $\frac{1}{2}\ket{00} + \frac{1}{2}\ket{01} + \frac{1}{2}\ket{10} + \frac{1}{2}\ket{11}$. All basis states have equal probability $ (\frac{1}{2})^2 = \frac{1}{4}$ .

In general if we have $n$ qubits in state $\ket{00…0}$,
applying the $n$-qubit Hadamard $H\otimes H\otimes …. \otimes H = H^{\otimes N}$ yields
$\frac{1}{\sqrt{2^n}}(\ket{000…0} + \ket{000..1}) + … + \ket{111..0} + \ket{111..1}$
which is all $2^n$ basis states in equal probability $\frac{1}{2^n}$

Controlled-NOT (CNOT) Gate

Another common gate is the CNOT gate, given for two qubits by
$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}$
Basically what it does is, if the first qubit (referred to as the control qubit for this gate) is $\ket{0}$, it keeps the second qubit as it is. If the first qubit is $\ket{1}$, it flips the second qubit.
So
$\text{CNOT}\ket{00} = \ket{00}$
$\text{CNOT}\ket{01} = \ket{01}$
$\text{CNOT}\ket{10} = \ket{11}$ (second qubit flipped from 0 to 1)
$\text{CNOT}\ket{11} = \ket{10}$ (second qubit flipped from 1 to 0)

Reflection Gates

Let’s now see the operator formed by $M = 2\ket{0}\bra{0} - I$
(ket something multipliled bra something is a column vector multiplied by a row vector and yields a matrix, and $I$ here is the identity matrix) $$ \begin{align*} M = 2\ket{0}\bra{0} - I &= 2\begin{bmatrix} 1 \\ 0 \end{bmatrix}\begin{bmatrix} 1 & 0 \end{bmatrix} - \begin{bmatrix}1 & 0\\ 0 & 1 \end{bmatrix}\\ &= 2\begin{bmatrix} 1 & 0\\ 0 & 0 \end{bmatrix} - \begin{bmatrix}1 & 0\\ 0 & 1 \end{bmatrix}\\ &= \begin{bmatrix} 2 & 0\\ 0 & 0 \end{bmatrix} - \begin{bmatrix}1 & 0\\ 0 & 1 \end{bmatrix}\\ &= \begin{bmatrix} 1 & 0\\ 0 & -1 \end{bmatrix} \end{align*} $$ Now see $M\ket{0} = \begin{bmatrix} 1 & 0\\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \ket{0}$

And, $M\ket{1} = \begin{bmatrix} 1 & 0\\ 0 & -1 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \end{bmatrix} = -\ket{1}$

So $M(a\ket{0} + b\ket{1}) = a\ket{0} - b\ket{1}$.

Grover Step 1

Thus we observe that $M$ relects things about $\ket{0}$ by flipping the sign of the component perpendicular to $\ket{0}$.

In general for a state $\ket{v}$, the operator $2\ket{v}\bra{v} - I$ performs a relection about $\ket{v}$

A uniquely quantum trick

First notice that if we apply some unitary $F$ to a two qubit system in the state
$\ket{\psi} = \frac{1}{2}\ket{00} + \frac{1}{2}\ket{01} + \frac{1}{2}\ket{10} + \frac{1}{2}\ket{11}$ (state where all 4 possible combinations have 25% probability)

On applying $F$, we will have $\frac{1}{2}F\ket{00} + \frac{1}{2}F\ket{01} + \frac{1}{2}F\ket{10} + \frac{1}{2}F\ket{11}$ .

We have just applied function $F$ to all possible combinations of two qubits all at once!

In general, for an $n$-qubit system in this uniform state, $F$ gets applied to all $N = 2^{n}$ combinations all at once. This property clearly does not exist with classical computers (if you are thinking of parallel computing, remember that even there you must compute $F$ multiple times, just on multiple cores. What we are seeing above requires $F$ to be called just once!).

Alas, this is not enough! Just because you have all the results in your superposition does not mean you can use them meaningfully. You still need to get the answer out of the superpostion (i.e you must perform a measurement). The problem is, when you do a measurement, it gives you any of the components of that superposition at random. If you just wanted any random result, you could have just computed any one at random in the first place! This is not usefull…

But wouldn’t it be nice if somehow we could amplify the desired answers (say where f(x) = 1), and dampen the undesired answers (where f(x) = 0)? Doing so would make it more likely that an observation will yield the desired answer!

Grover’s algorithm

Description

Given $N = 2^n$ possible inputs (where $n$ is the number of qubits), and a function $f$ for which $f(x) = 0$ for all $x$ except some specific $x^\star$, find $x^\star$.

Grover uses cleverly designed reflection gates multiple times to make the good results interfere constructively, and the bad results interfere destructively.
At the heart of Grover’s algorthm are two reflection operators:

  1. The first is the “oracle operator” $O$ which is defined as follows: $$ \begin{align*} O\ket{x} = \ket{x} &\text{ , if } f(x) = 0\\ O\ket{x} = -\ket{x} &\text{ , if } f(x) = 1 \end{align*} $$ As it turns out, this can be created if we have oracular access to $f$ as follows. Say we have the state $\ket{x} \otimes \frac{1}{\sqrt{2}}(\ket{0} - \ket{1})$, then lets produce $f(x)$ on an auxilliary line.

    Lets apply a $\text{CNOT}$ with $f(x)$ as control onto the $\frac{1}{\sqrt{2}}(\ket{0} - \ket{1})$ state.
    If $f(x)$ was 0, it remains the same. Otherwise it changes to $\frac{1}{\sqrt{2}}(\ket{1} - \ket{0}) = -\frac{1}{\sqrt{2}}(\ket{0} - \ket{1})$

    When talking about the whole system, this sign can be pulled out as $-\ket{x} \otimes \frac{1}{\sqrt{2}}(\ket{0} - \ket{1})$.
    And we can ignore the last qubit to have $-\ket{x}$ when $f(x)=1$

  2. The second is called the “Grover diffusion operator” $A$ which is defined as $A = 2\ket{\psi}\bra{\psi} - I$, where $\ket{\psi}$ is the uniform superposition of all $2^{n}$ basis states formed by $n$ qubits.

    That is $\ket{\psi} = \frac{1}{\sqrt{2^n}}(\ket{000…0} + \ket{000..1}) + … + \ket{111..0} + \ket{111..1}$

Combinining both one after the other gives us $G = AO$, where $G$ is one iteration of the Grover algorithm.

Here’s how the entire works:

  1. Initialize the state to $\ket{000…00}$
  2. Use $n$ Hadamard gates (one for each qubit) to put the state into the uniform superposition $\ket{\psi} = \frac{1}{\sqrt{2^n}}(\ket{000…0} + \ket{000..1}) + … + \ket{111..0} + \ket{111..1}$
  3. Repeat $O(\sqrt{2^n})$ times:
    a. Apply oracle operator $O$
    b. Apply Grover diffusion operator $A$
  4. Perform measurement

Remember that the number of possible inputs of $f$ is $N = 2^{n}$ (for e.g. for 3 qubits, you have 8 possible inputs from 000, 001, …, to 111). So the complexity of Grover is $O(\sqrt{2^n})$ = $O(\sqrt{N})$
A classical algorithm will need to check all possible inputs one by one (8 possible inputs for 3 bits), so the complexity of a classical approach is $O(N)$.

Toy example

Let’s do a small example run:
Suppose we have three qubits, so the number of possible inputs for $f$ are $2^{3}$ = 8.

  1. Suppose $x^{\star}$ was $\ket{101}$ (and we don’t know this yet)

  2. We use hadamards on $\ket{000}$ to get
    $\ket{\psi} = \frac{1}{2\sqrt{2}}(\ket{000} + \ket{001} + \ket{010} + \ket{011} + \ket{100} + \ket{101} + \ket{110} + \ket{110} + \ket{111})$

  3. When we apply apply operator O, we would have
    $\ket{\phi} = \frac{1}{2\sqrt{2}}(\ket{000} + \ket{001} + \ket{010} + \ket{011} + \ket{100} - \ket{101} + \ket{110} + \ket{110} + \ket{111})$

    Notice that the sign of the accepting input has changed! (although we still cannot observe the effect because the probabality is still the same… as probabilities use squares)

  4. To apply $A$, it would help to rewrite $\ket{\phi}$ using only 2 orthogonal components, one along $\ket{\psi}$ and one perpendicular to $\ket{\psi}$. The vectors $\ket{\psi}$ and $\ket{\phi}$ in vector form are: $$ \ket{\psi} = \frac{1}{2\sqrt{2}}\begin{bmatrix}1 \\ 1\\ 1\\ 1\\ 1\\ 1\\ 1\\ 1\end{bmatrix} \hspace{1em} \ket{\phi} =\frac{1}{2\sqrt{2}} \begin{bmatrix}1 \\ 1\\ 1\\ 1\\ -1\\ 1\\ 1\\ 1\end{bmatrix} $$

    Let $\ket{\psi^{\perp}}$ be a vector perpendicular to $\ket{\psi}$.

    We want to write $\ket{\phi}$ as $\alpha\ket{\psi} + \beta\ket{\psi^\perp}$
    We can do this as: $$ {\phi} =\frac{1}{2\sqrt{2}} \begin{bmatrix}1 \\ 1\\ 1\\ 1\\ -1\\ 1\\ 1\\ 1\end{bmatrix} = \frac{3}{8\sqrt{2}}\begin{bmatrix}1 \\ 1\\ 1\\ 1\\ 1\\ 1\\ 1\\ 1\end{bmatrix} + \frac{1}{8\sqrt{2}} \begin{bmatrix}1 \\ 1\\ 1\\ 1\\ -7\\ 1\\ 1\\ 1\end{bmatrix} $$

    Now let $\ket{\varphi} = A\ket{\phi}$ $$ = \frac{3}{8\sqrt{2}}\begin{bmatrix}1 \\ 1\\ 1\\ 1\\ 1\\ 1\\ 1\\ 1\end{bmatrix} - \frac{1}{8\sqrt{2}} \begin{bmatrix}1 \\ 1\\ 1\\ 1\\ -7\\ 1\\ 1\\ 1\end{bmatrix} $$ ( the sign of the $\ket{\psi^\perp}$ component is flipped after applying reflection $A$) $$ =\frac{1}{8\sqrt{2}}\begin{bmatrix}2\\ 2\\ 2\\ 2\\ 10\\ 2\\ 2\\ 2\end{bmatrix} =\frac{1}{4\sqrt{2}}\begin{bmatrix}1\\ 1\\ 1\\ 1\\ 5\\ 1\\ 1\\ 1\end{bmatrix} $$

  5. This state on measurement gives the accepting result $\ket{101}$ with $(\frac{5}{4\sqrt{2}})^2 = \frac{25}{32} = 78$% probability!!!!

  6. We could repeat it a few more times to yield even higher success probability!

Geometric interpretation

Imagine a unit circle, and the X-axis being the rejecting state $\ket{\text{Bad}}$ (the state consisting of all $\ket{x}$ such that $f(x)$ = 0), and the Y-axis being the accepting state $\ket{\text{Good}}$ (the state consisting of all $\ket{x}$ such that $f(x) = 1$) (since there is no overlap between the good state and bad state, their inner-product will be 0, hence we say they are orthogonal).

Our first state (after hadamard) will be somewhere between $\ket{\text{Good}}$ and $\ket{\text{Bad}}$
Grover Step 1

On applying $O$ operator, it applies a reflection about $\ket{\text{Bad}}$. If the angle between $\ket{\psi}$ and $\ket{\text{Bad}}$ was $\theta$, the angle between the reflection $\ket{\phi}$ and $\ket{\text{Bad}}$ is $-\theta$. Grover Step 2

Now, applying $A$ will flip this new state $\ket{\phi}$ about the original uniform state $\ket{\psi}$. It should be clear that the angle between $\ket{\psi}$ and $\ket{\phi}$ is $2\theta$. Applying a reflection of $2\theta$ in the counter-clockwise direction, we end up with a new state $\ket{\varphi}$ which is closer to the $\ket{\text{Good}}$ state than before!! Grover Step 3

How many times to iterate?

Doing the above iteration many times, we may accidentally overshoot the $\ket{\text{Good}}$ state.

If we know how many accepting states there are, we can compute exactly how many times we must iterate (apply both $O$ and $A$) to maximize the sucess probability.

We know every iteration adds an angle of $2\theta$ our state. We know angle between the $\ket{\text{Good}}$ and $\ket{\text{Bad}}$ states is $\frac{\pi}{2} $. We just need to find $\theta$ now. If there is only one correct state, there are $2^n - 1$ wrong answers. That means each basis state in the bad state has an amplitude of $\frac{1}{\sqrt{2^n - 1}}$. Each basis state in $\ket{\psi}$ has amplitude $\frac{1}{\sqrt{2^n - 1}}$. The inner product $\cos{\theta}$ will be $$\frac{2^n - 1}{\sqrt{2^n-1}\sqrt{2^n}} = \frac{\sqrt{2^n-1}}{\sqrt{2^n}}$$

We further know the trigonometric identity $\cos^2{\theta} + \sin^2{\theta} = 1$

So $\sin{\theta} = \frac{1}{\sqrt{2^n}}$

For an interesting function $f$, the number of accepting results would be very small compared to all possible inputs (otherwise, we wouldnt do all this in the first place and just query randomly to give an accepting result with a reasonable enough probability!)
So $\theta$ would be small for an interesting $f$. And when $\theta$ is small, $\sin{\theta} \approx \theta$
So $\theta \approx \frac{1}{\sqrt{2^n}}$

As the number of possible inputs of $f$ is $N = 2^n$, we have $\theta \approx \frac{1}{\sqrt{N}}$.

Thus the number of iterations needed is $\frac{\frac{\pi}{2}}{2\theta} = \frac{\pi\sqrt{N}}{4} = O(\sqrt{N})$