Fast Exponentiation Techniques and Interesting Problems which use them

Prerequisites – Big O Notation

The previous blog post was generally rather math-oriented and proof-focused. However, this blog post is going to focus more on coding. I will try to keep the material as accessible as possible, but one really important concept moving forward is Big O notation from Computer Science, which we use to denote how efficient a program is.

I highly recommend you watch the video if you don’t know what this is, to at least get a rough idea of what we’re talking about. It’s really important because moving forward we are going to be caring a lot more about efficiency, since Competitive Programming is not just about finding the answer, but finding it quickly. 

For now, all you need to know is that if we plug in N into the program’s runtime and we’re somewhere around 10^8 at most, then your program will probably pass the time limit.  Computers run at roughly 10^9 primitive operations per second, and you usually only have 1 or 2 seconds for programming problems. Since we are performing higher level operations and there are unseen constant factors, we play it safe and set 10^8 as our safe zone. For example, an O(N^2) algorithm will pass the time limit if N=10^3 , but will definitely fail if N=10^7 .

With that said, let’s move on to this post’s topic!

Problems Discussed:


Previously, we needed to find A^B+1 \mod{10^9+7} . Thankfully, due to modular arithmetic, we know that

(a+b)\% c = ((a\%c)+(b\%c))\%c
(ab)\%c = ((a\%{c})(b\%c))\%c

where % is the modulo operator.

In other words, if I add or multiply two integers then take their mod, I will get the same result as if I took their mod first, then added or multiplied them, and then took their mod. For example, observe that (110+25)\%7=135\%7=2 , but if we took the mod at each step, we’d have (110\%7 + 25\%7)\% 7 = (5+4)\%7 = 9\%7 = 2 , which is the same answer. Similarly, doing (45\cdot 51)\%8=2295\%8=7 , while ((45\%8)(51\%8))\%8 = (5\cdot3)\%8= 15\%8 = 7 as well.

This is nice because it means we never have to deal with numbers greater than \text{MOD}^2 when working modulo some \text{MOD} . We just need to apply modulo at every intermediate step, and so every step along the way, we curb the numbers before they can get unreasonably big. If we have 10^9+7 , then our program never has to deal with integers larger than around 10^{18} , which is still within the bounds of a 64-bit data type, which is good.

Therefore we have the following pseudo-code:

#computes base to the power of exp, modulo MOD
def slow_mod_pow(base,exp,MOD):
    result = 1
    base = base%MOD
    for i in range(exp): #do this exp times
        result = (result*base)%MOD
    return result

This seems like a reasonable function – compute A^N with a simple for loop that runs N times. Due to this for loop, the function runs in O(N) .

However, in the problem from the previous post, we needed to answer T different test cases, where T could be as large as 10^5 , with the exponent being as large as 10^9 per test case. If we just called this function for each test case, the O(TN) program would definitely not pass the time limit with those constraints. Is there a faster way?

Fast Exponentiation – Recursive

Again, we need to find A^N \bmod{M} . Suppose N is even. With the Laws of Exponents, we can rewrite A^N into \left(A^2\right)^{\frac{N}{2}} . Our base was squared, but honestly we don’t really care too much about the size of the base. What we do care about is the size of the exponent. Solving \left(A^2\right)^{\frac{N}{2}} is equivalent to the original problem, except the exponent we need to deal with is HALF the size. That’s great!

For instance, let’s try to find 5^{32} \bmod{11} . We have that

5^{32} \equiv 25^{16} \equiv 3^{16} \mod{11}
3^{16} \equiv 9^{8} \mod{11}
9^8 \equiv 81^4 \equiv 4^4 \mod{11}
4^4 \equiv 16^2 \equiv 5^2 \mod{11}
5^2 \equiv 25^1 \equiv 3 \mod{11}

Therefore, the remainder when 5^{32} is divided by 11 should be 3. Indeed if you fully expand 5^{32} into a 23-digit number, then perform long division with 11 on that value, you would also get a remainder of 3. However, with our method, we never had to deal with any values greater than 121, and it only took us 5 steps, not 32, to compute!

Of course, that’s if we have a perfect power of 2. What if you have an odd number? Well, A^N = A^{N-1} A^1 , so we just need to solve for A^{N-1}\pmod{M} and multiply it by A . We can just call the “if it’s even” part of our function again since N-1 here would now be even. Lastly, we have our base case, which is that A^0 = 1 (if our base is 0, then we just immediately return 0 instead).

This gives us our great new algorithm,

def mod_pow(base,exp,MOD):
    if exp==0:
        return 1
    if exp%2==1:
        return (base*mod_pow(base,exp-1,MOD))%MOD
        return mod_pow((base*base)%MOD, exp/2, MOD)

And it’s only 7 lines! Amazing! That’s efficiency not just in runtime, but in coding time as well.

Now, to solve for its runtime, we ask the question: How many times can you divide N by 2 until you hit 1? Suppose you can do so k times. Well, if N\div 2 \div 2 \ldots \div 2 = N\div 2^k = 1 , then N = 2^k and k = \lg N .

What about the odd numbers? Well, you can only perform the -1 operation about as many times as you perform the \div 2 operation, so it should still always be around the order of O(\lg N) .

Isn’t that cool? To answer A^{10^9}\bmod{M} , you would only need around 30 operations, and to answer A^{10^{18}}\bmod{M} , you would only need around 60 operations! That’s a MASSIVE upgrade from our previous idea.

Fast Exponentiation – Iterative

This method also runs in O(\lg N) , but I find the former recursive method to be a lot easier to memorize and code, so this one is not strictly necessary to know. However, it’s also rather neat and I wanted to include it anyway because why not.

One way we could answer the query without a recursive function is by first pregenerating A^n \bmod{M} for n which are powers of 2. In an array, we would store A^{2^0} , A^{2^1} , A^{2^2} , A^{2^3} , A^{2^4} \ldots until we hit the upper bound of our problem. Given A^{2^k} , we can easily generate the next term in the sequence by squaring the previous one: \left(A^{2^k}\right)^2 = A^{2^k\cdot2} = A^{2^{k+1}} . Since making each element take O(1) time and there are going to be O(\lg N) elements in the array, it should be clear then that this entire array can be pregenerated in O(\lg N) time.

How do we use this to do fast exponentiation? Well, if we want to get A^N , what we could do is look at the binary representation of N . Then, we can decompose A^N into a product of precomputed values from our table of powers of 2.

This really is best explained through an example, I feel. Say for instance you wanted to get A^{150} \bmod{M} . Well, we would decompose 150 as a sum of powers of 2, to get,

A^{150} \equiv A^{128+16+4+2} \mod{M}
\equiv A^{128}A^{16}A^4 A^2 \mod{M}
\equiv A^{2^7}A^{2^4}A^{2^2} A^{2^1} \mod{M}

Each of the factors in the final product had been precomputed earlier and can be accessed from the array in O(1) , and a number only has about \lg{N} bits in its binary representation, so the method still runs in O(\lg{N}) time.

This idea can also be written in another form, without the pregenerated array, which is the one presented in Wikipedia’s article on Modular Exponentiation,

def mod_pow(base, exp, MOD):
    base %= MOD
    ans = 1
    while exp > 0:
        if exp%2==1:
            ans = (ans*base)%MOD
        base = (base*base)%MOD
        exp //= 2 #or exp >>= 1
    return ans

I’ll leave it to you to parse how this function is the same as the other one I outlined in this iterative section 🙂

Note that in Python, the built-in pow function can take in three arguments, in which case the third one is the modulus M , and pow(A,B,M) would return A^B \bmod{M} . This pow function is very fast and also runs in O(\lg{N}) time. In other languages though, like C++, you’ll have to implement it on your own. This shouldn’t be too much of a problem, anyway, since it’s only around 7 lines in its recursive form.

Also note that the modulo part technically isn’t a necessary part of fast exponentiation, it’s just that often times exponential growth scales so quickly that without the modulo, you wouldn’t even be able to store something like (10^9)^{10^9} in your computer in the first place!

Programming Problem 1 – Tetrahedron

Let’s look at a rather straightforward application of mod_pow with my favorite branch of mathematics—combinatorics! This problem comes from Codeforces Round #113, which was rated for Div 2 contestants. Suppose an ant is on vertex D of a tetrahedron whose vertices are labeled A, B, C, and D. A single step takes the ant from its current vertex to one of the adjacent ones; it CANNOT stay on the same vertex. How many ways can the ant takes exactly n steps so that on the n th step, it returns to D? Output the answer modulo 10^9+7 .

Combinatorics problems are probably where you can expect to see a lot of fast exponentiation, thanks to the good old sum and product rule. Notice that at every step, the ant has exactly 3 choices for which vertex to go to next. It makes n-1 such choices, then its final n th choice is it picking vertex D. Therefore the answer should be 3^{n-1} . Except… there’s a complication. The (n-1) th choice can’t have been a D; if it were, then we wouldn’t be able to make our final choice be D. So, we have to subtract all the ways for the (n-1) th choice to be D.

How many ways can we land on D on the (n-1) th step? Well, with similar logic to earlier, there are (n-2) choices of 3 vertices, then the final (n-1) th choice will be D. So, it’s 3^{n-2} … except also just like earlier, we also have to exclude all the ways for the (n-2) th choice to be D. This would be 3^{n-3} , excluding all the ways for the (n-3) th choice to be D; this is 3^{n-4} , excluding all the ways for the (n-4) th choice to be D, and so on and so forth. It’s kind of like inclusion-exclusion.

Finally, we exclude all the ways to return to D in 2 steps, which is 3, excluding all the ways to return to D in 1 step, which is 0. We ultimately get this summation with alternating signs,

3^{n-1} - 3^{n-2} + 3^{n-3} - \ldots \pm 3^1

If we called mod_pow at every step, our runtime would be O(n \lg{n}) , which is teetering towards the land of “unsafe” since n could be 10^7 . Almost definitely the “Time Limit Exceeded” side of unsafe. Thankfully, we should know an easy way to shave off that entire factor of n . We can use the formula for a geometric series!

Here, the first element is 3^{n-1} , the common ratio is \frac{-1}{3} , and there are n-1 terms in total. Which means we get for our formula,

3^{n-1} \frac{ \left( \frac{-1}{3} \right)^{n-1} - 1 }{ \frac{-1}{3} - 1 } = 3^{n-1} \frac{ \frac{(-1)^{n-1}}{3^{n-1}} - 1 }{ \frac{-4}{3} }
= 3^{n-1} (1 - \frac{(-1)^{n-1}}{3^{n-1}}) \frac{3}{4}
= \frac{3^n - 3(-1)^{n-1}}{4}
= \frac{3^n + 3(-1)^n}{4}

Awesome! That’s a formula we can evaluate with a single O(\lg{N}) mod_pow call. Implement that and you should be able to get an Accepted for this problem 🙂

But how do I divide by 4?

The next concern is how to divide by 4, and it should be clear why you can’t just naively divide when doing modulo every step. We have that (24\div 3)\%5 = 8\%5 = 3 , except if we applied mod at every step, we’d get ((24\%5)\div(3\%5))\%5 = (4\div3)\%5 which… isn’t even an integer. Okay then.

We don’t really want to divide per se, though. What we want is to find the multiplicative inverse of 4. The multiplicative inverse of a , denoted by a^{-1} , is the element such that a\cdot a^{-1} = a^{-1}a = 1 , where 1 is the identity element. That’s what division really is, after all – multiplying by the multiplicative inverse.

Well, great, but we don’t have an idea of rational numbers in the modulo world; in a modulo ring, we only have integers. Well, that’s fine, it never had to be a fraction. To divide by b \pmod{p} , we just need to find some value x such that bx\equiv 1 \mod{M} . Note that this value will only exist if gcd(b,M)=1 (why?) Then, to do a\div b \mod{M}, we do ab^{-1} \mod{M} .

In tackling the next problem, we’ll talk about how to find the multiplicative inverse for an arbitrary input. Here, though, we only need to worry about 4, so we can hard code its value. Since our modulus, 10^9+7 , is 1 less than a multiple of 4, we can easily see that 4^{-1} \equiv 250000002 \mod{10^9+7} . Multiply the right hand side by 4 and you’ll see that we get 1.

Programming Problem 2 – Number of zero-xor subsets

This problem was taken from the 10th Ad Infinitum contest on Hackerrank.

Consider a set S that contains all the integers from 0 to 2^N-1 , where N is a positive integer and can be up to 10^{18} . How many subsets of S satisfy the property that the XORsum of the subset is equal to 0? Define the XORsum of a set to be the result after XORing all the elements in the set. Note that the XOR of the empty set is 0. Since the answer can be quite large, output the answer modulo 10^9+7 .

If you’re put off by the XOR operator and the ensuing combinatorics, just skip ahead to once we’ve already derived the formula and are figuring out how to implement it in code. I chose this problem because it shines a light on some other uses and techniques of fast exponentiation. If you just want to take the formula as a given, skip to Big Exponents to see how we would code it.

The bitwise-XOR is a logic-gate that accepts two bits as inputs. It returns 0 if the bits are the same, and 1 if the bits are different. We can extend this to deal with integers instead of just bits; express the integers in binary, then XOR the bits that are in the same “place value”. This operation is usually written as a\oplus b in math, but in most programming languages is denoted by the ^ operator. For example, 28\oplus 25 = (11100_2)\oplus (11001_2) = (00101_2) = 5 .

Understanding the following properties about the XOR will probably help.

Lemma 4.1 a \oplus a = 0

Proof. Since a is being XORed with itself, every pair of bits is always going to match. Therefore, the XOR of every pair of bits gives 0, so the answer is 0.

Lemma 4.2 a \oplus 0 = a

Proof. This directly follows from Lemma 4.1. You could also just prove it directly by again observing the truth table. Therefore, 0 is the XOR identity element. Also, it means the XOR is invertible, and each integer is in fact its own inverse.

Lemma 4.3 XOR is associative.
Lemma 4.4 XOR is commutative.

Proof. Look at the truth table of the bitwise-XOR and notice that it is associative and commutative. The XOR of two numbers is, again, just doing bitwise-XOR on each pair of their bits, and thus it’s also associative and commutative.

Now, let S_M be the set of all subsets of S such that their XORsum is is M ; we are looking for the size of S_0 . If you try filtering the subsets of S for some small value of N , say N=2 , by their XORsum, you’ll notice something peculiar. There are 4 elements in S_0 , 4 elements in S_1 , 4 elements in S_2 , and 4 elements in S_3 . Note that

S_0 = {{},{00},{01,10,11}, {00,01,10,11}
S_1 = {{01},{00,01},{10,11},{00,10,11}}
S_2 = {{10},{00,10},{01,11},{00,01,11}}
S_3 = {{11},{00,11},{01,10},{00,01,10}}

We now have a hypothesis that we can prove: for any N , the sizes of all S_k for 0 \leq k < 2^N will be equal!

Theorem 4.1 For any a and b which are valid XORsums of a subset of S , |S_a|=|S_b| .

There exists a bijection between any two sets S_a and S_b for 0 \leq a, b < 2^N . Let c = a\oplus b . For each element of S_a , apply the following transformation: if c is not in the set, insert it; otherwise, delete it. This transforms each element of S_a into an element of S_b . Why? "Toggling the presence of c " is actually the same as XORing c with the XORsum. If c is not in the set, then inserting it means the XORsum should be updated to include it; if c is already in the set, then by Lemmas 4.1, 4.2, 4.3, and 4.4, removing it is the same as XORing c to the current XORsum (since c is its own inverse).

Therefore every set in S_a , which has an XORsum of a , will be transformed into sets whose XORsum is a\oplus c = a\oplus (a\oplus b) = b . Therefore, the |S_a| unique elements of S_a each correspond to a unique element of S_b . We can therefore conclude that |S_a| \leq |S_b| .

However we could just do the exact same thing to transform elements of S_b into S_a . This gives us the conclusion that |S_b| \leq |S_a| . In order for both to be true, we end up with the fact that |S_a| = |S_b| . But a and b are just some arbitrary integers from 0 to 2^N-1 , therefore all S_k should be equal in size! Since we have 2^{2^N} elements in the power set of S , divided amongst the 2^N different S_k , the result is that

S_0 = S_k = \frac{2^{2^N}}{2^N} = 2^{2^N-N}

Big Exponents

Now let’s actually go about figuring out how to calculate this modulo 10^9+7 . If we just use our mod_pow directly, we’re going to face a problem. Our mod_pow runs in O(\lg{N}) , but \lg{2^{2^N}}= 2^N \lg{2} , and with N up to 10^{18} , that is most definitely not going to pass any reasonable time limit. We can’t even store 2^{10^{18}} as an integer!

You might be tempted to just apply MOD to the exponent, but that’s not exactly going to work. You will find, for instance, that 2^{12} \% 9 = 1 , but 2^{12\%9}\% 9 = 2^3 \%9 = 8 . They’re not equal. What now?

Thankfully, the modulus in our problem is 10^9+7 , which is a prime number. There is a well-known theorem in number theory called Fermat’s Little Theorem, which states that for some prime p and some a such that p \nmid a ,

a^{p-1} \equiv 1 \mod{p}

I won’t cover the proof here because there are a lot of great resources for it already online, or in any number theory text. What this means for us is, consider a^n \mod{p} for some n \geq p , then

a^n \equiv a^{n-(p-1)+(p-1)} \equiv a^{n-(p-1)} a^{p-1} \equiv a^{n-(p-1)} 1 \equiv a^{n-(p-1)} \mod{p}

Which gives us

Lemma 5.1 a^n \equiv a^{n-(p-1)} \mod{p}

And, invoking this many times, we get the fact that

Theorem 5.1 a^{n} \equiv a^{n\%(p-1)} \mod{p}

Which is exactly what we need for this problem! Many problems in competitive programming will ask you to find the answer mod a prime like 10^6+3 or 10^9+7 exactly so that theorems like this can be applied. To compute for 2^{2^N} \bmod{p} , we just need 2^{2^N \bmod{p-1}} \bmod{p} , or a call of


where for this problem, p=10^9+7 .

Note that this trick only works if p is a prime. For non-prime moduli, you would need to use Euler’s Theorem (the one with the Totient Function), which I’ll cover in a later blog post.

Dividing with Modulo

Earlier, we just hard-coded the multiplicative inverse of 4 mod 10^9+7 , but now we need to be able to find a way to do it for arbitrary 2^N . Thankfully, since p is prime in our problem, we can easily use Fermat’s Little Theorem again. We have

Theorem 5.2 a^{p-2} \equiv a^{-1} \mod{p}

Proof. Easy to see where it comes from, right? Multiply both sides by a and you just end up with Fermat’s Little Theorem again.

To convince yourself, try doing 10\div5 \mod{p} for some prime p\neq 5 . See the result when you do 10*mod_pow(5,p-2,p)%p. If you pick something large like p=10^9+7 , you’ll get a super large number for 5^{-1} , but when you multiply it with 10\mod{p} , you’ll end up with 2! Finding the modular inverse with this method works in O(\lg{p}) since all we’re doing is mod_pow.

Again, p has to be prime for this to work. For general non-prime moduli m , you have to fall back to Euler’s Theorem again, just like earlier. We could also use the Extended Euclidean Algorithm, finding a solution to ax+my=1 , but that’s a real hassle to code, vs the ~7 lines of mod_pow.

We’ve solved the problem! The answer is just


Dealing with Negatives and Modulo

You might notice for this problem that, hey instead of dividing by 2^N , what if we just compute 2^{2^N-N} \mod{p} ? Well, why not! We could have just done that too! There’s only one big caveat, for C++ anyway, and that’s to be careful whenever subtraction and negatives in general are involved in your modulo code.

What is (32 - 15)\%6 ? Well, that’s (32-15)\%6 = 17\%6 = 5 . But, in C++, if we took the modulo at every step like we normally do, we get (32\%6 - 15\%6)\%6 = (2-3)\%6 = (-1)\%6 = -1 . Which is… technically correct, since -1 \equiv 5 \mod{6} , but not really the answer we wanted. The modulo operator a%b , in C++ at least, returns an integer r in the range 0 \leq r < b if a is positive, and an r in the range -b < r \leq 0 if a is negative. You just need to account for that at the very end whenever negatives come into your code. I usually put an (x+MOD)%MOD at the end to make the result positive.

In Python 3, to my knowledge, the modulo operator % always returns an integer r in 0 \leq r < b whether the a was positive or negative. Just check your language's documentation, or test it out for yourself, before you get a Wrong Answer!

I used to have UVa's Contemplation Algebra here, but as of December 28, 2019, I have decided to move it to the Fast Matrix Exponentiation post.

Final Thoughts

Fast exponentiation is an incredibly powerful technique which I definitely consider one of the fundamentals for anyone specializing in math for competitive programming.

Next blog post is a Part 2 which fills in the gaps from these past two posts – how to efficiently find Fibonacci numbers, and an alternative solution to the Tetrahedron problem. ‘Til next time, enjoy doing more math and programming!


Published by


Math-enthusiast and aspiring competitive programmer. Undergrad.

One thought on “Fast Exponentiation Techniques and Interesting Problems which use them”

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s