## Jesse's Software Engineering Blog

Feb 18

### Understanding Big O Algorithm Analysis

Algorithm analysis is a key component of successful software development. Oftentimes we as developers become reliant on the languages we use and allow their libraries to hide complexities that we should take the time to understand. Learning how native language functions are implemented and how well they perform is an essential part of becoming a good developer. In this article I am going to outline the benefits of algorithm analysis, focusing on big O asymptotic analysis, and make a case for taking the time to understand how algorithms and functions are analyzed.

NOTE: This is neither a mathematical nor a scientific explanation of algorithm analysis. It is an overview meant to explain the uses and benefits of performing algorithm analysis. Some of the terminology may not be mathematically precise, and many of the topics will be oversimplified.

#### Defining Analysis

Let’s start by defining what an algorithm is:

An algorithm is a step-by-step procedure for performing some task in a finite amount of time

From that definition, almost every function you write can be considered an algorithm. Functions are step-by-step procedures that are expected to accomplish a task in a certain amount of time.

What is the goal of algorithm analysis?

To determine efficient procedures to solve problems with large inputs

The goal of analysis is to determine a relationship between input size and execution time (or CPU cycle expenditure) for a given algorithm. In having such relationships we are able to do some valuable things:

1. Evaluate the relative efficiency of algorithms independently of hardware and software
2. Analyze algorithms without actually having to implement them
3. Benchmark algorithms while taking into account all possible inputs

#### Big O Asymptotic Algorithm Analysis

Big O notation is a very common method for evaluating the efficiency of algorithms and functions. The formal definition is more rigorous, but essentially the notation, i.e. the complexity, is used for comparing algorithms based on how they respond to an increase in input size. The term “asymptotic” refers to analysis as input size goes toward infinity, i.e. how an algorithm will handle an ever-increasing data load.

When using big O analysis the focus is on the worst case. That is, for any given function there are a variety of outcomes for how it could perform, but we always take the worst-case scenario and use that when comparing it against other functions. This makes analysis simpler in that we don’t have to account for every single action inside of a function. We are only concerned with the actions that have a significant impact on the performance of the function.

One thing to consider when using big O asymptotic comparison is that the smaller factors in an algorithm are ignored as we focus on the factors that have a dramatic effect on running time. In some circumstances this can lead to inaccuracies if a lot of constant, or minor, factors are being omitted from the analysis.

Also, just because an algorithm is asymptotically faster does not mean it will always outperform a slower one. Some algorithms work better with small data sets but not as well with large ones, and vice versa. It is important to keep this in mind when analyzing algorithms for your own purposes.

#### Math Overview

As mentioned before, this is not a mathematical paper. There are no proofs or complex mathematical notations used to explain topics. However, there are some basic math concepts that you will need to understand in order to follow along.

Big O functions are written as:

```
O(n)
```

where n is the size of the data being passed into the function.

When analyzing big O complexity we generally talk about the rate at which running time grows. When looking at the graphs, don’t read them as concrete points, i.e. the time at a given input size, but rather look at whether the time required for the algorithm to run is growing quickly or slowly. This will become more apparent in the graphs, but the “steeper” the line on a graph, the higher the rate of change, and the “flatter” the line, the lower the rate of change.

#### Big O Functions

Algorithm analysis produces a relationship between data size and execution time. These functions are not only important to show the relationship between input size and time, but also to show the growth rate of time as a function of input size. Below I will outline the common functions used in algorithm analysis, as well as give a visual representation and code sample.

##### O(1) – Constant

A constant does not change, and neither does a constant function. No matter how big the input size, n, the same amount of time will be required to execute the function. The graph of a constant function is a horizontal line: as the input size (x-axis) increases, the execution time (y-axis) stays the same. Consider the following algorithm:

```php
/**
 * @param  float $price
 * @return float
 */
function calcSalesTax($price)
{
    return $price * .085;
}
```

Regardless of what is passed in for $price, the same code will always be executed; therefore it is considered a constant function. Some other common constant-time operations:

```php
$a = 12;        // assignment
if ($a < 12)    // comparison
$a[0]           // referencing an array element
count($a)       // getting array size
```

It is important to understand what operations in your language have constant complexity.

##### O(log(n)) - Logarithmic

Logarithmic complexity is characterized by rapid growth at small input sizes that levels off as input size increases: initially an increase in input size triggers a noticeable increase in time, but as input size continues to grow, the rate at which time grows slows down. For large inputs this is an ideal function, in that large increases in input size do not equate to large increases in execution time.

An example of a logarithmic function is a binary search. A binary search takes a sorted array and starts with a comparison in the middle of the array, discarding the left or right half depending on the comparison result. The search continually halves the data until the value is found or the range is empty.
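The halving described above can be sketched as follows (a minimal Python version for illustration; the function name and signature are my own):

```python
def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2           # compare against the middle element
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            low = mid + 1                 # discard the left half
        else:
            high = mid - 1                # discard the right half
    return -1
```

Each iteration halves the remaining range, so at most about log₂(n) comparisons are needed: O(log(n)).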

##### O(n) - Linear

Linear complexity is directly proportional to the size of input. As input size increases the rate of change in time remains constant: This occurs when you loop over an entire array. The size of the array has a proportional impact on the amount of time it will take to finish the function.

```php
/**
 * @param  array $prices
 * @return float
 */
function getCartTotal($prices)
{
    $total = 0;

    foreach ($prices as $price) {
        $total += $price;
    }

    return $total;
}
```
##### O(nlog(n)) - N-Log-N

This is a variation on the logarithmic function: N-Log-N takes the log(n) we had before and multiplies it by n, the input size. The result is growth a little more rapid than linear: at small input sizes an O(nlog(n)) algorithm behaves much like a linear one, but as the input size increases its running time grows faster, making it less efficient than O(n) for large data sets. Sorting algorithms such as merge sort and heap sort have N-Log-N complexity (quick sort is N-Log-N on average, though its worst case is O(n²)).
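To see where the n and the log(n) come from, consider merge sort: the array is split in half about log(n) times, and each level of splitting requires O(n) work to merge back together (a minimal Python sketch for illustration, not from the original article):

```python
def merge_sort(items):
    """Sort a list by recursively splitting and merging: O(n log n)."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])    # halving gives ~log(n) levels
    right = merge_sort(items[mid:])

    merged = []                       # merging a level is O(n) work
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```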

##### O(n²) - Polynomial

A polynomial can be raised to any power, and the difference between the exponents has a drastic effect on the rate at which time increases. Comparing an O(n²) graph to an O(nlog(n)) graph: at smaller input sizes the two algorithms behave similarly, but as size increases the polynomial function quickly grows at a faster rate than the N-Log-N function.

Polynomial growth is commonly caused by nested loops:

```php
/**
 * @param  array $items
 * @return bool
 */
function checkDupeItems($items)
{
    $size = count($items);

    for ($i = 0; $i < $size; $i++) {
        for ($j = 0; $j < $size; $j++) {
            // don't compare the item to itself
            if ($j == $i) {
                continue;
            }
            // see if the items match
            if ($items[$i] === $items[$j]) {
                return true;
            }
        }
    }

    // no duplicate items found
    return false;
}
```

The function may return early if a duplicate is found, but in the worst-case scenario (no duplicates) the inner loop runs n times for each of the n outer iterations: n * n = n².

##### O(2^n) - Exponential

Exponential algorithms typically have a base of 2, but it is important to note that they can have any base. Comparing the exponential graph to the polynomial one, the growth quickly dwarfs everything we have seen so far: exponential algorithms are usually considered useless for anything but the smallest inputs.
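The article doesn't include an exponential code sample, but a classic illustration (my own addition, not from the original) is the naive recursive Fibonacci, where each call branches into two more calls:

```python
def fib(n):
    """Naive recursive Fibonacci.

    Each call spawns two further calls, so the call tree has roughly
    O(2^n) nodes -- an exponential algorithm.
    """
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)
```

Doubling n roughly squares the number of calls, which is why even modest inputs become impractical.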

#### Complexity Comparisons

Now that we’ve seen a visual representation of the different big O functions, let’s take a look at some hypothetical numbers to help solidify the comparisons. The following table shows execution time as the input size increases:

| n   | O(log(n)) | O(n) | O(nlog(n)) | O(n²)   | O(2^n)        |
|-----|-----------|------|------------|---------|---------------|
| 8   | 3         | 8    | 24         | 64      | 256           |
| 16  | 4         | 16   | 64         | 256     | 65,536        |
| 32  | 5         | 32   | 160        | 1,024   | 4,294,967,296 |
| 64  | 6         | 64   | 384        | 4,096   | 1.84 × 10^19  |
| 128 | 7         | 128  | 896        | 16,384  | 3.40 × 10^38  |
| 256 | 8         | 256  | 2,048      | 65,536  | 1.15 × 10^77  |
| 512 | 9         | 512  | 4,608      | 262,144 | 1.34 × 10^154 |
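If you want to convince yourself of these numbers, the rows can be reproduced directly (a quick Python sketch; like the table, it assumes log base 2 and powers-of-two input sizes):

```python
import math

def growth_row(n):
    """Return (log n, n, n log n, n^2, 2^n) for a power-of-two n."""
    log_n = int(math.log2(n))
    return (log_n, n, n * log_n, n ** 2, 2 ** n)

# e.g. growth_row(8) reproduces the first row of the table
```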

Notice how the big O complexities affect the execution time. Judging from the table, which complexity would you prefer for your functions? O(log(n)) is obviously the best choice here, with execution time barely increasing even for the larger data sets. Another common way to see big O comparisons is per operation, for different data structures:

|             | Get  | Add  | Remove | Find |
|-------------|------|------|--------|------|
| Array list  | O(1) | O(1) | O(n)   | O(n) |
| Linked list | O(n) | O(1) | O(1)   | O(n) |

The table outlines the complexities of different operations performed on lists. Constant complexity, O(1), means that the operation cost will not increase with data size, whereas with linear complexity, O(n), the operation cost will grow with the list size. So adding and retrieving elements from an array will have the same efficiency regardless of array size, but removing and finding values in an array will be affected by the overall size. From the earlier graphs, you should be able to visualize the two growth functions, O(1) and O(n), and know how they relate to each other as the data set grows.
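The contrast between O(1) and O(n) retrieval is easy to see in code (an illustrative Python sketch with a hand-rolled linked list; the class and function names are my own):

```python
class Node:
    """A singly linked list node."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node

def linked_get(head, index):
    """O(n): must walk `index` nodes from the head to reach the element."""
    node = head
    for _ in range(index):
        node = node.next_node
    return node.value

# An array (Python list) gets an element by index in O(1):
# items[index] jumps straight to the element, no walking required.
```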

For a more comprehensive list of big O comparisons visit the Big O Cheat Sheet.

#### Practical Example

Here’s a simple example demonstrating how you can use big O analysis during development. We are developing an algorithm that will process items from a shopping cart. The function processItem() already exists but we will need to pass each individual item into the function to be processed. Which algorithm is the better choice?

Algorithm 1:

```php
function checkOut($cart)
{
    for ($i = 0; $i < count($cart); $i++) {
        processItem($cart[$i]);
    }
}
```

Algorithm 2:

```php
function checkOut($cart)
{
    while (count($cart)) {
        processItem(array_shift($cart));
    }
}
```

The first algorithm loops through each item in the cart, in turn passing it along to be processed. This function would be considered O(n) complexity as the time required to complete the function is directly proportional to the array size.

The second algorithm does the same thing but takes advantage of the PHP function array_shift(). Like the first algorithm it iterates through each item in the cart, giving a complexity of O(n). However, array_shift() returns the first item in the array AND re-indexes the remaining items, which requires an additional pass through the array; therefore array_shift() itself has a complexity of O(n). The entire algorithm then has a complexity of O(n) * O(n) = O(n²).

Comparing the two algorithms on a graph, the first will clearly perform better with a large data set. This example outlines the importance of knowing the complexities of your language's native functions. Just because a function is available does not mean it should always be used. It is important to understand the language you are using so that you can make educated decisions during algorithm development.
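The same trap exists in other languages: in Python, for example, `list.pop(0)` also shifts every remaining element forward on each call, so the two shapes can be sketched like this (illustrative only; `process` stands in for the article's hypothetical processItem()):

```python
def checkout_v1(cart, process):
    """O(n): one pass over the cart, items visited in place."""
    for item in cart:
        process(item)

def checkout_v2(cart, process):
    """O(n^2): pop(0) re-shifts every remaining element on each call."""
    cart = list(cart)          # copy so the caller's list is untouched
    while cart:
        process(cart.pop(0))
```

Both process the items in the same order; only the hidden cost of removing from the front differs.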

#### Going Forward

Algorithm analysis is a very complex topic and hard to learn from a single article, but hopefully I was able to help you understand the concepts behind the analysis and motivate you to continue your education. Understanding the meaning behind the analysis, and being able to visualize the relationships between the different complexity functions, will be very beneficial to your development career. Try incorporating these concepts into your day-to-day development practice: analyze code written by yourself and your peers, as well as the native functions of your language, to build the engineering perspective and experience required for an in-depth and well-rounded knowledge base. I will write articles about calculating the big O complexity of more advanced functions in the future.