Główna zawartość

## Podstawy informatyki - program rozszerzony

### Rozdział 4: lekcja 2

Ocena złożoności algorytmów# Pomiar wydajności algorytmu

A good algorithm is correct, but a great algorithm is both correct and

**efficient**. The most efficient algorithm is one that takes the least amount of execution time and memory usage possible while still yielding a correct answer.### Counting the operations

One way to measure the efficiency of an algorithm is to count how many operations it needs in order to find the answer across different input sizes.

Let's start by measuring the

**linear search**algorithm, which finds a value in a list. The algorithm looks through each item in the list, checking each one to see if it equals the target value. If it finds the value, it immediately returns the index. If it never finds the value after checking every list item, it returns -1.Here's what that algorithm looks like in pseudocode:

```
PROCEDURE searchList(numbers, targetNumber) {
index ← 1
REPEAT UNTIL (index > LENGTH(numbers)) {
IF (numbers[index] = targetNumber) {
RETURN index
}
index ← index + 1
}
RETURN -1
}
```

Note that this pseudocode assumes a 1-based index for lists.

Let's step through the algorithm for this sample input:

`searchList([3, 37, 45, 57, 93, 120], 45)`

The algorithm starts off by initializing the variable

`index`

to 1. It then begins to loop.Iteration #1:

- It checks if
`index`

is greater than`LENGTH(numbers)`

. Since 1 is*not*greater than 6, it executes the code inside the loop. - It compares
`numbers[index]`

to`targetNumber`

. Since 3 is*not*equal to 45, it does*not*execute the code inside the conditional. - It increments
`index`

by 1, so it now stores 2.

Iteration #2:

- It checks if
`index`

is greater than`LENGTH(numbers)`

. Since 2 is*not*greater than 6, it executes the code inside the loop. - It compares
`numbers[index]`

to`targetNumber`

. Since 37 is*not*equal to 45, it does*not*execute the code inside the conditional. - It increments
`index`

by 1, so it now stores 3.

Iteration #3:

- It checks if
`index`

is greater than`LENGTH(numbers)`

. Since 3 is*not*greater than 6, it executes the code inside the loop. - It compares
`numbers[index]`

to`targetNumber`

. Since 45*is*equal to 45, it executes the code inside the conditional. - It returns the current
`index`

, 3.

Now let's count the number of operations that the linear search algorithm needed to find that value, by counting how often each type of operation was called.

operation | # of times |
---|---|

Initialize index to 1 | 1 |

Check if `index` is greater than `LENGTH(numbers)` | 3 |

Check if `numbers[index]` equals `targetNumber` | 3 |

Return `index` | 1 |

Increment `index` by 1 | 2 |

That's a total of 10 operations to find the

`targetNumber`

at the index of 3. Notice the connection to the number 3? The loop repeated 3 times, and each time, it executed 3 operations. The

**best case**for an algorithm is the situation which requires the least number of operations. According to that table, the best case for linear search is when`targetNumber`

is the very first item in the list. What about the

**worst case**? According to that table, the worst case is when`targetNumber`

is the very last item in the list, since that requires repeating the 3 loop operations for every item in the list.There is actually a slightly worse worst case: the situation where

`targetNumber`

is not in the list at all. That'll require a couple extra operations, to check the loop condition and return -1. It doesn't require an entire extra loop repetition, however, so it's not that much worse. Depending on our use case for linear search, the worst case may come up very often or almost never at all.The

**average case**is when`targetNumber`

is in the middle of the list. That's the example we started with, and that required 3 repetitions of the loop for the list of 6 items. Let's describe the efficiency in more general terms, for a list of n items. The average case requires 1 operation for variable initialization, then n, slash, 2 loop repetitions, with 3 operations per loop. That's 1, plus, 3, left parenthesis, n, slash, 2, right parenthesis.

This table shows the number of operations for increasing list sizes:

list size | # of operations |
---|---|

6 | 10 |

60 | 91 |

600 | 901 |

Let's see that as a graph:

Generally, we can say that there's a "linear" relationship between the number of items in the list and the number of operations required by the algorithm. As the number of items increase, the number of operations increases in proportion.

### Empirical measurements

The number of operations does

*not*tell us the amount of time a computer will take to actually run an algorithm. The running time depends on implementation details like the speed of the computer, the programming language, and the translation of the language into machine code. That's why we typically describe efficiency in terms of number of operations.However, there are still times when a programmer finds it helpful to measure the run time of an implemented algorithm. Perhaps a particular operation takes a surprisingly long amount of time, or maybe there's some aspect of the programming language that's slowing an algorithm down.

To measure the run time, we must make the algorithm runnable; we must implement it in an actual programming language.

Here's a translation from the pseudocode to JavaScript:

```
var searchList = function(numbers, targetNumber) {
var index = 0;
for (var i = 0; i < numbers.length; i++) {
if (numbers[i] === targetNumber) {
return index;
}
}
return -1;
};
```

The program below calls

`searchList()`

on the list of six numbers and reports the time taken in milliseconds.If your computer is as fast as mine, then you'll see 0 milliseconds reported. That's because this algorithm takes

*way*less time than a millisecond. Our JavaScript environment can't measure time in units less than milliseconds, however.Here's what we'll try instead: we'll call

`searchList()`

100,000 times, record how many milliseconds that takes, and divide to approximate the number of nanoseconds per call. That's what this program does:On my computer, that reports around 150 nanoseconds. There are 1,000,000,000 nanoseconds in a second, so that's really quite fast. In a faster language, environment, or computer, it could be even faster.

That only tells us the average run-time for

`searchList()`

on a list of 6 numbers. We care more about how run time increases as the input size increases, however. Does it keep increasing linearly, like our above analysis predicted?The program below reports the run-time for lists of 6 numbers, 60 numbers, and 600 numbers.

Here are the results from one run on my computer:

list size | nanoseconds |
---|---|

6 | 50 |

60 | 420 |

600 | 3430 |

This graph plots those points:

As predicted, the relationship between the points is roughly linear. As the number of operations increases by a factor of 10, the nanoseconds increases at nearly that same factor of 10. The exact numbers aren't as clean as our operation counts, but that's expected, since all sorts of real world factors can affect actual running time on a computer.

### Improving efficiency

Multiple algorithms can correctly solve the same problem, but with different efficiencies. That's when measuring algorithmic efficiency really matters, to help us choose the more efficient algorithm.

The linear search algorithm can find a result in linear time, which is pretty good—but what if there was a more efficient algorithm? In fact, there is, at least for the case of a

*sorted*list.The

**binary search**algorithm can efficiently find a value in a sorted list. The algorithm starts by checking to see if the target value is higher or lower than the middle value of the list. If it's higher, it knows that it can't be in the first half of the list. If it's lower, it knows it can't be in the second half of the list. The algorithm then repeatedly compares the target value to the middle value of the remaining list, halving the list each time until it locates the value.Here's what that looks like in pseudocode:

```
PROCEDURE searchList(numbers, targetNumber) {
minIndex ← 1
maxIndex ← LENGTH(numbers)
REPEAT UNTIL (minIndex > maxIndex) {
middleIndex ← FLOOR((minIndex+maxIndex)/2)
IF (targetNumber = numbers[middleIndex]) {
RETURN middleIndex
} ELSE {
IF (targetNumber > numbers[middleIndex]) {
minIndex ← middleIndex + 1
} ELSE {
maxIndex ← middleIndex - 1
}
}
}
RETURN -1
}
```

Let's step through this algorithm. Since the linear search example used a sorted list, we can try this algorithm on the same list:

`searchList([3, 37, 45, 57, 93, 120], 45)`

The algorithm starts off by initializing

`minIndex`

to 1 and `maxIndex`

to 6.Iteration #1:

- It checks if
`minIndex`

(1) is greater than`maxIndex`

(6). Since it is not, it executes the code inside the loop. - It sets
`middleIndex`

to the average of`minIndex`

and`maxIndex`

, rounded down. That's`FLOOR((1+6)/2)`

, so`middleIndex`

stores 3. - It checks if
`targetNumber`

is equal to`numbers[middleIndex]`

. Since`numbers[3]`

and`targetNumber`

are both 45, that conditional is true, and it returns the index 3.

Wow, this binary search algorithm found the

`targetNumber`

on the very first loop. That took just 5 operations, 2 for the initialization step and 3 operations for the loop. As it turns out, this is the best case for this algorithm: the situation where the target value is in the middle of the list.What about the very worst case, when the value isn't in the list at all? Let's try finding the number 13 in that same list.

The algorithm starts off by initializing

`minIndex`

to 1 and `maxIndex`

to 6.Iteration #1:

- It checks if
`minIndex`

(1) is greater than`maxIndex`

(6). Since it is not, it executes the code inside the loop. - It sets
`middleIndex`

to`FLOOR((1+6)/2)`

, so`middleIndex`

stores 3. - It checks if
`targetNumber`

(13) is equal to`numbers[middleIndex]`

(45). They are not equal, so it continues to the next condition. - It checks if
`targetNumber`

is greater than`numbers[middleIndex]`

. It is not, so it continues to the`ELSE`

. - It sets
`maxIndex`

to`middleIndex - 1`

, so`maxIndex`

is now 2.

Iteration #2:

- It checks if
`minIndex`

(1) is greater than`maxIndex`

(2). Since it is not, it executes the code inside the loop. - It sets
`middleIndex`

to`FLOOR((1+2)/2)`

, so`middleIndex`

stores 1. - It checks if
`targetNumber`

(13) is equal to`numbers[middleIndex]`

(3). They are not equal, so it continues to the next condition. - It checks if
`targetNumber`

is greater than`numbers[middleIndex]`

. Since it*is*greater, it sets`minIndex`

to`middleIndex + 1`

, so`minIndex`

is now 2.

Iteration #3:

- It checks if
`minIndex`

(2) is greater than`maxIndex`

(2). Since it is not, it executes the code inside the loop. - It sets
`middleIndex`

to`FLOOR((2+2)/2)`

, so`middleIndex`

stores 2. - It checks if
`targetNumber`

(13) is equal to`numbers[middleIndex]`

(37). They are not equal, so it continues to the next condition. - It checks if
`targetNumber`

is greater than`numbers[middleIndex]`

. It is not, so it continues to the`ELSE`

. - It sets
`maxIndex`

to`middleIndex - 1`

, so`maxIndex`

is now 1.

Iteration #4:

- It checks if
`minIndex`

(2) is greater than`maxIndex`

(1). Since it*is*greater, it does not proceed inside the loop.

The algorithm finally returns -1, signifying the value could not be found.

That was the worst case for the binary search algorithm, and yet, it only required 3 full repetitions of the loop, with around 4 operations per loop. Remember the worst case for linear search on a list of 6 numbers? It had to loop through every number, so it required a full 6 repetitions.

That's the beauty of binary search: each loop divides the search space in half, so the algorithm has increasingly less numbers to look through.

This table shows the number of loop repetitions required for various list sizes, in the worst case:

list size | # of loop repetitions |
---|---|

6 | 3 |

60 | 6 |

600 | 10 |

With binary search, a list of 600 items only requires 10 loop repetitions! With linear search, that same list would require 600 loop repetitions.

Here's that table in graph form:

As you can see, binary search is much more efficient than linear search, especially as the list size grows. Mathematically, we can describe the relationship between list size and operations for binary search as "logarithmic", which is a much slower rate of growth than the linear relationship of linear search.

## Chcesz dołączyć do dyskusji?

Na razie brak głosów w dyskusji