Główna zawartość

Kurs: Podstawy informatyki - program rozszerzony > Rozdział 4

Lekcja 2: Ocena złożoności algorytmów

Pomiar wydajności algorytmu

A good algorithm is correct, but a great algorithm is both correct and efficient. The most efficient algorithm is one that takes the least amount of execution time and memory usage possible while still yielding a correct answer.

Counting the operations

One way to measure the efficiency of an algorithm is to count how many operations it needs in order to find the answer across different input sizes.

Let's start by measuring the linear search algorithm, which finds a value in a list. The algorithm looks through each item in the list, checking each one to see if it equals the target value. If it finds the value, it immediately returns the index. If it never finds the value after checking every list item, it returns -1.

Here's what that algorithm looks like in pseudocode:

PROCEDURE searchList(numbers, targetNumber) {
  index ← 1
  REPEAT UNTIL (index > LENGTH(numbers)) {
    IF (numbers[index] = targetNumber) {
      RETURN index
    }
    index ← index + 1
  }
  RETURN -1
}

Note that this pseudocode assumes a 1-based index for lists.

Let's step through the algorithm for this sample input:

searchList([3, 37, 45, 57, 93, 120], 45)

The algorithm starts off by initializing the variable index to 1. It then begins to loop.

Iteration #1:

It checks if index is greater than LENGTH(numbers). Since 1 is not greater than 6, it executes the code inside the loop.
It compares numbers[index] to targetNumber. Since 3 is not equal to 45, it does not execute the code inside the conditional.
It increments index by 1, so it now stores 2.

Iteration #2:

It checks if index is greater than LENGTH(numbers). Since 2 is not greater than 6, it executes the code inside the loop.
It compares numbers[index] to targetNumber. Since 37 is not equal to 45, it does not execute the code inside the conditional.
It increments index by 1, so it now stores 3.

Iteration #3:

It checks if index is greater than LENGTH(numbers). Since 3 is not greater than 6, it executes the code inside the loop.
It compares numbers[index] to targetNumber. Since 45 is equal to 45, it executes the code inside the conditional.
It returns the current index, 3.

Now let's count the number of operations that the linear search algorithm needed to find that value, by counting how often each type of operation was called.

operation	# of times
Initialize index to 1	1
Check if `index` is greater than `LENGTH(numbers)`	3
Check if `numbers[index]` equals `targetNumber`	3
Return `index`	1
Increment `index` by 1	2

That's a total of 10 operations to find the targetNumber at the index of 3. Notice the connection to the number 3? The loop repeated 3 times, and each time, it executed 3 operations.

Sprawdź swoją wiedzę

How many operations are required to find the number 3, located at the 1st position in the list?

How many operations are required to find the number 37, located at the 2nd position in the list?

Let's make a table of the number of operations required based on the position of targetNumber in the list.

See if you can fill in the blanks:

location in list	# of operations
1st item in list	Prawidłowa odpowiedź to: liczba całkowita, taka jak $6$ ‍ właściwy uproszczony ułamek, taki jak $3 / 5$ ‍ niewłaściwy uproszczony ułamek, taki jak $7 / 4$ ‍ liczba mieszana, taka jak $1 3 / 4$ ‍ dokładny ułamek dziesiętny, taki jak $0, 75$ ‍ wielokrotność pi, taka jak $12 pi$ ‍ lub $2 / 3 pi$ ‍
2nd item in list	Prawidłowa odpowiedź to: liczba całkowita, taka jak $6$ ‍ właściwy uproszczony ułamek, taki jak $3 / 5$ ‍ niewłaściwy uproszczony ułamek, taki jak $7 / 4$ ‍ liczba mieszana, taka jak $1 3 / 4$ ‍ dokładny ułamek dziesiętny, taki jak $0, 75$ ‍ wielokrotność pi, taka jak $12 pi$ ‍ lub $2 / 3 pi$ ‍
3rd item in list	10
4th item in list	Prawidłowa odpowiedź to: liczba całkowita, taka jak $6$ ‍ właściwy uproszczony ułamek, taki jak $3 / 5$ ‍ niewłaściwy uproszczony ułamek, taki jak $7 / 4$ ‍ liczba mieszana, taka jak $1 3 / 4$ ‍ dokładny ułamek dziesiętny, taki jak $0, 75$ ‍ wielokrotność pi, taka jak $12 pi$ ‍ lub $2 / 3 pi$ ‍
5th item in list	Prawidłowa odpowiedź to: liczba całkowita, taka jak $6$ ‍ właściwy uproszczony ułamek, taki jak $3 / 5$ ‍ niewłaściwy uproszczony ułamek, taki jak $7 / 4$ ‍ liczba mieszana, taka jak $1 3 / 4$ ‍ dokładny ułamek dziesiętny, taki jak $0, 75$ ‍ wielokrotność pi, taka jak $12 pi$ ‍ lub $2 / 3 pi$ ‍
6th item in list	Prawidłowa odpowiedź to: liczba całkowita, taka jak $6$ ‍ właściwy uproszczony ułamek, taki jak $3 / 5$ ‍ niewłaściwy uproszczony ułamek, taki jak $7 / 4$ ‍ liczba mieszana, taka jak $1 3 / 4$ ‍ dokładny ułamek dziesiętny, taki jak $0, 75$ ‍ wielokrotność pi, taka jak $12 pi$ ‍ lub $2 / 3 pi$ ‍

The best case for an algorithm is the situation which requires the least number of operations. According to that table, the best case for linear search is when targetNumber is the very first item in the list.

What about the worst case? According to that table, the worst case is when targetNumber is the very last item in the list, since that requires repeating the 3 loop operations for every item in the list.

There is actually a slightly worse worst case: the situation where targetNumber is not in the list at all. That'll require a couple extra operations, to check the loop condition and return -1. It doesn't require an entire extra loop repetition, however, so it's not that much worse. Depending on our use case for linear search, the worst case may come up very often or almost never at all.

The average case is when targetNumber is in the middle of the list. That's the example we started with, and that required 3 repetitions of the loop for the list of 6 items.

Let's describe the efficiency in more general terms, for a list of

n

items. The average case requires

1

operation for variable initialization, then

n / 2

loop repetitions, with

3

operations per loop. That's

1 + 3 (n / 2)

This table shows the number of operations for increasing list sizes:

list size	# of operations
6	10
60	91
600	901

Let's see that as a graph:

Generally, we can say that there's a "linear" relationship between the number of items in the list and the number of operations required by the algorithm. As the number of items increase, the number of operations increases in proportion.

Empirical measurements

The number of operations does not tell us the amount of time a computer will take to actually run an algorithm. The running time depends on implementation details like the speed of the computer, the programming language, and the translation of the language into machine code. That's why we typically describe efficiency in terms of number of operations.

However, there are still times when a programmer finds it helpful to measure the run time of an implemented algorithm. Perhaps a particular operation takes a surprisingly long amount of time, or maybe there's some aspect of the programming language that's slowing an algorithm down.

To measure the run time, we must make the algorithm runnable; we must implement it in an actual programming language.

Here's a translation from the pseudocode to JavaScript:

var searchList = function(numbers, targetNumber) {
    var index = 0;
    for (var i = 0; i < numbers.length; i++) {
        if (numbers[i] === targetNumber) {
            return index;
        }
    }
    return -1;
};

The program below calls searchList() on the list of six numbers and reports the time taken in milliseconds.

If your computer is as fast as mine, then you'll see 0 milliseconds reported. That's because this algorithm takes way less time than a millisecond. Our JavaScript environment can't measure time in units less than milliseconds, however.

Here's what we'll try instead: we'll call searchList() 100,000 times, record how many milliseconds that takes, and divide to approximate the number of nanoseconds per call. That's what this program does:

On my computer, that reports around 150 nanoseconds. There are 1,000,000,000 nanoseconds in a second, so that's really quite fast. In a faster language, environment, or computer, it could be even faster.

That only tells us the average run-time for searchList() on a list of 6 numbers. We care more about how run time increases as the input size increases, however. Does it keep increasing linearly, like our above analysis predicted?

The program below reports the run-time for lists of 6 numbers, 60 numbers, and 600 numbers.

Here are the results from one run on my computer:

list size	nanoseconds
6	50
60	420
600	3430

This graph plots those points:

As predicted, the relationship between the points is roughly linear. As the number of operations increases by a factor of 10, the nanoseconds increases at nearly that same factor of 10. The exact numbers aren't as clean as our operation counts, but that's expected, since all sorts of real world factors can affect actual running time on a computer.

Improving efficiency

Multiple algorithms can correctly solve the same problem, but with different efficiencies. That's when measuring algorithmic efficiency really matters, to help us choose the more efficient algorithm.

The linear search algorithm can find a result in linear time, which is pretty good—but what if there was a more efficient algorithm? In fact, there is, at least for the case of a sorted list.

The binary search algorithm can efficiently find a value in a sorted list. The algorithm starts by checking to see if the target value is higher or lower than the middle value of the list. If it's higher, it knows that it can't be in the first half of the list. If it's lower, it knows it can't be in the second half of the list. The algorithm then repeatedly compares the target value to the middle value of the remaining list, halving the list each time until it locates the value.

Here's what that looks like in pseudocode:

PROCEDURE searchList(numbers, targetNumber) {
  minIndex ← 1
  maxIndex ← LENGTH(numbers)
  REPEAT UNTIL (minIndex > maxIndex) {
    middleIndex ← FLOOR((minIndex+maxIndex)/2)
    IF (targetNumber = numbers[middleIndex]) {
      RETURN middleIndex
    } ELSE {
       IF (targetNumber > numbers[middleIndex]) {
         minIndex ← middleIndex + 1
       } ELSE {
         maxIndex ← middleIndex - 1
       }
     }
  }
  RETURN -1
}

Let's step through this algorithm. Since the linear search example used a sorted list, we can try this algorithm on the same list:

searchList([3, 37, 45, 57, 93, 120], 45)

The algorithm starts off by initializing minIndex to 1 and maxIndex to 6.

Iteration #1:

It checks if minIndex (1) is greater than maxIndex (6). Since it is not, it executes the code inside the loop.
It sets middleIndex to the average of minIndex and maxIndex, rounded down. That's FLOOR((1+6)/2), so middleIndex stores 3.
It checks if targetNumber is equal to numbers[middleIndex]. Since numbers[3] and targetNumber are both 45, that conditional is true, and it returns the index 3.

Wow, this binary search algorithm found the targetNumber on the very first loop. That took just 5 operations, 2 for the initialization step and 3 operations for the loop. As it turns out, this is the best case for this algorithm: the situation where the target value is in the middle of the list.

What about the very worst case, when the value isn't in the list at all? Let's try finding the number 13 in that same list.

The algorithm starts off by initializing minIndex to 1 and maxIndex to 6.

Iteration #1:

It checks if minIndex (1) is greater than maxIndex (6). Since it is not, it executes the code inside the loop.
It sets middleIndex to FLOOR((1+6)/2), so middleIndex stores 3.
It checks if targetNumber (13) is equal to numbers[middleIndex] (45). They are not equal, so it continues to the next condition.
It checks if targetNumber is greater than numbers[middleIndex]. It is not, so it continues to the ELSE.
It sets maxIndex to middleIndex - 1, so maxIndex is now 2.

Iteration #2:

It checks if minIndex (1) is greater than maxIndex (2). Since it is not, it executes the code inside the loop.
It sets middleIndex to FLOOR((1+2)/2), so middleIndex stores 1.
It checks if targetNumber (13) is equal to numbers[middleIndex] (3). They are not equal, so it continues to the next condition.
It checks if targetNumber is greater than numbers[middleIndex]. Since it is greater, it sets minIndex to middleIndex + 1, so minIndex is now 2.

Iteration #3:

It checks if minIndex (2) is greater than maxIndex (2). Since it is not, it executes the code inside the loop.
It sets middleIndex to FLOOR((2+2)/2), so middleIndex stores 2.
It checks if targetNumber (13) is equal to numbers[middleIndex] (37). They are not equal, so it continues to the next condition.
It checks if targetNumber is greater than numbers[middleIndex]. It is not, so it continues to the ELSE.
It sets maxIndex to middleIndex - 1, so maxIndex is now 1.

Iteration #4:

It checks if minIndex (2) is greater than maxIndex (1). Since it is greater, it does not proceed inside the loop.

The algorithm finally returns -1, signifying the value could not be found.

That was the worst case for the binary search algorithm, and yet, it only required 3 full repetitions of the loop, with around 4 operations per loop. Remember the worst case for linear search on a list of 6 numbers? It had to loop through every number, so it required a full 6 repetitions.

That's the beauty of binary search: each loop divides the search space in half, so the algorithm has increasingly less numbers to look through.

This table shows the number of loop repetitions required for various list sizes, in the worst case:

list size	# of loop repetitions
6	3
60	6
600	10

With binary search, a list of 600 items only requires 10 loop repetitions! With linear search, that same list would require 600 loop repetitions.

Here's that table in graph form:

As you can see, binary search is much more efficient than linear search, especially as the list size grows. Mathematically, we can describe the relationship between list size and operations for binary search as "logarithmic", which is a much slower rate of growth than the linear relationship of linear search.

Sprawdź swoją wiedzę

Your program has a list of 10,000 numbers, sorted from lowest to highest, and you need to see whether a number is in that list.

Which algorithm do you use?

Chcesz dołączyć do dyskusji?

Zaloguj się

Sortuj według

Na razie brak głosów w dyskusji

Rozumiesz angielski? Kliknij tutaj, aby zobaczyć więcej dyskusji na angielskiej wersji strony Khan Academy.