Sorting Algorithms, Introduction

Sorting Algorithms: Introduction
(Up to Basic Computer Science : Lists and Arrays)

Sorting has been analyzed by computer scientists for decades, and thus it is an ideal subject to begin with when studying computer science. Sorting is done with algorithms, which are a set of specific commands that are followed in a certain order to complete a task. A cooking recipe is an algorithm, but we usually reserve the word algorithm for mathematics or science related topics.

To study sorting, you must first be comfortable with iteration and recursion. These two terms designate the ways tasks are repeatedly performed. A jolly old programmer might give you this definition of recursion:

re-cur-sion (ri-'ker-zhen) n See recursion.

If you don't know what iteration and recursion are and how they are different, look up a book on a common programming language like C++ and it should tell you. Or, you can go to Cprogramming.com, which has tutorials on C++ with a section on recursion.

Note: in most of these algorithms, two elements of an array are often switched. So, to make the algorithms easier to read, we have written a function called swap() which takes two variables and exchanges the values in them. Here is the function code in C++:

inline int swap( int &x, int &y )
{
  int z = x;
  x = y;
  y = z;
}

Note as well that we are using C++. For all the example code in this site, we will be using C++, for it is a widely used and taught language that is efficient and suitable for our purposes.

The basic sorting algorithms are mostly iterative, and thus probably easier to understand.

The most simple algorithm, but not the most intuitive algorithm, is Bubble sort. This sort is still used a lot because it is easier and shorter to type than the other sorts.

The most intuitive algorithm is Selection Sort. Humans use this sort all the time when sorting objects. This algorithm is especially useful when sorting across a network.

Radix Sort (also known as Bin Sort) is a fairly fast sort that uses digits of numbers to sort arrays. Because of this, however, Radix Sort is fairly specialized, and makes it difficult to write general purpose code.

A very important sorting algorithm is Insertion Sort. This sort may not seem important, but it is used a surprising amount because it works well with almost sorted arrays.

One of the advanced sorting algorithms is the Heap Sort, which is based on the Heap Tree. You should read the two essays on Data Trees and Heap Trees before you attempt to read about this sort.

Bubble Sort
(Up to Basic Computer Science : Lists and Arrays : Sorting Algorithms)

Bubble Sort is by far the simplest and shortest of all the sorting algorithms. Here is the code for it:

( About Example C++ code )

// By convention, 'n' is usually the size of the
// array to be sorted.
void bubblesort( int array[], int n )
{
  for ( int i = 0; i <>
    for ( int j = 1; j <>
      if ( array[j-1] > array[j] )
        // Note the use here of swap()
        swap( array[j-1], array[j] );
}
              

You can see why we call this a short algorithm. Even though it's short, I will explain it anyway.

Basically, this algorithm goes through the entire array and compares every adjacent pair of elements in the array starting from the first and working towards the last. This cycle is repeated over and over until the entire array is sorted.

Notice that at the end of each cycle, the last element, and then the next to last element, and so on is in place. Thus, the cycle is shortened by one element each time so that we don't have to bother with the last elements which are already in place.

Let us walk through this algorithm on an example: 3 2 5 4 1. Note that n = 5.

First Iteration ( i = 0 ): We are considering the array 3 2 5 4 1.

( j = 1 ): 3 > 2, so swap() and the array is now 2 3 5 4 1.
( j = 2 ):
3 <>, so nothing happens and the array is still 2 3 5 4 1.
( j = 3 ):
5 > 4, so swap() and the array is now 2 3 4 5 1.
( j = 4 ):
5 > 1, so swap() and the array is now 2 3 4 1 5.

j is no longer strictly less than n - i, so we are finished with the First Iteration.

Second Iteration ( i = 1 ): We are now considering the array 2 3 4 1. (Not the last one, remember? n - i = 4, so we only consider the first 4 elements.)

( j = 1 ): 2 <>, so nothing happens and the array is still 2 3 4 1.
( j = 2 ):
3 <>, so nothing happens and the array is still 2 3 4 1.
( j = 3 ):
4 > 1, so swap() and the array is now 2 3 1 4.

j is no longer strictly less than n - i, so we are finished with the Second Iteration.

Third Iteration ( i = 2 ): We are now considering the array 2 3 1.

( j = 1 ): 2 <>, so nothing happens and the array is still 2 3 1.
( j = 2 ):
3 > 1, so swap() and the array is now 2 1 3.

j is no longer strictly less than n - i, so we are finished with the Third Iteration.

Fourth Iteration ( i = 3 ): We are now considering the array 2 1.

( j = 1 ): 2 > 1, so swap() and the array is now 1 2.

j is no longer strictly less than n - i, so we are finished with the Fourth Iteration.

Now i is no longer strictly less than n-1, so we are finished sorting.

Now that you have seen the entire sorting process written out, the algorithm will make more sense to you.

One of the problems with Bubble Sort is that even when the array is fully sorted, the sort continues. This problem, however, is easily remedied: just insert a boolean flag that keeps track of whether or not the sort called swap() on anything in the last iteration. If it did not, then the sort has finished early.

Still, improving the efficiency of the Bubble Sort algorithm is really quite a useless effort. If you want more efficiency, use another algorithm. The reason for using Bubble Sort is because it is easy to type, so making Bubble Sort more complex is not really all that appealing.

If you would like a usable version of the Bubble Sort algorithm, there is one below that uses templates. The only efficiency improvement to Bubble Sort is the early exit flag, since that does not add much length to the code.

Selection Sort
(Up to Basic Computer Science : Lists and Arrays : Sorting Algorithms)

Selection Sort is the sort used most often by humans, since it's makes a lot of sense. The algorithm is simple:

Let's say you are sorting some index cards labeled with numbers, and you want to order them smallest to largest. Here are the index cards:

4, 3, 1, 5, 2

Scan the list and find the smallest. This is, of course, 1. Put that into the first position, switching it wih 4. We now have this:

1, 3, 4, 5, 2

Now, we can ignore the 1 and concentrate on the other 4 numbers. Now, we find the next smallest number (which is 2), and move it to the second position. Then we do this for the third smallest, and so on. We keep following this process until we have what we wanted: a sorted list.

On the computer, this algorithm can be consolidated into this form:

For every n, ranging from 1 to the size of the array, start at the nth element of the array and find the smallest element. Switch this element with the nth element of the array.

The concept is quite simple, and therefore makes the algorithm quite easy to program. Even though Selection Sort is one of the slower algorithms, it is used because it sorts with a minimum of data shifting. That is, the algorithm doesn't swap the elements of the array as many times as other algorithms do in order to sort the array. This is really useful when you're programming over a network, because sometimes transferring data can be really slow and time consuming.

Although you should be able to easily program your own version of this simple algorithm, you can study and use the commented version we have in our files:


Radix Sort, Part 1
(Up to Basic Computer Science : Lists and Arrays : Sorting Algorithms)

[Part 1] [Part 2]

Radix Sort is a clever and intuitive little sorting algorithm. Radix Sort puts the elements in order by comparing the digits of the numbers. I will explain with an example.

Consider the following 9 numbers:

493 812 715 710 195 437 582 340 385

We should start sorting by comparing and ordering the one's digits:

Digit

Sublist

0

340 710

1

2

812 582

3

493

4

5

715 195 385

6

7

437

8

9

Notice that the numbers were added onto the list in the order that they were found, which is why the numbers appear to be unsorted in each of the sublists above. Now, we gather the sublists (in order from the 0 sublist to the 9 sublist) into the main list again:

340 710 812 582 493 715 195 385 437

Note: The order in which we divide and reassemble the list is extremely important, as this is one of the foundations of this algorithm.

Now, the sublists are created again, this time based on the ten's digit:

Digit

Sublist

0

1

710 812 715

2

3

437

4

340

5

6

7

8

582 385

9

493 195

Now the sublists are gathered in order from 0 to 9:

710 812 715 437 340 582 385 493 195

Finally, the sublists are created according to the hundred's digit:

Digit

Sublist

0

1

195

2

3

340 385

4

437 493

5

582

6

7

710 715

8

812

9

At last, the list is gathered up again:

195 340 385 437 493 582 710 715 812

And now we have a fully sorted array! Radix Sort is very simple, and a computer can do it fast. When it is programmed properly, Radix Sort is in fact one of the fastest sorting algorithms for numbers or strings of letters.

Insertion Sort
(Up to Basic Computer Science : Lists and Arrays : Sorting Algorithms)

Insertion Sort, like Selection Sort, is a fairly intuitive algorithm. Insertion Sort's advantages lie in that it works really well with almost sorted arrays. In the non-computer world, it is often used by card players sorting cards that are dealt one at a time.

Imagine that you have the following 5 cards dealt to you in this order (we only care about the numbers, so we'll ignore the suits): 3, 2, 4, 5, 1.

First, we're dealt the card with the 3 on it. Then, we're dealt the 2, which is inserted before the 3. Then, we're dealt the 4 and the 5, both of which are put at the end. When we are given the 1, it is inserted before the 2. Now, we've been dealt five cards, and so the hand been sorted.

Of course, when we are sorting an array, we don't get the elements one at a time. Instead, we get them all at once, but we can still visualize them as if we are being given them one at a time. We just ignore the elements that haven't been "dealt" to us yet.

Here's another example: let's sort this list: 3, 5, 7, 2, 8, 1, 9, 0, 4, 6.

The sort goes something like this:

Round

Numbers "Dealt"

Numbers Not "Dealt"

1

3

5 7 2 8 1 9 0 4 6

2

3 5

7 2 8 1 9 0 4 6

3

3 5 7

2 8 1 9 0 4 6

4

2 3 5 7

8 1 9 0 4 6

5

2 3 5 7 8

1 9 0 4 6

6

1 2 3 5 7 8

9 0 4 6

7

1 2 3 5 7 8 9

0 4 6

8

0 1 2 3 5 7 8 9

4 6

9

0 1 2 3 4 5 7 8 9

6

10

0 1 2 3 4 5 6 7 8 9

Notice that the numbers that have been "dealt" are always in order..

Easy and simple, right? Well, the way the computer does Insertion Sort isn't quite so simple.

The computer begins with the new card at the end and swaps it backwards until it is in the correct location. So, the card with the 1 would have to be swapped 4 times. This method becomes extremely long in certain cases, such as when the list begins completely backwards (5, 4, 3, 2, 1). That particular formation is the worst case scenario, and for that, Insertion Sort is very, very slow.

So, a man by the name of Shell created a sorting algorithm called the Diminishing-Increment Insertion Sort, also known simply as Shell Sort. Mr. Shell tried to get the benefits of Insertion Sort without the drawbacks. Unfortunately, Mr. Shell's Sort is far from being the fastest algorithm, and it is more complex than Insertion or Selection Sort.

Still, for most cases, Insertion Sort is much faster than Selection Sort, and it is not very complicated to type, so it is used often in various computer science projects.

There is a nicely commented, usable version of Insertion Sort that you can download at our site:

Heap Trees, Part 2
(Up to Basic Computer Science : Data Trees)

[Part 1] [Part 2]

Heap sort is one of the fastest sorting algorithms, rivaling such speed-demons as Quicksort and mergesort. The advantages of heap sort are that it does not use recursion and that heap sort works just as fast for any data order. That is, there is basically no worst case scenario.

First, since the heap we just implemented is an array, you can just convert the array into a heap with the following algorithm: (refer back to the code structure on the previous page)

  • Call shiftDown() on the parent of the last element. (Halfway back, remember?)
  • Moving left, keep calling shiftDown() of the parent of the next to last element and so on until the end of the level is reached.
  • Move up a level, and keep calling shiftDown() on the parent node in the same way.
  • Keep repeating the second and third step until you've called shiftDown() on the root node.

Basically, since the heap is an array at heart anyway, you just call shiftDown() on the parent of the last child or pair of children, and then keep working your way left across the array, calling shiftDown() on each element.

Now that you have a legitamate heap, you can sort it back into an ordered array. Heap sort works from back to front; it starts with the last element of the sorted array and then steadily works towards the front. Just keep calling remove() and storing the data from back to front until the heap is empty.

Even though this sounds a bit repetative, it really is quite fast, especially for large numbers of elements. Sometimes, it can be faster than Quicksort or mergesort just because the heap sort is not recursive. Although on average heap sort is slower than Quicksort and mergesort, heap sort is more consistent in its speed. Thus, in applications where timing may be crucial, heap sort is preferred to Quicksort.

The heap tree and heap sort should be much easier to implement than the binary search tree. Just follow the guidelines established above, and the programming shouldn't trouble you much. The description page for the source code file will explain how the heap sort is implemented in the code.

0 comments:

Bidvertiser

Designed by Posicionamiento Web | Bloggerized by GosuBlogger | Blue Business Blogger