Learn Basic Algorithms and Data Structures

There are a variety of important algorithms and data structures that are the lingua franca of computer science. Everyone needs to know what a <>or binary tree is because they get used all the time. Perhaps just as important are fundamental algorithms like binary search, graph searching algorithms, sorting algorithms, and tree-based searches such as minimax. These algorithms and their variants show up a lot, and people will generally expect that any computer scientist will understand how they work.

Learn Basic Theory

There are a couple of things you should definitely be aware of: you should understand how to represent numbers in different bases and how to manipulate boolean expressions using boolean logic. Both of these tools will come in handy in a variety of circumstances, especially when reading other people's code or trying to clarify your own. (Boolean logic is especially useful for formulating clear conditional statements.)

Even more important, you should have a good sense of the limits of current computers. In particular, it really helps to understand the ideas in algorithmic efficiency and Big-O notation. Knowing these topics makes it clearer why certain programs and algorithms will take a very long time to run, and how to spot them. It also makes it signifcantly easier to optimize your program when you know which algorithms to choose.

Finally, it's useful to have a sense of the absolute limits of what computers can do; it turns out that there are just some things that are not possible for a computer to do. Sometimes it can be tempting to try to write a program that does just that!

Understanding Different Base Systems

This essay is targeted at new students of computer programming or computer science who want to understand how base two (binary), base eight (octal), and base sixteen (hexadecimal) work.

First of all, it's important to realize that each of these base systems is just another way of writing down the same number. When you convert a number between different bases, it should still have the same value. In this essay, when I want to refer to the actual value of a number (regardless of its base), I'll do it in base 10 because that's what most people are used to.

It's generally easiest to understand the concept of different bases by looking at base 10. When we have a number in base 10, each digit can be referred to as the ones digit, tens digit, the hundreds digit, the thousands digit, or so forth. For instance, in the number 432, 4 is the hundreds digit, 3 is the tens digit, and 2 is the ones digit.

Another way to think about this is to rewrite 432 as

  4 x 102 
+ 3 x 101
+ 2 x 100

Each digit is multiplied by the next power of ten. Numbers in other bases, such as base 16, are merely numbers where the base is not ten! For instance, we could interpret 432 as though it were in base 16 by evaluating it as

  4 x 162 
+ 3 x 161
+ 2 x 100

This would be the same as the number 1074 in base 10.

So to convert a number from a given base into base 10, all we need to do is treat each place as a power of the given base times the value of the digit in that place. Note that customarily for a given base, only digits from 0 to the base minus one are used. For instance, in decimal, we only use the digits 0 through 9. That's because we don't need any more digits to express every possible number. (But we do need at least that many; if we only had 8 digits, how would we ever express the value 9?)

Now, bases greater than 10 will require more than 10 possible digits. For intsance, the number 11 in base ten can be expressed in base 16 with only a single digit because the ones place in base 16 can range from 0 to 15. Since we only have 10 digits, the letters A through F are used to stand for the "digits" 10 through 15. So, for instance, the hexadecimal number B stands for the decimal number 11.

Bases less than ten will require fewer digits--for instance, binary, which works using powers of two, only needs two digits: one and zero. The binary number 1001, for instance, is the same as writing

 
1 * 23
1 * 22
1 * 21
1 * 20

which comes out to the decimal value 9.

Numbers written in octal use a base of 8 instead of 2 or 16. See if you can figure out what the number 20 written in octal would be in base ten.

Because octal, hexadecimal, and decimal numbers can often share the same digits, there needs to be some way of distinguishing between them. Traditionally, octal numbers are written with a leading 0; for instance, 020 is the same thing as the number 20 in base 8. Hexadecimal numbers are written with the prefix of "0x". So 0x20 would be the number 20 in base 16; we'd interpret it the same as the decimal number 32.

Converting from decimal to octal or hexadecimal

It turns out that when you wish to convert from decimal to octal or hexadecimal, there is a very easy formula that you can use. I'll give you the one for octal, and let you puzzle out the hexadecimal version (which may come quite naturally to some of you).

To convert from octal to hexadecimal, all you need to do is group the binary digits into pairs of three and convert each one into the corresponding octal number. For instance, given the binary number 010011110, you would group 011 and 110 together. 010 is 2, 011 is 3 and 110 is 6. So the octal number is 0236.

So why exactly does this work? Well, let's take a look at what 011110 looks like:

 
  0 * 28
  1 * 27
  0 * 26
  0 * 25
+ 1 * 24
+ 1 * 23
+ 1 * 22
+ 1 * 21
+ 0 * 20

That's actually the same as

 
  0 * 22 * 26
+ 1 * 21 * 26
+ 0 * 20 * 26
+ 0 * 22 * 23
+ 1 * 21 * 23
+ 1 * 20 * 23
+ 1 * 22 * 20
+ 1 * 21 * 20
+ 0 * 20 * 20

Whoa! First, notice that the far right column is actually turning into powers of 8! 23 is 8, and 26 is 64! So this means for each group of three digits, we have the base increasing by a factor of 8. Moreover, look at the right hand column. It can sum up to at most 7 (since 20 + 21 + 22 = 1 + 2 + 4 and the binary digit just decides whether each power of two is included into the sum or not). That's exactly the same as having eight digits, 0 through 7, and once we sum them all together, we multiply the sum by a power of eight. That's just the same as making each group of three binary digits an octal digit!

Knowing this, can you come up with the way to do the same thing for hexadecimal numbers?

Efficiency and the Space-Time Continuum
(Up to Basic Computer Science)
Get Printable Version

A lot of computer science is about efficiency. For instance, one frequently used mechanism for measuring the theoretical speed of algorithms is Big-O notation. What most people don't realize, however, is that often there is a trade-off between speed and memory: or, as I like to call it, space and time.

Think of space efficiency and time efficiency as two opposite ends on a band (a continuum). Every point in between the two ends has a certain time and space efficiency. The more time efficiency you have, the less space efficiency you have, and vice versa. The picture below illustrates this in a simple fashion:

The Space-Time Continuum

Algorithms like Quicksort and Mergesort are exceedingly fast, but require lots of space to do the operations. On the other side of the spectrum, Bubble Sort is exceedingly slow, but takes up the minimum of space.

Heap Sort, for instance, has a very good balance between space and speed. The heap itself takes up about the same space as an array, and yet the speed of the sort is in the same order of magnitude as Quicksort and Mergesort (although it is slower on average than the other two). Heap Sort has the additional benefit of being quite consistent in its speed, so it is useful in programs where timing is crucial (i.e. networks).

For data trees, 2-3 trees and 2-3-4 trees are faster and more balanced than the normal binary search trees, but they take up an extraordinary amount of space because they usually have tons of unused variables lying around.

The Red-Black tree is a compromise between space and time. The Red-Black tree is basically a binary tree representation of a 2-3-4 tree, and so it takes up less space than the 2-3-4 tree (it doesn't have all of those empty fields), but it still retains the search efficiency of the 2-3-4 tree!

Thus, there has to be a balance in the space and time aspects of computing. Most of the research in Computer Science these days is devoted to Time efficiency, particularly the theoretical time barrier of NP-Complete problems (like the Traveling Salesman Problem). These days memory is cheap, and storage space almost seems to be given away.

With networking and robotics, however, the necessity of a balance becomes apparent. Often, memory on these machines is scarce, as is processing power. Memory has to be used conservatively, otherwise the network servers could become stalled until the operation is finished. Robots often have to function with the limited resources installed on their own structures, and thus many times they do not have the memory to be spared for vast computing speed. In these situations, a compromise must be made.

With networking, this issue becomes mainly about topology. Network Topology is basically a description of how the physical connections of a network are set up. Maybe you know the term "daisy chain": that is a kind of network topology. A daisy chain (in which all computers are connected to two others in one chain) uses the minimum of cables, but the relative speed of the connections is smaller, because if the computer on one end tries to send a signal to a computer at the other end, it must first go through every computer in between. On the other hand, if every computer were connected to every other computer (called a "fully connected mesh network"), signals would be fast, but you would use a lot more cables, and the network may become hard to maintain. So, in this case, space correlates to the number of connections in a network, while time refers to the speed that signals travel the network.

Thus, although it may seem a trivial issue, it is really quite important, even now, to have efficiency in both space and time. Of course, the type of compromise made depends on the situation, but generally, for most programmers, time is of the essence, while for locations in which memory is scarce, of course, space is the issue. Maybe someday we'll be able to find algorithms that are extremely efficient in both speed and memory, bridges in the Space-Time continuum.

0 comments:

Bidvertiser

Designed by Posicionamiento Web | Bloggerized by GosuBlogger | Blue Business Blogger