Channel: db-in's blog

Binary World


Hello everyone!

In this article, I’ll talk about binaries. I’ll cover the most important concepts and operations, show some examples of where and how to use binaries, and explain how you can use them in your daily work. If you are looking for a tutorial about binaries, a “binary tutorial”, you are in the right place.

Let’s delve into the binary world.




At a Glance

Binaries are the most basic and fastest approach in any programming language. Why? Our computers, and I mean any kind of processing machine, are built from boards with electric circuits on them, a lot of them. Electrical impulses pass through these circuits all the time, and a single circuit can carry billions of impulses per second.

Representation of electrical impulses on a computer board.


The receptors of these impulses are prepared to receive only one kind of information: “is there an impulse or not?”, a single “YES or NO”. This kind of information is exactly what a bit is. A bit can have the value 1 or 0 (“YES or NO”). Very simple, right?

This is why binaries are the fastest approach: a bit is the most direct representation of an electrical impulse, which is the basic form of communication between the components of our hardware. Everything that comes next in this article is just a little organization on top of this kind of information. We’ll see how to organize bits to form a byte, data types, files and everything else in our virtual world!

After we understand the binary world, we’ll apply some mathematics to it and make simple operations using the BITWISE operators. Finally, we’ll move up to high-level programming languages and see how binaries can be used in our daily work to improve our applications.


Bit and Byte

Some people get confused by these names, so let’s make everything absolutely clear: “A Bit IS NOT a Byte!”. In our high-level programming languages, a single bit is “invisible”: we can’t address a single bit directly, because all our work is done at the byte level. To us, the byte is the most basic unit, not the bit. But what the hell is the difference between a bit and a byte?

Well, one byte is made of 8 bits. On virtually every machine you will work with, this is a fixed rule: 1 Byte = 8 Bits.

So, if we think of a sequence of 8 bits and remember our combinatorial analysis, a set of 8 bits (each with 2 possible values: 0 or 1) results in 256 possibilities (2^8 = 256). To make full use of these 256 possibilities, we can think of that sequence as powers of two.

Every sequence of 8 bits forms 1 byte, which has 256 combinations.


Confused? Well, look at the following list; it may help you understand this concept:

Binary Computation
  • 1st Bit (Right to Left): 2 ^ 0 = 1
  • 2nd Bit (Right to Left): 2 ^ 1 = 2
  • 3rd Bit (Right to Left): 2 ^ 2 = 4
  • 4th Bit (Right to Left): 2 ^ 3 = 8
  • 5th Bit (Right to Left): 2 ^ 4 = 16
  • 6th Bit (Right to Left): 2 ^ 5 = 32
  • 7th Bit (Right to Left): 2 ^ 6 = 64
  • 8th Bit (Right to Left): 2 ^ 7 = 128

To compute the value, look at every bit and check whether it is set (1) or not (0); if it is set, add the value it represents. The sum is the final value of that byte. The following image shows an example:

Each bit represents a single "YES" or "NO".


The calculation in the image above was: 128 + 64 + 32 + 4 + 1 = 229. Very simple, right?
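The same calculation can be written as a small C function. This is a minimal sketch: `byte_from_bits` is a hypothetical helper name, and it reads the bits from a string of '0' and '1' characters, walking from right to left and doubling the weight at each step, exactly like the list above.

```c
#include <string.h>

/* Hypothetical helper: converts a string such as "11100101" into its value
   by adding the power of two of every bit that is set. Any character other
   than '1' counts as a 0 bit. Intended for strings of up to 32 characters. */
unsigned int byte_from_bits(const char *bits)
{
    unsigned int value = 0;
    unsigned int weight = 1;            /* 2 ^ 0 for the 1st bit (right to left) */
    size_t i = strlen(bits);

    while (i > 0) {
        --i;                            /* move one character to the left */
        if (bits[i] == '1')
            value += weight;            /* bit is set: add the value it represents */
        weight *= 2;                    /* the next bit is worth twice as much */
    }
    return value;
}
```

For the byte in the image, `byte_from_bits("11100101")` returns 229.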


Try it Yourself

You could try to discover the following values by yourself:

Discover the Values



If you miss some value, don’t worry, because in practice we’ll almost never need to do this kind of calculation ourselves. As I told you before, what happens with the bits inside a byte is invisible to our high-level programming languages. We’ll never (or almost never) need to write “myVariable = 10101100;”; instead, we’ll use “myVariable = 172;”.
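Going the other way, from a value to its bits, is just as simple: the remainder of a division by 2 is the lowest bit, and dividing by 2 drops that bit. The C sketch below uses a hypothetical helper, `bits_from_byte`, to print 172 as 10101100. As a side note, GCC and Clang (as an extension) and the C23 standard also accept binary literals such as `0b10101100`, but writing 172 is still the usual choice.

```c
/* Hypothetical helper: writes the 8 bits of `value` into `out` as '0'/'1'
   characters, most significant bit first, followed by a terminating '\0'.
   `out` must have room for 9 chars. */
void bits_from_byte(unsigned char value, char out[9])
{
    out[8] = '\0';
    for (int position = 7; position >= 0; --position) {
        out[position] = (value % 2) ? '1' : '0';  /* remainder = lowest bit */
        value /= 2;                               /* drop that bit */
    }
}
```

With `char buf[9]; bits_from_byte(172, buf);` the buffer holds "10101100".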

Now let’s go up one level and understand how a byte can be combined with another one to form more complex structures.


Combining Bytes

As we’ve seen, a Byte can have 256 different values, 0 – 255. But we are used to data types with a much larger range, like a 32-bit int, which goes from -2,147,483,648 to 2,147,483,647, or even floating-point numbers. How can that be?

It’s simple: by using more than 1 byte to describe a piece of data. If you are 20 years old or older, you probably remember 8-bit devices, right? Like the 8-bit Nintendo game console. Well, now you know: that console processed data only 1 byte at a time. Then came 16-bit and 32-bit platforms, and today we are used to working with 64-bit platforms. What does that mean?

On a 16-bit platform, 2 bytes can be processed together (2 x 8 bits); on a 32-bit platform, 4 bytes can be processed together (4 x 8 bits); on a 64-bit platform, 8 bytes are processed together (8 x 8 bits), and so on. The following picture illustrates the 32-bit data types int and float:

Modern processors can work with multiple bytes at the same time. This is a representation of 32-bit data types.


Think about it: a simple change in processing, from reading only 1 byte to reading more than one, changes everything in our devices. For example, with 8 bits at once we have a range of 256 values for colors. Do you remember computers with that configuration? Yes, it was awful! Now, with 4 bytes at the same time we have 4,294,967,296 different values per pixel (in practice, 24 of those bits usually hold the color, about 16.7 million combinations, and the other 8 hold transparency). Imagine the performance and processing gain for our hardware from that simple change: from reading 1 byte to reading many of them.

Just as a digression: these days, some people talk about a new kind of processor: Quantum Processors! The idea is to move from electrical impulses on a board to the quantum states of particles such as electrons. Instead of a single “YES or NO”, a quantum bit (a “qubit”) can be in a combination, a superposition, of both states at the same time. This single fact can lead us to a new perspective and a new kind of “bit”. Now imagine how much information could be processed at the same time! Today some supercomputers run on wide platforms like 128 bits, 256 bits, 512 bits and more, but nothing could be compared to a Quantum Processor! Well, to tell the truth, I’ve read that a team led by Yale University researchers has created the first quantum processor. It is still far from the market, but not so far from our world.

Well, enough digression. Back to Earth!

Great, now we know how our high-level programming languages deal with binaries and, more importantly, how binaries work. So every time you create an int, for example, on the common platforms where int is 32 bits you are in reality creating a set of 32 bits, or in simple words, 32 slots of 0 or 1: 00000000 00000000 00000000 00000000.
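You can ask the compiler how many bits a type really has. The following is a minimal C sketch with hypothetical function names: it uses `sizeof` and `CHAR_BIT` (the number of bits in a byte, from `<limits.h>`) and counts how many values a group of bytes can hold, which gives the 256 and 4,294,967,296 figures mentioned earlier.

```c
#include <limits.h>
#include <stdint.h>

/* Number of bits in an int on the current platform. C only guarantees at
   least 16; common desktop and mobile platforms use 32. */
unsigned int int_bits(void)
{
    return (unsigned int)(sizeof(int) * CHAR_BIT);
}

/* Number of distinct values that `bytes` bytes can hold (2 ^ bits).
   Valid for 1 to 7 bytes, so the result fits in a uint64_t. */
uint64_t combinations_for_bytes(unsigned int bytes)
{
    uint64_t count = 1;
    for (unsigned int bit = 0; bit < bytes * CHAR_BIT; ++bit)
        count *= 2;                  /* every extra bit doubles the possibilities */
    return count;
}
```

On a platform with 8-bit bytes, `combinations_for_bytes(1)` is 256 and `combinations_for_bytes(4)` is 4,294,967,296.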

It’s time to learn some operations we can perform at the bit level.


Bitwise Operators

There are just a few bitwise operators, actually 4. What they do is completely different from traditional mathematical operations.

Bitwise Operators
AND: &

Represented by & (ampersand), it keeps only the bits that are set (1) in both bit sequences. For example:

   01001101
&  00101011
   --------
   00001001

The example above with pseudo-code notation: 77 & 43 = 9.

OR: |



Represented by | (pipe), it keeps every bit that is set in at least one of the two bit sequences. For example:

   01001101
|  00101011
   --------
   01101111

The example above with pseudo-code notation: 77 | 43 = 111.

XOR: ^



Represented by ^ (caret), it keeps only the bits that are set in exactly one of the two bit sequences, that is, the bits where the two sequences differ. For example:

   01001101
^  00101011
   --------
   01100110

The example above with pseudo-code notation: 77 ^ 43 = 102.
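All three operators work column by column: each bit of the result depends only on the two bits in the same position. The C sketch below makes that explicit by applying a one-bit rule to each position of two 8-bit values (the names `apply_per_bit`, `rule_and`, `rule_or` and `rule_xor` are illustrative); its results match the built-in `&`, `|` and `^` operators.

```c
/* A rule that combines two single bits (0 or 1) into one result bit. */
typedef int (*bit_rule)(int a, int b);

static int rule_and(int a, int b) { return a && b; }  /* 1 only if both are 1 */
static int rule_or(int a, int b)  { return a || b; }  /* 1 if at least one is 1 */
static int rule_xor(int a, int b) { return a != b; }  /* 1 if they differ */

/* Applies `rule` to each of the 8 bit positions of x and y and rebuilds the
   resulting byte, adding 2 ^ position for every result bit that is 1. */
unsigned int apply_per_bit(unsigned int x, unsigned int y, bit_rule rule)
{
    unsigned int result = 0;
    unsigned int weight = 1;                 /* 2 ^ position */
    for (int position = 0; position < 8; ++position) {
        int a = (int)((x / weight) % 2);     /* bit of x at this position */
        int b = (int)((y / weight) % 2);     /* bit of y at this position */
        if (rule(a, b))
            result += weight;
        weight *= 2;
    }
    return result;
}
```

For 77 and 43 it gives 9 with `rule_and`, 111 with `rule_or` and 102 with `rule_xor`, the same values as the examples above.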

NOT: ~



Represented by ~ (tilde), and unlike the other ones, this operator works on only one bit sequence; it’s also called “complement”. It negates every bit: if a bit is 0, it becomes 1 and vice versa. For example:

~  01001101
   --------
   10110010

The example above with pseudo-code notation, using 8 bits: ~77 = 178.
It is called "complement" (more precisely, "one's complement") because for unsigned values it works as the formula ~x = {max} - x, where {max} is the maximum value the bit sequence can hold. In the example above the maximum is 255 (8 bits), so ~77 = 255 - 77 = 178.
For signed values (in the usual two's complement representation), it works as the formula ~x = -x - 1; in other words, x + ~x = -1.
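In C, one detail matters here: small types such as `uint8_t` are promoted to `int` before `~` is applied, so `~x` on an 8-bit value produces an int with the upper bits set too. This sketch (function names are illustrative) casts the result back to 8 bits to get the unsigned behavior, and shows the signed formula. It assumes two's complement integers, which every mainstream platform uses and C23 requires.

```c
#include <stdint.h>

/* ~ on an 8-bit unsigned value. x is promoted to int first, so the cast
   keeps only the low 8 bits; the result equals 255 - x. */
uint8_t not8(uint8_t x)
{
    return (uint8_t)~x;
}

/* ~ on a signed int. With two's complement this equals -x - 1. */
int not_signed(int x)
{
    return ~x;
}
```

So `not8(77)` is 178 and `not_signed(77)` is -78.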

All the bitwise operators that use two sequences work on two sequences of the same length. If one sequence has fewer bits, it is filled with 0s on the left until it reaches the length of the longer sequence. For example: 01010100 & 111 is really the operation 01010100 & 00000111. Remember that 0 bits on the left don't change the value, just like leading zeros in ordinary numbers.

Using these simple operations we can build incredible routines. The best thing is that binary operations are stupidly fast in any language, as we already discussed. Soon we'll see some real examples of binaries in our daily work. Before that, we need to see the Bit Shift operations, which complement the bitwise operators.


Bit Shift Operators

There are two kinds of shifts: left and right. Simple as that. A bit shift pushes the bits within the size of the data type: if your data type has 1 byte, the bits are shifted inside only that 1 byte; if your data type has 4 bytes, the bits are shifted inside those 4 bytes.

Bit Shift Operators
LEFT-SHIFT: <<

Represented by <<, it pushes all bits to the left. The leftmost bit is discarded and a new bit, set to 0, is placed at the far right. For example:

   01001101 <<
   --------
   10011010

The example above with pseudo-code notation: 77 << 1 = 154.

RIGHT-SHIFT: >>

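Represented by >>, it pushes all bits to the right. The rightmost bit is discarded and a new bit, set to 0, is placed at the far left (this is the behavior for unsigned values; for negative signed values, many languages and compilers copy the sign bit instead). For example: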

   01001101 >>
   --------
   00100110

The example above with pseudo-code notation: 77 >> 1 = 38.

The value being shifted always goes on the left side of the operator, and the number of positions to shift goes on its right. To shift by more than one bit, just use a bigger number. For example:

Shifting Multiple Bits
.
   01001101 << 4
   --------
   11010000

   01001101 >> 3
   --------
   00001001
.
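In C the same shifts look like this. The sketch below (`shl8` and `shr8` are hypothetical names) works on 8-bit values: the operand is promoted to int before shifting, so the cast back to `uint8_t` is what discards the bits pushed past the leftmost position. It assumes the shift count stays between 0 and 7.

```c
#include <stdint.h>

/* Left shift inside one byte: bits moved past bit 7 are dropped by the cast
   and the vacated positions on the right are filled with 0. */
uint8_t shl8(uint8_t x, unsigned int n)
{
    return (uint8_t)(x << n);
}

/* Right shift inside one byte: bits moved past bit 0 are dropped and the
   vacated positions on the left are filled with 0 (x is never negative). */
uint8_t shr8(uint8_t x, unsigned int n)
{
    return (uint8_t)(x >> n);
}
```

Without the cast, `77 << 4` in an int is 1232, because an int has room for the extra bits. For unsigned values, shifting left by n multiplies by 2^n as long as no set bit falls off, and shifting right by n divides by 2^n, discarding the remainder. In C, right-shifting a negative signed value is implementation-defined, one more reason to prefer unsigned types for bit work.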

All right, my friends, these are all the binary operators. Now, with these very simple operations, we'll do really great things, including in our daily work.


Real Applications to the Binaries

I want to show you examples that we meet in our daily work, examples that can boost your application's performance without heavy re-coding, just simple changes. So my deal with you is: only real examples. Let's go.


Even/Odd Routine

Let's start with a very simple example. Imagine a routine that fills the rows of a table in pairs: one colored row, then one blank row. A simple way is to count the rows, filling each odd-numbered row with color and leaving each even-numbered row blank. We could check the result of a division by 2, but the fastest way is using binaries. A binary routine could be something like this:

Binary Routine to Check Even Odd
.
// Assume the numberOfRows variable was set.
for (int i = 0; i < numberOfRows; ++i)
{
    if (i & 1)
    {
        // The "i" is an odd number.
    }
    else
    {
        // The "i" is an even number.
    }
}
.

I love this routine. It makes me think about all the implicit rules that are within binary numbers. In the example above, what we are actually doing is:

00000001 & XXXXXXXX

The first (rightmost) bit is the only one with an odd value (2^0 = 1); every other bit is worth an even amount, so the only way to form an odd number is by using that bit. So, by using the AND operator, we check whether the current "i" uses that bit or not. If it does, the number is odd; otherwise, it is even. Very simple!
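Here is the same idea as a self-contained C sketch. The names `is_odd`, `fill_for_row` and the `row_fill` values are illustrative, not part of any library. An unsigned parameter is used so the check never depends on how negative numbers are represented.

```c
#include <stdbool.h>

/* Hypothetical fill styles for the table rows in the example. */
enum row_fill { ROW_BLANK, ROW_COLORED };

/* A number is odd exactly when its lowest bit (worth 2 ^ 0 = 1) is set. */
bool is_odd(unsigned int n)
{
    return (n & 1u) != 0;
}

/* Odd rows are colored, even rows are left blank. */
enum row_fill fill_for_row(unsigned int row)
{
    return is_odd(row) ? ROW_COLORED : ROW_BLANK;
}
```

Inside the loop above, `fill_for_row(i)` tells you which style to use for row "i".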


Working With Colors

Binary + Colors is one of the most popular topics. Almost all image editing software uses binary routines to transform images. Image effects in programs like Photoshop, for example, are built on changes at the pixel level.

The first thing about binary + colors is to understand what a hexadecimal number is. In C-like languages, a hexadecimal value is written as 0xN (the "0x" prefix always comes before the number), where each digit N has 16 different values: 0-9 plus A-F. So the real values of the hexadecimal digits are:

0x0 = 0
0x1 = 1
0x2 = 2
0x3 = 3
0x4 = 4
0x5 = 5
0x6 = 6
0x7 = 7
0x8 = 8
0x9 = 9
0xA = 10
0xB = 11
0xC = 12
0xD = 13
0xE = 14
0xF = 15

Now we can place two hexadecimal digits side by side: one works as our "units place" and the other as our "sixteens place". The first has 16 possible values and the second has another 16, giving 16 x 16 = 256 possible values. Wait, wait, wait... 256? That's a Byte, right? So we can use every combination of two hexadecimal digits (0xNN) to represent a byte!

Pair of hexadecimal numbers.


Just to make this step clear, here are some values using two-digit hexadecimal numbers:

0x00 = 0      (0 x 16 + 0)
0x10 = 16     (1 x 16 + 0)
0x20 = 32     (2 x 16 + 0)
0x21 = 33     (2 x 16 + 1)
0x22 = 34     (2 x 16 + 2)
0xA0 = 160    (10 x 16 + 0)
0xA9 = 169    (10 x 16 + 9)
0xAA = 170    (10 x 16 + 10)
0xAB = 171    (10 x 16 + 11)
0xF0 = 240    (15 x 16 + 0)
0xFF = 255    (15 x 16 + 15)
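The table follows one rule: value = (first digit x 16) + second digit. Below is a small C sketch of that rule in both directions; the helper names are illustrative. Splitting a byte into its two hexadecimal digits (often called nibbles) uses the operators seen before: `>> 4` keeps the high digit and `& 0x0F` keeps the low one. Character handling assumes the letters A-F are contiguous, as in ASCII.

```c
/* Value of a single hexadecimal digit character, or -1 if c is not one. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Combines two hexadecimal digits into one byte: high x 16 + low. */
unsigned int byte_from_hex_digits(unsigned int high, unsigned int low)
{
    return high * 16 + low;
}

/* Splits a byte back into its two hexadecimal digits. */
unsigned int high_digit(unsigned char byte) { return byte >> 4; }    /* sixteens place */
unsigned int low_digit(unsigned char byte)  { return byte & 0x0F; }  /* units place */
```

For example, the digits 'A' and '9' combine into 10 x 16 + 9 = 169, and splitting 0xA9 gives back 10 and 9.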

Well, on 32-bit platforms we can use 4 bytes to describe a color: if we think of an RGB spectrum + Alpha, we can reserve 1 byte for each color channel (R, G, B, A). This way, each color channel has 256 possible values, right? And since we can represent a byte with two hexadecimal digits in the form 0xNN, we can represent a full pixel with RGBA information as a hexadecimal combination like 0xNNNNNNNN or, using a "channel representation", 0xRRGGBBAA.

As a legacy from old times, there is no single standard for alpha, so each piece of software made its own implementation: some use 0xRRGGBBAA, others 0xAARRGGBB, and others even use 0xRRGGBB + 0xAA. Even today we don't have a universal format for alpha, so let's focus only on 0xRRGGBB for our study purposes.

Now, isolating each color channel is very simple:

Isolating Color Channels
.
unsigned int color = 0x99AA44;

unsigned char redChannel = color >> 16;
unsigned char greenChannel = ( color >> 8 ) & 0xFF;
unsigned char blueChannel = color & 0xFF;
.

It's very simple to understand what happened. If we think in binary (bits), the "color" variable above is something like RRRRRRRRGGGGGGGGBBBBBBBB. When we move sixteen bits to the right, the resulting binary is:

RRRRRRRRGGGGGGGGBBBBBBBB >> 16 = RRRRRRRR

Similarly, we move eight bits to get the green channel, but this time the green channel is not isolated yet: the red channel remains in the result. To isolate the green channel we use the AND bitwise operator with 0xFF, a byte representing the real value 255 (11111111). By doing so, we discard all the red channel information and keep only the green channel:

(RRRRRRRRGGGGGGGGBBBBBBBB >> 8) & 11111111 = RRRRRRRRGGGGGGGG & 11111111 = GGGGGGGG

And finally, we do almost the same for the blue channel:

RRRRRRRRGGGGGGGGBBBBBBBB & 11111111 = BBBBBBBB
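Putting the three steps together, here is a C sketch that unpacks a 0xRRGGBB value into its channels and packs channels back into one value with `<<` and `|`. The `rgb_t` type and the function names are hypothetical, and the layout assumes 8 bits per channel with no alpha. Packing is what lets you, for example, clear or replace one channel and rebuild the pixel.

```c
#include <stdint.h>

/* Hypothetical struct holding the three 8-bit channels of one pixel. */
typedef struct {
    uint8_t r, g, b;
} rgb_t;

/* Isolates each channel of a 0xRRGGBB value with shifts and & 0xFF. */
rgb_t unpack_rgb(uint32_t color)
{
    rgb_t c;
    c.r = (uint8_t)((color >> 16) & 0xFF);
    c.g = (uint8_t)((color >> 8) & 0xFF);
    c.b = (uint8_t)(color & 0xFF);
    return c;
}

/* Moves each channel back to its position and merges them with OR. */
uint32_t pack_rgb(rgb_t c)
{
    return ((uint32_t)c.r << 16) | ((uint32_t)c.g << 8) | (uint32_t)c.b;
}
```

Unpacking 0x99AA44 gives r = 0x99, g = 0xAA and b = 0x44; setting b to 0 and packing again gives 0x99AA00.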

If you get the pixel data from an image file, it may also include alpha information, stored either before the color channels (ARGB) or after them (RGBA). For ARGB, to correctly extract the color channels you should apply the & 0xFF step to the red channel too. For RGBA, we must shift 8 bits more:

Isolating Color Channels Avoiding Alpha
.
// "one pixel data" read from the image (pixelData is a placeholder name).
unsigned int color = pixelData;

// For pixel data in ARGB format
{
    unsigned char redChannel = ( color >> 16 ) & 0xFF;
    unsigned char greenChannel = ( color >> 8 ) & 0xFF;
    unsigned char blueChannel = ( color >> 0 ) & 0xFF;
}

// For pixel data in RGBA format (separate scope, so the names can be reused)
{
    unsigned char redChannel = ( color >> 24 ) & 0xFF;
    unsigned char greenChannel = ( color >> 16 ) & 0xFF;
    unsigned char blueChannel = ( color >> 8 ) & 0xFF;
}
.

Once you have access to the color channels of each pixel, you can do almost anything with the image colors: remove or replace a color channel, transform the image to grey scale, change the colors individually... well, look at Photoshop, all those effects are built on this principle. Obviously, to produce more complex effects, like a blur or motion blur, you need to make many passes through every pixel and, depending on the image size (width and height), this can be a very expensive task.
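As one example of such an effect, here is a grey scale conversion for a single 0xRRGGBB pixel in C. It is a sketch under assumptions: the weights 77, 150 and 29 are an integer approximation (dividing by 256) of the widely used luma weights 0.299, 0.587 and 0.114 from ITU-R BT.601; a plain average of the three channels would also work, with slightly different results. The function name is illustrative.

```c
#include <stdint.h>

/* Converts one 0xRRGGBB pixel to grey. The weights 77 + 150 + 29 add up to
   256, so the weighted sum shifted right by 8 (divided by 256) stays in
   the 0-255 range. */
uint32_t to_grey(uint32_t color)
{
    uint32_t r = (color >> 16) & 0xFF;
    uint32_t g = (color >> 8) & 0xFF;
    uint32_t b = color & 0xFF;
    uint32_t grey = (77 * r + 150 * g + 29 * b) >> 8;

    /* Write the same value into all three channels. */
    return (grey << 16) | (grey << 8) | grey;
}
```

For instance, 0x99AA44 becomes 0x999999. Applying the function to every pixel produces the grey scale version of the image.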

I've written about how to reduce the color range of an image using the binary approach in my last OpenGL tutorial; you can check it out there.

All the information on this topic, binary + color, refers to the most popular image formats, which use 8 bits per channel. But some image formats support more, like 16 bits/channel or even 32 bits/channel. Those high-depth formats support an incredible range of colors and are often used in print workflows, some of which use CMYK instead of RGBA. If you need to work with one of those "high images", the code for integer channels is almost the same, just changing the numbers in the bit shifts and masks (32 bits/channel formats often store floating-point values instead, which need a different approach). For our purposes, assume all images use 8 bits/channel. The following image shows the supported bits per channel for each of the popular image formats. Files that support 32 bits/channel also support 16 and 8; files that support 16 bits/channel also support 8:

Supported bits/channel on the most popular image formats.



Files are Bits

Everything is made of binaries. At a low level, all the files we use are only bits, so we can parse and extract information from any file. But to do that, you must know how the bits are organized in that file. This is also the basic idea behind file protection: if you make a "coder" and a "decoder" that reorganize the bits inside a file, the file becomes unreadable for anyone who doesn't know the scheme. More sophisticated techniques add fake bits or use a "key" to rebuild the file with a completely new structure, and real encryption relies on well-studied algorithms and a secret key rather than on a hidden bit layout. But deep down, everything is bits and can be read.

Usually, the bit organization of popular file formats is openly documented in specifications, so we can learn to read and write those formats.

Here is a list of the best websites with file format specifications:

You also have 2 other choices: search Wikipedia for the file format you are looking for, or go to the page of the company or creator of that format. If you don't know who the creator is, you can go to http://whatis.techtarget.com and search for the file extension; there you'll find a link to the official page of that format, if one exists.

Talking about file formats in general, most file creators try to organize the information in blocks of 4 bytes, even the pieces of information that need fewer than 4. This is a good practice that makes the file more organized and easier to read. But it's not a rule: some files pack several pieces of information into 4 bytes.

Our high-level programming languages often offer a class, or several classes, to work with binary files. You can easily append any data type and the class builds a binary stream for you, one long array of bytes. After putting all the information into it, you just save it to a file, send it over the internet, or whatever you need. Here is a little piece of code using Objective-C (Cocoa):

Creating a Binary File with Objective-C
.
#import <Foundation/Foundation.h>
#include <string.h>

int main(void)
{
    @autoreleasepool {
        // Creates some generic data to save into a file.
        // As in the C language strings are arrays of chars,
        // we must know the length of the string to calculate
        // the number of bytes necessary to write it.
        float floatNumber = 15.2f;
        const char *charString = "this is a string";
        int strLength = (int)strlen(charString) + 1; // +1 for the terminating '\0'

        // Initializes the Cocoa class that deals with binaries.
        NSMutableData *data = [[NSMutableData alloc] init];

        // Inserts the data to save. This order is very important.
        [data appendBytes:&floatNumber length:sizeof(float)];
        [data appendBytes:&strLength length:sizeof(int)];
        // Pass the pointer itself (not &charString) so the characters are copied.
        [data appendBytes:charString length:sizeof(char) * strLength];

        // Saves the file.
        [data writeToFile:@"/path/to/save/file.test" atomically:YES];

        // Frees the memory allocated (manual reference counting; omit under ARC).
        [data release];
    }
    return 0;
}
.

Even if you are not familiar with Objective-C, you can understand the steps involved here. With the standard C library, we can also use the fopen function with a "b" in the mode string (such as "rb" or "wb") to read or write a file in binary mode.
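For comparison, here is a minimal C sketch that writes the same three pieces of data as the Objective-C example, in the same order, using `fwrite` on a file opened in binary mode. The function name `write_example` is hypothetical. As in the Objective-C version, the values are stored in the machine's native byte order and int size, so the file is only guaranteed to be read back correctly on a similar platform.

```c
#include <stdio.h>
#include <string.h>

/* Writes a float, an int with the string length (terminator included) and
   the string bytes, in that order. Returns the number of bytes written,
   or 0 if any write fails. */
size_t write_example(FILE *file, float floatNumber, const char *charString)
{
    int strLength = (int)strlen(charString) + 1;   /* +1 for the '\0' */

    if (fwrite(&floatNumber, sizeof(float), 1, file) != 1)
        return 0;
    if (fwrite(&strLength, sizeof(int), 1, file) != 1)
        return 0;
    if (fwrite(charString, sizeof(char), (size_t)strLength, file) != (size_t)strLength)
        return 0;

    return sizeof(float) + sizeof(int) + (size_t)strLength;
}
```

Typical use is to open the file with `fopen("/path/to/save/file.test", "wb")`, call `write_example(file, 15.2f, "this is a string")` and then `fclose(file)`. For that string it writes 4 + 4 + 17 bytes on platforms where float and int are 4 bytes each.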

The most important thing to understand is the order and length of each piece of information in a binary file. In the Objective-C example above, if we change the order of the calls to "appendBytes:length:", the resulting file will be completely different. The following image shows the file produced by that code and another file produced after one little change in the code:

Every file is a stream of bits.


As you can see, changing the order of one little piece of information gives us a whole new file. The image above shows the binary file opened in a "Hexadecimal Editor". This kind of application shows us the binary content of any file, displaying the data in groups of 4 bytes. You remember how to represent a byte using hexadecimal numbers, right? 0xNNNNNNNN represents 4 bytes.

To read the binary file, you follow the inverse path: read one chunk of bytes at a time and parse it appropriately. For example, if you know the next 4 bytes represent a float, parse them into a float data type; if you know the next byte represents a single char, parse it into a char, and so on. Here is an important point: to parse data of variable size, like a string, you need to know its length first. So every time we need to save data of variable size, we reserve some bits (usually 32 bits) to hold the length of the data that comes next.

Using the same example above, we could write this to open the saved file:

Reading a Binary File with Objective-C
.
#import <Foundation/Foundation.h>
#include <stdlib.h>

int main(void)
{
    @autoreleasepool {
        // Creates the variables to receive the data from the binary file.
        float floatNumber;
        char *charString;
        int strLength;

        // Creates a location/pointer to guide the position of the bytes inside the binary file.
        NSUInteger lastLocation = 0;

        // Initializes the Cocoa class that deals with binaries.
        NSData *data = [[NSData alloc] initWithContentsOfFile:@"/path/to/save/file.test"];
        if (data == nil) return 1; // the file could not be read

        // Retrieves the floating number from the file and updates the location/pointer.
        [data getBytes:&floatNumber range:NSMakeRange(lastLocation, sizeof(float))];
        lastLocation += sizeof(float);

        // Retrieves the string length from the file and updates the location/pointer.
        [data getBytes:&strLength range:NSMakeRange(lastLocation, sizeof(int))];
        lastLocation += sizeof(int);

        // Allocates the necessary memory to receive the string and retrieves the data.
        // Pass charString (the buffer), not &charString (the pointer variable).
        charString = malloc(sizeof(char) * strLength);
        [data getBytes:charString range:NSMakeRange(lastLocation, sizeof(char) * strLength)];

        // Use floatNumber and charString here.

        // Frees the memory allocated (omit [data release] under ARC).
        free(charString);
        [data release];
    }
    return 0;
}
.
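The same reading logic in plain C, as a sketch: it parses the float / length / string layout from a buffer already loaded in memory (for example with `fread`), keeping an offset that plays the role of `lastLocation`. Unlike the Objective-C version, it checks that every chunk actually fits in the buffer before copying it, since a truncated or corrupted file would otherwise cause out-of-bounds reads. The function names are illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Copies `size` bytes starting at buf[*offset] into dst and advances the
   offset, like getBytes:range: followed by lastLocation += size.
   Returns 0 on success or -1 if the buffer is too short. */
int read_chunk(const unsigned char *buf, size_t len, size_t *offset,
               void *dst, size_t size)
{
    if (*offset > len || size > len - *offset)
        return -1;
    memcpy(dst, buf + *offset, size);
    *offset += size;
    return 0;
}

/* Parses a float, an int length and `length` bytes of string (terminator
   included). Returns a malloc'd copy of the string, which the caller must
   free, or NULL if the data is malformed. */
char *parse_example(const unsigned char *buf, size_t len, float *floatNumber)
{
    size_t offset = 0;
    int strLength = 0;

    if (read_chunk(buf, len, &offset, floatNumber, sizeof(float)) != 0)
        return NULL;
    if (read_chunk(buf, len, &offset, &strLength, sizeof(int)) != 0)
        return NULL;
    if (strLength <= 0 || (size_t)strLength > len - offset)
        return NULL;                         /* length field does not fit the data */

    char *charString = malloc((size_t)strLength);
    if (charString == NULL)
        return NULL;
    read_chunk(buf, len, &offset, charString, (size_t)strLength);  /* size already checked */
    charString[strLength - 1] = '\0';        /* guarantee termination */
    return charString;
}
```

Given the bytes written by the earlier example, `parse_example` returns a copy of "this is a string" and stores 15.2 in the float; with a truncated buffer it returns NULL instead of reading past the end.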

Working with binary files, we can do great things in our applications: we can improve security, create binary files to hold information, or even open the binary files of other software. The performance and size are really amazing! A few kilobytes can hold a lot of information.


Conclusion

Well, as you can imagine now, we can use binaries for many things: to improve our applications' performance, for security, to read and write files... binaries are useful everywhere. My last advice is: don't be shy, use and abuse binaries! After I learned about binaries, I waited many months until I really started to explore them. Don't do the same; explore the binary power right now and, as in The Matrix (movie), you'll start to see the bits behind the world! ;)

So, let's recap everything:

  • Bits are the most basic representation of the electrical impulses on our computer boards.
  • A Bit can be 0 or 1.
  • One Byte has 8 Bits, giving 256 different possibilities.
  • Bytes can be combined to form powerful platforms: 16 bits, 32 bits, 64 bits and so on.
  • Bitwise operators (&, |, ^ and ~) generate new bits with some changes on the original bit(s).
  • Bit Shift operators (<< and >>) generate new bits by changing the position of the original bit(s).
  • One pair of hexadecimal digits can represent one byte.
  • Every file on any computer is a binary file and can be read if you know how its bits are organized.

Very well, my friends, this is all about binaries. As you saw, they are really, really simple, and because of this simplicity, binaries are so powerful. If you have any questions, just ask.

Thanks for reading,

See you soon.

