3.7 Big and Little Endians

When talking about whole numbers (integers) we should distinguish between their value (such as 123) and their written form that we would use when writing the number on a piece of paper, such as 123.

The written form of a number is composed of digits, arranged in a certain order. We all know that the ordering of the digits in the written form of a number is important: if we write 123 we are referring to a different value than if we write 321. The mathematical reason for this is that each digit contributes a different “weight” to the total value of the number, depending on the position it occupies in the written form. This is always the case, regardless of the numerical base used to denote the number.

For example, the value of the number 123 (whose written form is 123) is calculated as 1*10^2+2*10^1+3*10^0. If we swap the last two digits in the written form of the number, we have 1*10^2+3*10^1+2*10^0, which results in a different value: 132. When we consider other numerical bases, the bases in the polynomial change accordingly, but the correspondence between written form and value stands: for example, the value of 0x123 is calculated as 1*16^2+2*16^1+3*16^0.
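
The positional calculation above can be sketched outside of poke. The following is a minimal Python illustration, not part of poke; the helper name value_from_digits is hypothetical and was chosen only for this example. It takes the digits of a written form, most significant first, together with the base:

```python
def value_from_digits(digits, base):
    """Return the value of a written form given as a list of digits.

    The digits are ordered most significant first, as in the text above.
    Hypothetical helper, used only for illustration.
    """
    value = 0
    for digit in digits:
        # Multiplying by the base shifts the digits seen so far one
        # position up, which is equivalent to evaluating the polynomial
        # d_n*base^n + ... + d_1*base^1 + d_0*base^0.
        value = value * base + digit
    return value


print(value_from_digits([1, 2, 3], 10))        # 123
print(value_from_digits([1, 3, 2], 10))        # 132
print(hex(value_from_digits([1, 2, 3], 16)))   # 0x123
```

Swapping two digits changes the result because each position carries a different weight, whatever the base.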

The “higher” a digit is in the polynomial, the more significant it is, i.e. the more weight it has on the value of the number where it appears. In the written number 123, for example, the digit 1 is the most significant digit of the number, and the digit 3 is the least significant digit.

This distinction between the written form of a number and its value is very important. Just as text is read right-to-left in some languages (Arabic) or top-to-bottom in others (traditional Japanese), we could certainly conceive of a language in which the digits of numbers were arranged from right-to-left instead of left-to-right. In such a language the written representation of 123 would be 321, not 123. In other words: the least significant digit would come first, not last, in the written form of the number.

Now when it comes to storing numbers in computers, rather than writing them on paper, the role of the paper is played by the computer’s memory, be it ephemeral (like RAM) or persistent (like a spinning hard disk or a Flash memory), which is organized as a sequence of bytes. Since we are composing numbers with bytes, it makes sense to have each byte play the role of a digit in the written form of the bigger number. Since bytes can have values from 0 to 255, the base is 256. But what is the “written form” for our byte-composed numbers?

In the last section we tried to compose bigger integers by concatenating bytes together and interpreting the result. In doing so, we assumed (quite naturally) that in the written form of the resulting integer the bytes are ordered in the same order as they appear in the file, i.e. we assumed that the written form of the number b1*256^2+b2*256^1+b3*256^0 would be b1b2b3, where b1, b2 and b3 are bytes. In other words, given a written form b1b2b3, b1 would be the most significant byte (digit) and b3 would be the least significant byte (digit). In our world of IO spaces, the “written form” is the disposition of the bytes in the IO space (file, memory buffer, etc) being edited.

That interpretation of the written form is exactly what the bit-concatenation operator implements:

(poke) dump :from 0#B :size 3#B
76543210  0011 2233 4455 6677 8899 aabb ccdd eeff  0123456789ABCDEF
00000000: 7f45 4c                                  .EL
(poke) var b1 = byte @ 0#B
(poke) var b2 = byte @ 1#B
(poke) var b3 = byte @ 2#B
(poke) b1:::b2:::b3
(uint<24>) 0x7f454c
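
For readers who want to check the arithmetic outside of poke, the same composition can be sketched with shifts in Python. This is only an illustration under the assumption that the three bytes are the ones shown in the dump above; it is not how poke implements the operator.

```python
# The three bytes shown in the dump above, in file order.
b1, b2, b3 = 0x7F, 0x45, 0x4C

# b1 is treated as the most significant byte: shifting by 16 and 8 bits
# is the same as multiplying by 256^2 and 256^1 respectively.
value = (b1 << 16) | (b2 << 8) | b3

print(hex(value))  # 0x7f454c
```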

However, much like in certain human languages the written form is read from right to left, some computers also read numbers from right to left in their “written form”. Actually, it turns out that most modern computers do it like that. This means that, in these computers, given the written form b1b2b3 (i.e. given a file where b1 comes first, followed by b2 and then b3) the most significant byte is b3 and the least significant byte is b1. Therefore, the value of the number would be b3*256^2+b2*256^1+b1*256^0.

So, given the written form of a bigger number b1b2b3 (i.e. some ordering of bytes implied by the file they are stored in) there are at least two ways to interpret them to calculate the value of the number. When the written form is read from left to right, we talk about a big endian interpretation. When the written form is read from right to left, we talk about a little endian interpretation.

Given the first three bytes in foo.o, we can determine the value of the integer composed of these three bytes in both interpretations:

(poke) b1:::b2:::b3
(uint<24>) 0x7f454c
(poke) b3:::b2:::b1
(uint<24>) 0x4c457f
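
The two interpretations can also be cross-checked outside of poke. The sketch below uses Python's built-in int.from_bytes, which takes the byte order as an explicit argument; the byte values are assumed to be the three bytes from the dump shown earlier.

```python
# The first three bytes of the file, in the order they are stored.
data = bytes([0x7F, 0x45, 0x4C])

# Big endian: the first byte is the most significant one.
big = int.from_bytes(data, byteorder="big")

# Little endian: the first byte is the least significant one.
little = int.from_bytes(data, byteorder="little")

print(hex(big))     # 0x7f454c
print(hex(little))  # 0x4c457f
```

The stored bytes are identical in both cases; only the interpretation changes.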

Remember how the type specifier byte is just a synonym of uint<8>, and how we can use type specifiers like uint<24> and uint<32> to map bigger integers? When we do that, like in:

(poke) uint<24> @ 0#B
(uint<24>) 0x7f454c

poke must somehow decide what kind of interpretation to use, i.e. how to read the “written form” of the number. As you can see from the example, poke uses the left-to-right interpretation, or big endian, by default. But you can change it using a new dot-command: .set endian:

(poke) .set endian little
(poke) uint<24> @ 0#B
(uint<24>) 0x4c457f

The currently used interpretation (also called endianness) is shown if you invoke the dot-command without an argument (4):

(poke) .set endian
little

Different systems use different endianness. Within a given system, it is to be expected that most files will be encoded following the same conventions. Therefore poke provides a way to set the endianness to that of the system on which it is running. You do it this way:

(poke) .set endian host
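
To find out which endianness a given host uses without poke, one option is to ask another tool. The following Python sketch is only an illustration; the helper name host_endianness is hypothetical, while sys.byteorder is part of the Python standard library.

```python
import sys


def host_endianness():
    """Return 'little' or 'big' depending on the machine running Python.

    Hypothetical helper, used only for illustration.
    """
    return sys.byteorder


# Encode the value used in the examples above in the host byte order.
value = 0x7F454C
native = value.to_bytes(3, byteorder=sys.byteorder)

# On a little endian host the bytes come out as 4c457f,
# on a big endian host as 7f454c.
print(host_endianness(), native.hex())
```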

Footnotes

(4)

This also applies to the other .set commands.