27 points andwati | 11 comments
1. MrBuddyCasino ◴[] No.45905428[source]
What first confused me about endianness is that it is about byte order, not bit order. The latter would have seemed more logical, or is this just me?
replies(5): >>45905811 #>>45906279 #>>45906495 #>>45907705 #>>45907834 #
2. andwati ◴[] No.45905811[source]
Learning this initially was confusing for me too. Aren't we arranging bits?
replies(1): >>45907944 #
3. cobbal ◴[] No.45906279[source]
Little endian does appear strange at first, but if you consider the motivation it makes a lot of sense.

Little endian's most valuable property is that an integer stored at an address has a common layout no matter the width of the integer. If I store an i32 at 0x100, and then load an i16 at 0x100, that's the same as casting (with wrapping) an i32 to an i16 because the "ones digit" (more accurately the "ones byte") is stored at the same place for both integers.
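
A minimal C sketch of that property, for illustration only. The helper names (`store_le32`, `load_le16`, `narrow_via_memory`) are made up here, and the bytes are laid out little-endian explicitly so the result does not depend on the host's byte order:

```c
#include <stdint.h>

/* Hypothetical helper: write a 32-bit value little-endian,
   i.e. least significant byte at the lowest address. */
static void store_le32(uint8_t *buf, uint32_t value)
{
    buf[0] = (uint8_t)(value & 0xFF);
    buf[1] = (uint8_t)((value >> 8) & 0xFF);
    buf[2] = (uint8_t)((value >> 16) & 0xFF);
    buf[3] = (uint8_t)((value >> 24) & 0xFF);
}

/* Hypothetical helper: read a 16-bit little-endian value
   starting at the same address. */
static uint16_t load_le16(const uint8_t *buf)
{
    return (uint16_t)(buf[0] | (buf[1] << 8));
}

/* Store 32 bits, then load 16 bits from the same address. */
static uint16_t narrow_via_memory(uint32_t value)
{
    uint8_t buf[4];
    store_le32(buf, value);
    return load_le16(buf);
}
```

For any input, `narrow_via_memory(x)` should agree with the wrapping cast `(uint16_t)x`; with a big-endian layout the same load would pick up the two high bytes instead.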

Since bits aren't addressable, they don't really have an order in memory. The only way to access bits is by loading them into a register, and registers don't meaningfully have an endianness.

replies(1): >>45908034 #
4. jojomodding ◴[] No.45906495[source]
You can't address individual bits. There is no way of telling if the LSBit is "left" or "right" of the MSBit. So endianness can't be about that.

For bytes, you can distinguish them, as you can look at the individual bytes produced from a larger-than-byte store.

replies(1): >>45907092 #
5. tadfisher ◴[] No.45907092[source]
Your CPU (probably) has left and right variants for shift and rotate operations, which is certainly an avenue for confusion. There's a "logical" bit order that these operations follow, which starts with the MSBit and ends with the LSBit, even when the physical connections are all parallel and don't really define a physical bit order.
replies(2): >>45907513 #>>45907877 #
6. NobodyNada ◴[] No.45907513{3}[source]
> There's a "logical" bit order that these operations follow, which starts with the MSBit and ends with the LSBit

Well, normally when bits are numbered, "bit 0" is the least significant bit. The MSB is usually written on the left (such as for left and right shifts), but that doesn't necessarily make it "first" in my mind.

7. ◴[] No.45907705[source]
8. kazinator ◴[] No.45907834[source]
The concept of order can only matter down to the units that are addressable.

Bits are typically not addressable, and therefore do not have endianness.

Bits are manipulated by special instructions, and those instructions are tied to arithmetic identities, due to the bits being interpreted as a binary number: like that a shift left is multiplication by 2.

In many instruction sets, the shift is a positive amount, and whether it is left or right is a different instruction. If it were the case that shifting one way is positive and the other way negative, then you have a kind of endianness in that one machine uses positive for multiplication by powers of two, whereas another uses it for division. That would not result in an incompatible storage format though.
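
As a rough sketch of the signed-shift-count idea, here is a small C function where a positive count shifts left (multiplying by a power of two) and a negative count shifts right (dividing). The name `shift_signed` and the choice of unsigned 32-bit operands are assumptions made for this example, not any particular machine's semantics:

```c
#include <stdint.h>

/* Hypothetical helper: positive count shifts left, negative shifts right.
   A count of 32 or more in either direction yields 0, because every bit
   is shifted out (and a plain C shift by >= 32 would be undefined). */
static uint32_t shift_signed(uint32_t value, int count)
{
    if (count >= 32 || count <= -32)
        return 0;
    if (count >= 0)
        return value << count;   /* multiply by 2^count, modulo 2^32 */
    return value >> -count;      /* divide by 2^(-count), rounding down */
}
```

A machine that flipped the sign convention would compute different results for the same instruction, but the bytes it stored in memory would look the same.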

When data is transferred between machines as a sequence of bytes, there is a bit order in question, but it is taken care of by the compatibility of the data links.

Classic Ethernet is little endian at the bit level: the baseband pulses that represent the bits of a byte are sent into coax cable least-significant-bit first. RS-232 serial communication is the same: least significant bit first.

I think I²C is an example of a data link / physical protocol that is most-significant-bit first. So if you somehow hooked up an RS-232 end to I²C and got the communication to work, the bytes would be reversed.

We rarely, if ever, see bit endian effects because nobody does that: transmit bytes between incompatible data links. It won't work for other reasons, like different framing bits, signaling conventions, voltages, speeds, synchronization methods, checksums, ...

Endianness of bits shows up in some data formats which pack individual bitfields of variable length.

Bitfields in C structures reveal bit endianness to some extent. What typically happens is that on a big endian target, bit fields are packed into the most significant bit of the underlying "cell" first. E.g.

   struct { unsigned a : 1, b : 1 };
the underlying cell might be the size of an int, like 32 bits. So where in the cell do "a" and "b" go? What you see under GCC is that on a big endian target, a will go to the most significant bit of the underlying storage cell, and b to the second most significant one. Whereas on little endian, a goes to the least significant bit, and b to the second least. In both cases, the bits map to the first byte, at the lowest address.

So in a certain sense, the allocation of members in C, as such, is little endian: the earlier struct members go to the lowest address, regardless of machine endianness. It is probably because of this that the bit order follows. Since putting bitfield a at the lowest address, as mandated by C field layout order, means that it has to go into the first byte, and that first byte is the most significant byte under big endian, it makes sense that the bit goes into the most significant bit position, for consistency.

That way we only have two possibilities to deal with for, say, a memory mapped status register:

    struct port_status_word {
    #if HAVE_BIG_ENDIAN
      unsigned transmit_ready : 1;
      unsigned data_received : 1;
      unsigned carrier_present : 1;
      // [ ... 29 more]
    #else
      // [ ... 29 more]
      unsigned carrier_present : 1;
      unsigned data_received : 1;
      unsigned transmit_ready : 1;
    #endif
    };
If we had separate byte and bit order, we would need two levels of #if nesting and four possibilities, which is even more ugly.
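
One way to see where a given compiler puts the first bitfield is to set it and look at the raw storage. This is only a probe sketch: the struct name `two_flags` and the function `raw_with_a_set` are invented for the example, bitfield layout is implementation-defined in C, and the code assumes the padding bits stay zero after the `memset`, which holds in practice but is not something the standard promises.

```c
#include <string.h>

struct two_flags {
    unsigned a : 1;
    unsigned b : 1;
};

/* Returns the raw storage word with only field `a` set.
   Which bit comes back set depends on the compiler and target. */
static unsigned raw_with_a_set(void)
{
    struct two_flags flags;
    unsigned raw = 0;

    memset(&flags, 0, sizeof flags);   /* clear every bit, padding included */
    flags.a = 1;
    /* Copy out at most one unsigned's worth of the struct's storage. */
    memcpy(&raw, &flags, sizeof flags < sizeof raw ? sizeof flags : sizeof raw);
    return raw;
}
```

Going by the description above, GCC on a little endian target would be expected to return a value with only the least significant bit set, and on a big endian target only the most significant bit; the probe itself makes neither assumption.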
9. kazinator ◴[] No.45907877{3}[source]
The Common Lisp ASH (arithmetic shift) function has positive shifts for left, negative for right.

But even if machines were like this, it would not cause any interoperability issue. Because it is the data links between machines which ensure that bits are transmitted and received in the correct order, not the semantics of machine instructions.

It would be something to worry about when translating code from one language or instruction set to another.

Data link and physical protocols ensure that when you transmit a byte with a certain decimal value like 65 (ASCII 'A') it is received as 65 on the other end.

The bits are "addressable" at the data link level, because the hardware has to receive a certain bit first, and the one after that next, and so on.

10. kazinator ◴[] No.45907944[source]
The words are divided into bytes. The bytes are rearranged, but the bits stay the same. The bits are not addressable and so represent pure binary values.

For instance given the word DEADBEEF, the least significant byte is EF.

That is a specific binary value: the value 239.

That value stays the same whether the bytes are EF BE AD DE in memory, or DE AD BE EF.

EF is just 239. We don't think about reversing the bits; they are not addressable. They have an abstract order determined by the binary system. The most significant bit of the value contributes 128 and so on.
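
The DEADBEEF example can be sketched in C as follows. The helper names `put_le32` and `put_be32` are hypothetical; they write the bytes in an explicit order so the outcome does not depend on the machine running the code:

```c
#include <stdint.h>

/* Hypothetical helper: least significant byte at the lowest address. */
static void put_le32(uint8_t *buf, uint32_t value)
{
    buf[0] = (uint8_t)(value & 0xFF);
    buf[1] = (uint8_t)((value >> 8) & 0xFF);
    buf[2] = (uint8_t)((value >> 16) & 0xFF);
    buf[3] = (uint8_t)((value >> 24) & 0xFF);
}

/* Hypothetical helper: most significant byte at the lowest address. */
static void put_be32(uint8_t *buf, uint32_t value)
{
    buf[0] = (uint8_t)((value >> 24) & 0xFF);
    buf[1] = (uint8_t)((value >> 16) & 0xFF);
    buf[2] = (uint8_t)((value >> 8) & 0xFF);
    buf[3] = (uint8_t)(value & 0xFF);
}
```

For 0xDEADBEEF, `put_le32` produces EF BE AD DE and `put_be32` produces DE AD BE EF. The byte EF sits at a different address in each, but it is the value 239 in both.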

The order matters when the bits have to be transmitted over a wire to another machine. Then we have to decide: do we transmit the low bit of EF first, or the high bit first? If the two sides of the data link are inconsistent, then one side transmits 11101111 and the other receives 11110111, which is F7.
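
The mismatch can be modeled as a plain bit reversal of the byte. A small C sketch, where the function name `reverse_bits8` is just a label chosen for this example:

```c
#include <stdint.h>

/* Reverse the order of the 8 bits in a byte, modeling a sender that
   transmits least-significant-bit first and a receiver that assumes
   most-significant-bit first (or the other way around). */
static uint8_t reverse_bits8(uint8_t byte)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        out = (uint8_t)((out << 1) | (byte & 1));  /* low bit of input becomes next bit of output */
        byte >>= 1;
    }
    return out;
}
```

With this, `reverse_bits8(0xEF)` gives 0xF7, matching the 11101111 to 11110111 case above.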

11. IshKebab ◴[] No.45908034[source]
I'm not sure I've ever seen that actually come into play. Little Endian is obviously the best Endian, but I don't think that argument really makes sense.

The most obvious argument is that little Endian is clearly the most natural order - the only reason to use Big Endian is to match the stupid human history of mixing LTR text with RTL numbers.

I've seen one real technical reason to prefer little endian (can't remember what it was tbh but it was fairly niche) and I've never seen any technical reasons to prefer big endian ("it's easier to read in a hex editor" doesn't count).