C++ proposal: There are exactly 8 bits in a byte

1. donatj ◴[17 Oct 24 23:55 UTC] No.41875031[source]▶

So please do excuse my ignorance, but is there a "logic" related reason other than hardware cost limitations à la "8 was cheaper than 10 for the same number of memory addresses" that bytes are 8 bits instead of 10? Genuinely curious, as a high-level dev of twenty years, I don't know why 8 was selected.

To my naive eye, it seems like moving to 10 bits per byte would be both logical and make learning the trade just a little bit easier?

replies(6): >>41875041 #>>41875052 #>>41875110 #>>41875147 #>>41875204 #>>41875211 #

2. dplavery92 ◴[17 Oct 24 23:56 UTC] No.41875041[source]▶

>>41875031 (TP) #

Eight is a nice power of two.

replies(1): >>41875063 #

3. wvenable ◴[17 Oct 24 23:58 UTC] No.41875052[source]▶

>>41875031 (TP) #

I'm not sure why you think being able to store values from -512 to +511 is more logical than -128 to +127?
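For reference, the two ranges quoted above follow from the usual two's-complement formula. A minimal sketch, assuming two's-complement representation; `signed_range` is a hypothetical helper name:

```python
def signed_range(bits):
    """Return (min, max) for a two's-complement integer of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

for width in (8, 10):
    lo, hi = signed_range(width)
    print(f"{width}-bit: {lo} to {hi}")
# 8-bit: -128 to 127
# 10-bit: -512 to 511
```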

replies(1): >>41875059 #

4. donatj ◴[18 Oct 24 00:00 UTC] No.41875059[source]▶

>>41875052 #

Buckets of 10 seem more regular to beings with 10 fingers that can be up or down?

replies(3): >>41875079 #>>41876677 #>>41877245 #

5. donatj ◴[18 Oct 24 00:01 UTC] No.41875063[source]▶

>>41875041 #

Can you explain how that's helpful? I'm not being obtuse, I just don't follow.

replies(4): >>41875100 #>>41875101 #>>41875319 #>>41875462 #

6. wvenable ◴[18 Oct 24 00:03 UTC] No.41875079{3}[source]▶

>>41875059 #

I think 8bits (really 7 bits) was chosen because it holds a value closest to +/- 100. What is regular just depends on how you look at it.

7. bonzini ◴[18 Oct 24 00:06 UTC] No.41875100{3}[source]▶

>>41875063 #

It's easier to go from a bit number to (byte, bit) if you don't have to divide by 10.
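A rough sketch of the conversion described here, with hypothetical function names. With 8-bit bytes the split is a shift and a mask; with a hypothetical 10-bit byte it needs a real division and remainder:

```python
def split_bit_index_8(bit_index):
    """Split a flat bit index into (byte, bit) for 8-bit bytes: shift and mask."""
    return bit_index >> 3, bit_index & 0b111

def split_bit_index_10(bit_index):
    """Same split for a hypothetical 10-bit byte: needs division by 10."""
    return divmod(bit_index, 10)

print(split_bit_index_8(1003))   # (125, 3)
print(split_bit_index_10(1003))  # (100, 3)
```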

8. spongebobstoes ◴[18 Oct 24 00:06 UTC] No.41875101{3}[source]▶

>>41875063 #

One thought is that it always takes a whole number of bits (3) to address a bit within a byte. It would take about 3.32 bits (log2 of 10), so 4 in practice, to bit-address a 10-bit byte. It sorta just works out nicer in general to have powers of 2 when working in base 2.
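To put numbers on the bit-addressing point, here is a small sketch (`index_bits` is a made-up helper) that counts how many whole bits a bit index needs and how many index codes go unused:

```python
def index_bits(byte_width):
    """Return (index bits needed, unused index codes) for addressing a bit in a byte."""
    needed = (byte_width - 1).bit_length()   # whole bits needed for 0..byte_width-1
    return needed, (1 << needed) - byte_width

for width in (8, 10, 16):
    needed, unused = index_bits(width)
    print(f"{width}-bit byte: {needed} index bits, {unused} unused codes")
# 8-bit byte: 3 index bits, 0 unused codes
# 10-bit byte: 4 index bits, 6 unused codes
# 16-bit byte: 4 index bits, 0 unused codes
```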

replies(1): >>41875607 #

9. bryanlarsen ◴[18 Oct 24 00:08 UTC] No.41875110[source]▶

>>41875031 (TP) #

I'm fairly sure it's because the English character set fits nicely into a byte. 7 bits would have worked as well, but 7 is a very odd width for something in a binary computer.

10. zamadatix ◴[18 Oct 24 00:14 UTC] No.41875147[source]▶

>>41875031 (TP) #

If you're ignoring what's efficient to use then just use a decimal data type and let the hardware figure out how to calculate that for you best. If what's efficient matters then address management, hardware operation implementations, and data packing are all simplest when the group size is a power of the base.

11. knome ◴[18 Oct 24 00:26 UTC] No.41875204[source]▶

>>41875031 (TP) #

likely mostly as a concession to ASCII in the end. you used a typewriter to write into and receive terminal output from machines back in the day. terminals would use ASCII. there were machines with all sorts of smallest-addressable-sizes, but eight bit bytes align nicely with ASCII. makes strings easier. making strings easier makes programming easier. easier programming makes a machine more popular. once machines started standardizing on eight bit bytes, others followed. when they went to add more data, they kept the byte since code was written for bytes, and made their new registers two bytes. then two of those. then two of those. so we're sitting at 64 bit registers on the backs of all that came before.

12. morio ◴[18 Oct 24 00:27 UTC] No.41875211[source]▶

>>41875031 (TP) #

One example from the software side: A common thing to do in data processing is to obtain bit offsets (compression, video decoding etc.). If a byte were 10 bits you would need modulo-10 operations everywhere, which is slow and/or complex. In contrast, modulo 2^N is a single logical (AND) processor instruction.
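As an illustration of the kind of bit-offset work meant here, a minimal bit reader might look like the sketch below. The function name and the MSB-first bit order are assumptions for the example, not any particular codec's convention:

```python
def read_bits(buf, bit_offset, count):
    """Read `count` bits starting at `bit_offset`, most significant bit first."""
    value = 0
    for i in range(bit_offset, bit_offset + count):
        byte_index = i >> 3          # i // 8, done as a shift
        bit_in_byte = 7 - (i & 7)    # i % 8, done as a mask (MSB-first)
        value = (value << 1) | ((buf[byte_index] >> bit_in_byte) & 1)
    return value

data = bytes([0b10110010, 0b01111000])
print(read_bits(data, 4, 6))  # 9, i.e. bits 0010 01
```

With a 10-bit byte the two marked lines would become a division and a remainder by 10.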

13. inkyoto ◴[18 Oct 24 00:45 UTC] No.41875319{3}[source]▶

>>41875063 #

Because modern computing has settled on the Boolean (binary) logic (0/1 or true/false) in the chip design, which has given us 8 bit bytes (a power of two). It is the easiest and most reliable to design and implement in the hardware.

On the other hand, if computing settled on a three-valued logic (e.g. 0/1/«something» where «something» has been proposed as -1, «undefined»/«unknown»/«undecided» or a «shade of grey»), we would have had 9 bit bytes (a power of three).

10 was tried numerous times at the dawn of computing and… it was found too unwieldy in the circuit design.

replies(1): >>41875536 #

14. davemp ◴[18 Oct 24 01:07 UTC] No.41875462{3}[source]▶

>>41875063 #

Many circuits have ceil(log_2(N_bits)) scaling with respect to propagation delay/other dimensions, so you’re just leaving efficiency on the table if you aren’t using a power of 2 for your bit size.
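One way to picture the ceil(log_2(N)) scaling is a balanced tree that combines N bits pairwise, level by level. This is only a toy model of circuit depth, and `tree_depth` is a hypothetical name:

```python
def tree_depth(n_bits):
    """Levels needed to combine n_bits inputs pairwise in a balanced tree."""
    depth = 0
    while n_bits > 1:
        n_bits = (n_bits + 1) // 2   # each level halves the count, rounding up
        depth += 1
    return depth

for width in (8, 10, 16):
    print(width, tree_depth(width))
# 8 3
# 10 4
# 16 4
```

In this model a 10-bit width already pays for the same depth as a 16-bit one.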

15. davemp ◴[18 Oct 24 01:22 UTC] No.41875536{4}[source]▶

>>41875319 #

> On the other hand, if computing settled on a three-valued logic (e.g. 0/1/«something» where «something» has been proposed as -1, «undefined»/«unknown/undecided» or a «shade of grey»), we would have had 9 bit bytes (a power of three).

Is this true? 4 ternary bits give you really convenient base 12 which has a lot of desirable properties for things like multiplication and fixed point. Though I have no idea what ternary building blocks would look like so it’s hard to visualize potential hardware.

replies(1): >>41875705 #

16. cogman10 ◴[18 Oct 24 01:37 UTC] No.41875607{4}[source]▶

>>41875101 #

This is basically the reason.

Another part of it is the fact that it's a lot easier to represent stuff with hex if the bytes line up.

I can represent "255" with "0xFF" which fits nice and neat in 1 byte. However, now if a byte is 10bits that hex no longer really works. You have 1024 values to represent. The max value would be 0x3FF which just looks funky.
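A quick sketch of the mismatch (`hex_digits_needed` is a made-up helper): each hex digit carries 4 bits, so 8 bits fill two digits exactly while 10 bits spill into a third, partly used digit.

```python
def hex_digits_needed(bits):
    """Return (hex digits needed, spare bits left over in the top digit)."""
    digits = -(-bits // 4)           # ceiling division: 4 bits per hex digit
    return digits, digits * 4 - bits

for bits in (8, 10):
    max_value = (1 << bits) - 1
    digits, spare = hex_digits_needed(bits)
    print(f"{bits} bits: max {max_value} = {max_value:#x}, {digits} hex digits, {spare} spare bits")
# 8 bits: max 255 = 0xff, 2 hex digits, 0 spare bits
# 10 bits: max 1023 = 0x3ff, 3 hex digits, 2 spare bits
```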

Coming up with an alphanumeric system to represent 2^10 cleanly just ends up weird and unintuitive.

replies(1): >>41876097 #

17. inkyoto ◴[18 Oct 24 01:56 UTC] No.41875705{5}[source]▶

>>41875536 #

It is hard to say whether it would have been 9 or 12, now that people have stopped experimenting with alternative hardware designs. 9-bit byte designs certainly did exist (and maybe even the 12-bit designs), too, although they were still based on the Boolean logic.

I have certainly heard an argument that ternary logic would have been a better choice, if it won over, but it is history now, and we are left with the vestiges of the ternary logic in SQL (NULL values which are semantically «no value» / «undefined» values).
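For anyone who has not run into SQL's three-valued logic, a small demonstration using Python's built-in `sqlite3` module (SQLite is just a convenient engine here; other databases may print the results differently):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NULL behaves as "unknown": comparing it yields NULL, but AND/OR can still
# resolve when the other operand decides the result on its own.
row = conn.execute("SELECT NULL = NULL, NULL AND 0, NULL OR 1").fetchone()
conn.close()
print(row)  # (None, 0, 1)
```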

18. Spivak ◴[18 Oct 24 03:24 UTC] No.41876097{5}[source]▶

>>41875607 #

We probably wouldn't have chosen hex in a theoretical world where bytes were 10 bits, right? It would probably be two groups of 5 like 02:21 == 85 (like an ip address) or five groups of two 0x01111 == 85. It just has to be one of its divisors.
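Both groupings of 85 mentioned above can be checked with a short sketch. `split_groups` is a hypothetical helper that cuts a value into equal-width bit groups:

```python
def split_groups(value, total_bits, group_bits):
    """Split `value` into fixed-width bit groups, most significant group first."""
    if total_bits % group_bits != 0:
        raise ValueError("group size must divide the byte width")
    mask = (1 << group_bits) - 1
    return [(value >> shift) & mask
            for shift in range(total_bits - group_bits, -1, -group_bits)]

print(split_groups(85, 10, 5))  # [2, 21]
print(split_groups(85, 10, 2))  # [0, 1, 1, 1, 1]
```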

replies(1): >>41878940 #

19. atq2119 ◴[18 Oct 24 05:52 UTC] No.41876677{3}[source]▶

>>41875059 #

Computers are not beings with 10 fingers that can be up or down.

Powers of two are more natural in a binary computer. Then add the fact that 8 is the smallest power of two that allows you to fit the Latin alphabet plus most common symbols as a character encoding.

We're all about building towers of abstractions. It does make sense to aim for designs that are natural for humans when you're closer to the top of the stack. Bytes are fairly low down the stack, so it makes more sense for them to be natural to computers.
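A back-of-the-envelope check of the character-set argument, assuming "Latin alphabet plus most common symbols" means the printable ASCII set (letters, digits, punctuation, and space):

```python
import string

symbols = string.ascii_letters + string.digits + string.punctuation + " "
bits_needed = (len(symbols) - 1).bit_length()   # bits to give each symbol its own code

# Smallest power of two that is at least bits_needed.
width = 1
while width < bits_needed:
    width *= 2

print(len(symbols), bits_needed, width)  # 95 7 8
```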

20. inkyoto ◴[18 Oct 24 07:56 UTC] No.41877245{3}[source]▶

>>41875059 #

Unless they are Addams who have 10 fingers and 11 toes as it is known abundantly well.

21. shultays ◴[18 Oct 24 12:48 UTC] No.41878940{6}[source]▶

>>41876097 #

  02:21

or instead of digits from 0 to F, the letters would go up to V. 85 would be 0x2L I think (2 * 32 + 21)
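A sketch of that base-32 notation, assuming the digit alphabet is 0-9 followed by A-V (so A is 10 and V is 31); `to_base32` is a made-up helper. Under that alphabet the digit for 21 comes out as L:

```python
import string

DIGITS = string.digits + string.ascii_uppercase[:22]   # "0".."9", then "A".."V"

def to_base32(value, width=2):
    """Render `value` in base 32, padded to `width` digits (two digits cover 10 bits)."""
    out = ""
    for _ in range(width):
        value, digit = divmod(value, 32)
        out = DIGITS[digit] + out
    return out

print(to_base32(85))   # 2L
print(int("2L", 32))   # 85, Python's int() uses the same digit order
```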