
182 points Twirrim | 36 comments
1. WalterBright ◴[] No.41875254[source]
D made a great leap forward with the following:

1. bytes are 8 bits

2. shorts are 16 bits

3. ints are 32 bits

4. longs are 64 bits

5. arithmetic is 2's complement

6. IEEE floating point

and a big chunk of wasted time trying to abstract these away and getting it wrong anyway was saved. Millions of people cried out in relief!

Oh, and Unicode was the character set. Not EBCDIC, RADIX-50, etc.

replies(3): >>41875486 #>>41875539 #>>41875878 #
2. gerdesj ◴[] No.41875486[source]
"1. bytes are 8 bits"

How big is a bit?

replies(6): >>41875621 #>>41875701 #>>41875768 #>>41876060 #>>41876149 #>>41876238 #
3. cogman10 ◴[] No.41875539[source]
Yeah, this is something Java got right as well. It got "unsigned" wrong, but it got standardizing the primitive sizes correct:

byte = 8 bits

short = 16

int = 32

long = 64

float = 32 bit IEEE

double = 64 bit IEEE

replies(2): >>41875597 #>>41875634 #
4. jltsiren ◴[] No.41875597[source]
I like the Rust approach more: usize/isize are the native integer types, and with every other numeric type, you have to mention the size explicitly.

On the C++ side, I sometimes use an alias that contains the word "short" for 32-bit integers. When I use them, I'm explicitly assuming that the numbers are small enough to fit in a smaller than usual integer type, and that it's critical enough to performance that the assumption is worth making.
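A minimal Rust sketch of that distinction (the names here are just the standard library's, nothing project-specific): sized types state their width everywhere, while usize/isize track the platform's pointer width and are reserved for sizes and indexing.

```rust
use std::mem::size_of;

fn main() {
    // Rust's sized integer types have guaranteed widths on every platform.
    assert_eq!(size_of::<i8>(), 1);
    assert_eq!(size_of::<i16>(), 2);
    assert_eq!(size_of::<i32>(), 4);
    assert_eq!(size_of::<i64>(), 8);

    // usize/isize instead follow the platform's pointer width.
    assert_eq!(size_of::<usize>(), size_of::<*const u8>());

    // Lengths and indices use usize; everything else names its width.
    let xs = [10u8, 20, 30];
    let n: usize = xs.len();
    assert_eq!(n, 3);
}
```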

replies(3): >>41875695 #>>41875827 #>>41875847 #
5. poincaredisk ◴[] No.41875621[source]
A bit is either a 0 or 1. A byte is the smallest addressable piece of memory in your architecture.
replies(2): >>41875706 #>>41875737 #
6. josephg ◴[] No.41875634[source]
Yep. Pity about getting chars / string encoding wrong though. (Java chars are 16 bits).

But it’s not alone in that mistake. All the languages invented in that era made the same mistake. (C#, JavaScript, etc).

replies(3): >>41875696 #>>41876204 #>>41876445 #
7. Jerrrrrrry ◴[] No.41875695{3}[source]
hindsight has its advantages
8. paragraft ◴[] No.41875696{3}[source]
What's the right way?
replies(2): >>41875771 #>>41875782 #
9. CoastalCoder ◴[] No.41875701[source]
> How big is a bit?

A quarter nybble.

10. elromulous ◴[] No.41875706{3}[source]
Technically the smallest addressable piece of memory is a word.
replies(2): >>41876026 #>>41876056 #
11. Nevermark ◴[] No.41875737{3}[source]
Which … if your heap always returns N bit aligned values, for some N … is there a name for that? The smallest heap addressable segment?
12. thamer ◴[] No.41875768[source]
This doesn't feel like a serious question, but in case this is still a mystery to you… the name bit is a portmanteau of binary digit, and as indicated by the word "binary", there are only two possible digits that can be used as values for a bit: 0 and 1.
13. WalterBright ◴[] No.41875771{4}[source]
UTF-8

When D was first implemented, circa 2000, it wasn't clear whether UTF-8, UTF-16, or UTF-32 was going to be the winner. So D supported all three.

14. Remnant44 ◴[] No.41875782{4}[source]
utf8, for essentially the reasons mentioned in this manifesto: https://utf8everywhere.org/
replies(1): >>41875952 #
15. jonstewart ◴[] No.41875827{3}[source]
<cstdint> has int8_t, uint8_t, int16_t, uint16_t, int32_t, uint32_t, int64_t, and uint64_t. I still go back and forth between uint64_t, size_t, and unsigned int, but am defaulting to uint64_t more and more, even if it doesn't matter.
16. kazinator ◴[] No.41875847{3}[source]
> you have to mention the size explicitly

It's unbelievably ugly. Every piece of code working with any kind of integer screams "I am hardware dependent in some way".

E.g. in a structure representing an automobile, the number of wheels has to be some i8 or i16, which looks ridiculous.

Why would you take a language in which you can write functional pipelines over collections of objects, and make it look like assembler?

replies(2): >>41875953 #>>41876035 #
17. Laremere ◴[] No.41875878[source]
Zig is even better:

1. u8 and i8 are 8 bits.

2. u16 and i16 are 16 bits.

3. u32 and i32 are 32 bits.

4. u64 and i64 are 64 bits.

5. Arithmetic is an explicit choice. '+' overflowing is illegal behavior (it will crash in Debug and ReleaseSafe modes), '+%' is 2's complement wrapping, and '+|' is saturating arithmetic. Edit: forgot to mention @addWithOverflow(), which provides a tuple of the original type and a u1; there's also std.math.add(), which returns an error on overflow.

6. f16, f32, f64, f80, and f128 are the IEEE floating point types of the corresponding bit lengths.

The question of the length of a byte doesn't even matter. If someone wants to compile to a machine whose bytes are 12 bits, just use u12 and i12.
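Rust (mentioned downthread) exposes an analogous menu of overflow behaviors through explicit methods rather than operators; a small sketch of the same choices:

```rust
fn main() {
    let x: u8 = 250;

    // Plain `+` panics on overflow in debug builds and wraps in release;
    // the explicit methods make the choice visible at the call site.
    assert_eq!(x.wrapping_add(10), 4);            // 2's-complement wrap
    assert_eq!(x.saturating_add(10), 255);        // clamp at the type's max
    assert_eq!(x.checked_add(10), None);          // overflow reported as None
    assert_eq!(x.overflowing_add(10), (4, true)); // value plus overflow flag
    assert_eq!(x.checked_add(5), Some(255));      // in range, so Some
}
```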

replies(3): >>41876011 #>>41876015 #>>41876480 #
18. josephg ◴[] No.41875952{5}[source]
Yep. Notably supported by go, python3, rust and swift. And probably all new programming languages created from here on.
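In Rust, for instance, String and str are UTF-8 by construction: building one from raw bytes is checked, so an invalid sequence never enters the type. A quick sketch:

```rust
fn main() {
    // Valid UTF-8 bytes round-trip into a String.
    let ok = String::from_utf8(vec![0xE2, 0x82, 0xAC]); // "€" encoded in UTF-8
    assert_eq!(ok.unwrap(), "€");

    // Invalid byte sequences are rejected at construction time.
    let bad = String::from_utf8(vec![0xFF, 0xFE]);
    assert!(bad.is_err());

    // Byte length and scalar-value count are kept distinct.
    assert_eq!("€".len(), 3);           // three UTF-8 bytes
    assert_eq!("€".chars().count(), 1); // one Unicode scalar value
}
```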
19. pezezin ◴[] No.41875953{4}[source]
If you don't care about the size of your number, just use isize or usize.

If you do care, then isn't it better to specify it explicitly than trying to guess it and having different compilers disagreeing on the size?

replies(1): >>41875968 #
20. kazinator ◴[] No.41875968{5}[source]
A type called isize is some kind of size. It looks wrong for something that isn't a size.
replies(1): >>41876423 #
21. __turbobrew__ ◴[] No.41876011[source]
This is the way.
22. Spivak ◴[] No.41876015[source]
How does 5 work in practice? Surely no one is actually checking if their arithmetic overflows, especially from user-supplied or otherwise external values. Is there any use for the normal +?
replies(1): >>41876229 #
23. asveikau ◴[] No.41876026{4}[source]
Depends on your definition of addressable.

Lots of CISC architectures allow memory accesses in various units even if they call general-purpose-register-sized quantities "word".

Iirc the C standard specifies that all memory can be accessed via char*.

24. Spivak ◴[] No.41876035{4}[source]
Is it any better calling it an int where it's assumed to be an i32 and 30 of the bits are wasted?
25. Maxatar ◴[] No.41876056{4}[source]
I don't think the term word has any consistent meaning. Certainly x86 doesn't use the term word to mean smallest addressable unit of memory. The x86 documentation defines a word as 16 bits, but x86 is byte addressable.

ARM is similar, ARM processors define a word as 32-bits, even on 64-bit ARM processors, but they are also byte addressable.

As best as I can tell, it seems like a word is whatever the size of the arithmetic or general purpose register is at the time that the processor was introduced, and even if later a new processor is introduced with larger registers, for backwards compatibility the size of a word remains the same.

26. nonameiguess ◴[] No.41876060[source]
How philosophical do you want to get? Technically, voltage is a continuous signal, but we sample only at clock cycle intervals, and if the sample at some cycle is below a threshold, we call that 0. Above, we call it 1. Our ability to measure whether a signal is above or below a threshold is uncertain, though, so for values where the actual difference is less than our ability to measure, we have to conclude that a bit can actually take three values: 0, 1, and we can't tell but we have no choice but to pick one.

The latter value is clearly less common than 0 and 1, but how much less? I don't know, but we have to conclude that the true size of a bit is probably something more like 1.00000000000000001 bits rather than 1 bit.

27. basementcat ◴[] No.41876149[source]
A bit is a measure of information-theoretic entropy. Specifically, one bit has been defined as the uncertainty of the outcome of a single fair coin flip. A less-than-fair coin has less than one bit of entropy; a coin that always lands heads up has zero bits; n fair coins have n bits of entropy, and so on.

https://en.m.wikipedia.org/wiki/Information_theory

https://en.m.wikipedia.org/wiki/Entropy_(information_theory)
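The fair-coin definition can be checked numerically; here's a quick Rust sketch of the binary entropy function (coin_entropy is just an illustrative name):

```rust
// Shannon entropy, in bits, of a coin that lands heads with probability p.
fn coin_entropy(p: f64) -> f64 {
    let term = |q: f64| if q == 0.0 { 0.0 } else { -q * q.log2() };
    term(p) + term(1.0 - p)
}

fn main() {
    assert!((coin_entropy(0.5) - 1.0).abs() < 1e-12); // a fair coin: one bit
    assert!(coin_entropy(1.0).abs() < 1e-12);         // always heads: zero bits
    assert!(coin_entropy(0.9) < 1.0);                 // biased: less than one bit
}
```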

replies(1): >>41876245 #
28. jeberle ◴[] No.41876204{3}[source]
Java strings are stored as byte[] if their contents contain only Latin-1 values (the first 256 code points of Unicode). This shipped in Java 9.

JEP 254: Compact Strings

https://openjdk.org/jeps/254

29. dullcrisp ◴[] No.41876229{3}[source]
You think no one checks if their arithmetic overflows?
replies(1): >>41876357 #
30. dullcrisp ◴[] No.41876238[source]
At least 2 or 3
31. fourier54 ◴[] No.41876245{3}[source]
That is a bit in information theory. It has nothing to do with the computer/digital engineering term being discussed here.
replies(1): >>41876484 #
32. Spivak ◴[] No.41876357{4}[source]
I'm sure it's not literally no one but I bet the percent of additions that have explicit checks for overflow is for all practical purposes indistinguishable from 0.
33. pezezin ◴[] No.41876423{6}[source]
Then just define a type alias, which is good practice if you want your types to be more descriptive: https://doc.rust-lang.org/reference/items/type-aliases.html
34. davidgay ◴[] No.41876445{3}[source]
Java was just unlucky: it standardised its strings at the wrong time, when Unicode was still 16-bit code points. Java was announced in May 1995, and the following comment from the Unicode history wiki page makes it clear what happened: "In 1996, a surrogate character mechanism was implemented in Unicode 2.0, so that Unicode was no longer restricted to 16 bits. ..."
35. notfed ◴[] No.41876480[source]
Same deal with Rust.
36. sirsinsalot ◴[] No.41876484{4}[source]
This comment, I feel sure, would repulse Shannon in the deepest way. A (digital, stored) bit abstractly seeks to encode, and make useful through computation, the properties of information theory.

Your comment must be sarcasm or satire, surely.