2024 points randlet | 32 comments
bla2 ◴[] No.17515883[source]
> I don't ever want to have to fight so hard for a PEP and find that so many people despise my decisions.

Leading a large open source project must be terrible in this age of constant outrage :-(

replies(9): >>17515955 #>>17515972 #>>17516193 #>>17516427 #>>17516776 #>>17516884 #>>17517282 #>>17517716 #>>17517821 #
symmitchry ◴[] No.17515972[source]
I'm a little confused though, by his feelings here. Why did he feel the need to "fight so hard for a PEP" if it was so controversial, and everyone was outraged?

I do understand people's points about "the age of outrage" and "internet 2018" but still: the PEP wasn't generally accepted as being a fantastic improvement, so why did he feel the need to fight so hard for it?

replies(5): >>17516128 #>>17516129 #>>17516223 #>>17516774 #>>17519017 #
jnwatson ◴[] No.17516128[source]
It was controversial syntax, inline assignment-as-expression. There's always a tension between "keep it simple stupid" and "let's make it better", especially when a large user demographic of Python are non-professional-programmers.
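
For readers who have not seen the syntax in question: the proposal (PEP 572) adds the `:=` operator, which binds a name as part of an expression. Below is a minimal sketch, assuming Python 3.8 or later. The helper names `first_number` and `read_chunks` are made up for illustration and do not come from the PEP.

```python
import re


def first_number(text):
    """Return the first run of digits in text as an int, or None."""
    # The match object is bound and tested in a single expression.
    if (match := re.search(r"\d+", text)) is not None:
        return int(match.group())
    return None


def read_chunks(data, size):
    """Split data into pieces of at most `size` items."""
    chunks = []
    pos = 0
    # Without ':=' this loop needs a separate assignment before the test.
    while (chunk := data[pos:pos + size]):
        chunks.append(chunk)
        pos += size
    return chunks
```

Whether folding the assignment into the condition reads better than two separate statements is more or less the disagreement the thread is describing.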

Interestingly, C++ is going through the same process, with lots of great ideas being proposed, but the sum total of them being an even more complicated language (on top of what is probably the most complicated language already).

Python has been successful, IMHO, because Guido has made several brave, controversial calls. Python 3 breakage and async turned out to be prescient, fantastic decisions.

replies(6): >>17516204 #>>17516226 #>>17516681 #>>17517178 #>>17517212 #>>17533584 #
1. cbhl ◴[] No.17516293[source]
I think there are people on both sides of this fence.

Google is still largely Python 2, but my impression is that Facebook actually managed to make the transition to Python 3 after putting in the right presubmit checks.

replies(1): >>17516436 #
2. steveklabnik ◴[] No.17516436[source]
For the discussion about Facebook, see here: https://news.ycombinator.com/item?id=17417201

It seems Python 3 is now dominant but not total, according to TFA.

3. pvg ◴[] No.17516452[source]
This is a particularly ill-chosen thread to deliberately try to re-flame this flamewar. Most threads are.
replies(3): >>17516644 #>>17516760 #>>17517200 #
4. ben509 ◴[] No.17516612[source]
No, the bad decision was treating bytes and strings interchangeably in the first place. 99% of the hardest to fix breakage was due to that, and it was the right call to pay that price all at once.
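
As a rough illustration of what changed: Python 2 implicitly converted between the two types using ASCII, so mixed code only failed once non-ASCII data showed up. Python 3 refuses to mix them at all. The sketch below assumes Python 3, and the function name is hypothetical.

```python
def python3_keeps_types_apart():
    text = "caf\u00e9"            # str: a sequence of Unicode code points
    raw = text.encode("utf-8")    # bytes: the UTF-8 encoding of that text

    try:
        text + raw                # Python 3 refuses to mix the two types
    except TypeError:
        mixed_ok = False
    else:
        mixed_ok = True

    # Converting back requires naming the encoding explicitly.
    round_trip = raw.decode("utf-8")
    return mixed_ok, raw, round_trip
```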
replies(6): >>17516995 #>>17517386 #>>17517687 #>>17518069 #>>17520956 #>>17528411 #
5. romanows ◴[] No.17516644[source]
Heaven forbid that someone responds to a point made in another post. On a public internet discussion forum. In a discussion about the context and effects of divisive and difficult decisions. /s
replies(2): >>17516675 #>>17516790 #
6. pvg ◴[] No.17516675{3}[source]
Heaven, sadly, cannot forbid poopy responses but we can encourage others (and ourselves) not to post them.
replies(1): >>17517250 #
7. Bahamut ◴[] No.17516760[source]
To be fair, the person stating that Python 3 breakage was a fantastic decision probably should've avoided referring to that as such, as it almost certainly invites disagreement on a controversial topic.
replies(1): >>17516974 #
8. blattimwind ◴[] No.17516790{3}[source]
Python 3 breakage is basically flamebait by now.
9. pvg ◴[] No.17516974{3}[source]
No, that's giving the flamemongers a flamer's veto. It's a perfectly sensible thing to mention when talking about the difficulties of leading the Python project and offering an opinion on how Guido van Rossum handled them. Picking out that one opinion and yelling little more than 'fite me' back at the person is not a perfectly sensible thing.
replies(1): >>17517087 #
10. jessaustin ◴[] No.17516995[source]
The API in 2 is not optimal, but they fixed it the wrong way. As you know, some operations make sense with bytes, and some make sense with character strings. The operations that make sense with character strings would also make sense with bytes when an encoding is specified. Therefore, there should just be a way of annotating bytes with a suggested encoding. Then byte-oriented packages (e.g. those that deal with data sent over an interface like a socket or pipe) could simply ignore the issue of encoding. Whole classes of errors would just disappear for many python coders. Other coders, who do care about encodings and non-ASCII characters, would still get those errors but that would be OK because they would know how to fix those errors.
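
To make the proposal concrete, here is one way "bytes annotated with a suggested encoding" could look. `TaggedBytes` and `frame` are purely hypothetical names for this sketch. Nothing like this exists in Python, and the comment does not specify a design.

```python
class TaggedBytes(bytes):
    """Hypothetical type: raw bytes plus a suggested encoding (not a real Python API)."""

    def __new__(cls, data, encoding=None):
        obj = super().__new__(cls, data)
        obj.encoding = encoding
        return obj

    def text(self):
        """Decode using the suggested encoding; only text-aware callers need this."""
        if self.encoding is None:
            raise ValueError("no suggested encoding attached")
        return self.decode(self.encoding)


def frame(payload):
    """Byte-oriented code: prefixes a 2-byte length and never looks at the encoding."""
    return len(payload).to_bytes(2, "big") + payload
```

In this sketch `frame` works on any bytes-like payload and ignores the tag entirely, while `text()` is the only place an encoding error can surface.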

So yes some breaking change was indicated, but the particular change that was made was the wrong one.

replies(4): >>17517441 #>>17517444 #>>17517465 #>>17522768 #
11. Bahamut ◴[] No.17517087{4}[source]
Then if someone disagrees & an argument erupts, that should not be surprising. That post is equally culpable for igniting a well-discussed issue: it was so clearly opinionated toward one side that, at this point, its author should have known better than to let the conversation digress in that direction if they wanted to avoid that discussion.

To ignore that is to straight up deny what can only be described as flamebait.

replies(1): >>17517184 #
12. pvg ◴[] No.17517184{5}[source]
It is not flamebait to say 'I think so-and-so handled a difficult problem well', especially as part of a larger point. It's flamy to respond 'u wat m8?'. It's not a complicated thing and there's no 'fairness' in treating these things as the same.
13. coldtea ◴[] No.17517200[source]
>This is a particularly ill-chosen thread to deliberately try to re-flame this flamewar

Apparently it's the right thread to be rude and to assign intentions to people you don't know though?

And all because they dared say their opinion on a subject you're sensitive about?

How about that: people can have any opinion they like on Python 3, including considering it a botched migration process and a ho-hum update. And it's totally legit for them to speak about that. And it's not your place to censor them, or act up any time they express their opinions.

You can either add your arguments, or skip reading their comments. How about that?

replies(1): >>17517519 #
14. coldtea ◴[] No.17517250{4}[source]
Notice how you've only added noise to the discussion, and turned a casual comment somebody wrote into a 10+ comment meta-thread?

Plus rudely assigned intentions ("flamethrower" etc) to others?

15. tialaramex ◴[] No.17517386[source]
I think I mostly agree with you, but it can be unclear to programmers (who are after all the people writing Python code) whether they're dealing with a string, or merely with bytes that happen to look like a string, and this causes hassle.

Until way too recently essentially all Python code couldn't handle basic SSL/TLS certificate validation for Internationalized Domain Names. Once you understand what's going on, this situation is a no brainer: In order to connect to a machine named X, we must have turned X into DNS A-labels that we could look up, and we can treat those as bytes. The certificate must have SAN dnsNames matching its names, and those too are written as A-labels. So we can almost just compare the literal bytes (actually DNS A-labels are "case-insensitive" so we need to handle that, and the asterisk "wild card")

But Python, in its wisdom, defined this API to take a string, and strings, as you observe, aren't bytes. So instead of the above unarguable approach they wasted precious months trying to figure out how best to turn the A-labels from a SAN dnsName into Unicode strings, which isn't even the right problem to solve.
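
The byte-level comparison described above can be sketched roughly as follows. This is a simplified illustration, not the logic of Python's `ssl` module: it uses the standard library's `idna` codec (which implements the older IDNA 2003 rules), and the wildcard handling is an assumption that covers only a lone `*` as the leftmost label. The function names are made up.

```python
def to_a_labels(hostname):
    """Convert a hostname to its ASCII A-label form as bytes (stdlib IDNA 2003 codec)."""
    return hostname.encode("idna").lower()


def san_matches(san_dns_name, hostname):
    """Compare a certificate SAN dnsName (already A-labels, as bytes) with a hostname.

    Simplified: case-insensitive, and '*' may only stand for the whole leftmost label.
    """
    want = to_a_labels(hostname).split(b".")
    have = san_dns_name.lower().split(b".")
    if len(want) != len(have):
        return False
    if have[0] == b"*":
        # Refuse overly broad wildcards such as "*.com".
        return len(have) > 2 and have[1:] == want[1:]
    return have == want
```

Once the hostname is in A-label form, everything after that is plain byte comparison. No Unicode decoding of the certificate contents is needed.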

Eventually sanity prevailed: https://bugs.python.org/issue28414

16. ubernostrum ◴[] No.17517441{3}[source]
> Then byte-oriented packages (e.g. those that deal with data sent over an interface like a socket or pipe) could simply ignore the issue of encoding.

Long and bitter experience has shown that people who think they can "simply ignore" the "issue" of encoding actually can't. That mindset is mostly a more polite way of saying "people who assume everything is ASCII all the time, or at most an encoding that always has one byte == one character". Those assumptions break sooner or later. I prefer having them break sooner, because I've been the person who had to clean up the mess at an unpleasant time when it was kicked to "later".

Which in turn means Python 3 made the right choice: text is text and bytes are bytes, and you should never ever pretend bytes are text no matter how much you think you'll never run into a case where the assumption fails.
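
A small sketch of how the "one byte == one character" assumption breaks, assuming UTF-8 and Python 3. The helper names are illustrative only.

```python
def lengths(text, encoding="utf-8"):
    """Return (number of code points, number of encoded bytes) for text."""
    return len(text), len(text.encode(encoding))


def truncate_bytes(text, limit, encoding="utf-8"):
    """Cut the encoded form at `limit` bytes, as byte-oriented code might."""
    return text.encode(encoding)[:limit]
```

For pure ASCII input the two lengths agree, which is why the assumption survives testing. With multi-byte text, a byte-level cut can land in the middle of a character, leaving bytes that no longer decode.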

replies(2): >>17517640 #>>17522848 #
17. tialaramex ◴[] No.17517444{3}[source]
Implicitly converting strings into bytes or vice versa means now all your APIs grow an exception "Invalid encoding" / "Can't encode this" / "Can't decode this" / etcetera that you need to deal with.

Making people actually do the conversion has the advantage that when writing their string conversion code they might actually do something with the exception beyond maybe logging it and then pressing on anyway. It also gives you the opportunity to explicitly offer them alternatives like treating everything we can't encode as some sort of replacement character (works well for Unicode, not so much for ASCII), which is way too much to ask of every single function that takes bytes.
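
The explicit alternatives mentioned above map onto the `errors` argument of `str.encode` and `bytes.decode`. A minimal sketch follows. The function names are made up, and returning `None` on failure is just a placeholder policy for illustration.

```python
def encode_strict(text, encoding):
    """Explicit conversion: the caller sees the failure and decides what to do."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None  # illustrative policy only; real code might reject the input


def encode_lossy(text, encoding):
    """Explicit opt-in to replacement characters for anything unencodable."""
    return text.encode(encoding, errors="replace")


def decode_lossy(raw, encoding):
    """Undecodable bytes become U+FFFD instead of raising."""
    return raw.decode(encoding, errors="replace")
```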

replies(1): >>17517968 #
18. jrochkind1 ◴[] No.17517465{3}[source]
> The operations that make sense with character strings would also make sense with bytes when an encoding is specified. Therefore, there should just be a way of annotating bytes with a suggested encoding.

That's the ruby solution. I like the ruby API here. When ruby introduced it in 1.9, it did cause similar upgrade pain, since you weren't used to having your strings tagged with an encoding, and suddenly they all kind of need to be if you want to do anything with them as strings-not-bytes.

As someone else noted would be the result, indeed the result was lots of "incompatible encoding" exceptions.

I think ruby actually has a pretty reasonable API here, but several years on, there are still _plenty_ of developers who don't understand it.

replies(1): >>17518080 #
19. scott_s ◴[] No.17517519{3}[source]
Opinions are fine. But "Did you forgot the /s tag?" is antagonistic. Please don't antagonize.
20. ◴[] No.17517640{4}[source]
21. jerf ◴[] No.17517687[source]
"No, the bad decision was treating bytes and strings interchangeably in the first place."

Show me the 1995 software that gets this right, and I'll show you proof that time travel is possible, by simply showing you that software right back again.

Abstractly, yes, it's true. Concretely, though, it's not a particularly valid criticism.

22. jessaustin ◴[] No.17517968{4}[source]
> ...which is way too much to ask of every single function that takes bytes.

Sorry if I wasn't clear. I meant to suggest that the byte-functions wouldn't know or do anything about encodings. They just work with bytes. It's the other functions, that take the encoding-annotated bytes (or optionally a "pure" unicode type), that would care about encodings.

23. fgonzag ◴[] No.17518069[source]
Bad decision? UTF was standardized in 1993, Python was first released in 1991. You can't decide to use something that hasn't been invented yet. Back then bytes and string were the same thing. Java did do it right but by 1995 the industry had already seen the problems of differing character sets.
replies(1): >>17522175 #
24. jessaustin ◴[] No.17518080{4}[source]
Sure, there are growing pains with any breaking change. Do you think the ruby 1.8 -> 1.9 transition went more smoothly than the python 2.7 -> 3.x... transition?
replies(2): >>17518536 #>>17518878 #
25. jrochkind1 ◴[] No.17518536{5}[source]
My impression is that it did, yes.
replies(1): >>17520279 #
26. coldtea ◴[] No.17518878{5}[source]
Sure it did. Barely took a few years, and it also had a large performance boost (whereas 3 initially came with regressions until 3.4 or so).

Plus, basic things like Rails were working from the start.

27. jrochkind1 ◴[] No.17520279{6}[source]
But on the other hand, Python's overall popularity has grown (mostly on the back of 'data science' I think), while Ruby's has shrunk, and Python is definitely more popular than Ruby at the moment... so Python's lesser success at that transition didn't actually matter much in the end?
28. kqr ◴[] No.17520956[source]
I wish I could sufficiently express my frustration for this. It's such a common mistake too. If you design a language with a type for text, and your language has a type called "string" oh God please God let those two be the same thing.

There are so many languages in which the type "string" is not the one to use for strings...

29. coldtea ◴[] No.17522175{3}[source]
>Back then bytes and string were the same thing.

No, they weren't. Impossible as it seems, we had encodings (including multi-byte encodings) for decades before UTF.

Python couldn't use UTF-8 in 1991, but it could very well tag strings with a specific encoding, instead of treating them as a bucket of bytes C-style.

>Java did do it right but by 1995 the industry had already seen the problems of differing character sets.

We had seen the problems of "differing character sets" for decades already (Windows vs DOS version of the same language encodings was a classic example for most users, but the problems go back to EBCDIC and so on).

Java just did a more right thing, but we already have a need for generic string types that can handle multiple encodings and know what they contain and how to convert from one to another.

30. kqr ◴[] No.17522768{3}[source]
Encodings are related to storage and transportation -- not business logic. You should not have to deal with encodings inside your application, so forcing programmers to deal with encodings at the point it matters, when text enters and exits the application, is thus sound.

If you truly don't care whether or not the text is decodable (which is sometimes the case), then don't read it as a string, read it as a byte array.

You can still have methods that are generic over byte arrays and text, since that is an orthogonal concern.
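
One way to picture both points, decoding at the boundary and writing helpers that are generic over bytes and text, is the sketch below. It assumes UTF-8 on the wire. `strip_prefix` and `handle_request` are hypothetical names, not from any library.

```python
from typing import AnyStr


def strip_prefix(data: AnyStr, prefix: AnyStr) -> AnyStr:
    """Works on str or bytes alike; no encoding is involved."""
    return data[len(prefix):] if data.startswith(prefix) else data


def handle_request(raw: bytes) -> bytes:
    """Decode at the entry point, work on text inside, encode at the exit."""
    text = raw.decode("utf-8")            # boundary in: bytes -> str
    reply = strip_prefix(text, "echo ").upper()
    return reply.encode("utf-8")          # boundary out: str -> bytes
```

The only lines that mention an encoding are the first and last of `handle_request`. Everything between them operates on text.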

31. jessaustin ◴[] No.17522848{4}[source]
For lots of applications it doesn't matter how many bytes are in a character, because characters don't matter. Even within a particular application, it's common for characters to only matter in a few particular locations. It would still have been a win for python, to make such applications easier to write and maintain.
32. AlexCoventry ◴[] No.17528411[source]
> the bad decision was treating bytes and strings interchangeably in the first place

Python preceded Unicode, though.