Leading a large open source project must be terrible in this age of constant outrage :-(
Leading a large open source project must be terrible in this age of constant outrage :-(
I do understand people's points about "the age of outrage" and "internet 2018" but still: the PEP wasn't generally accepted as being a fantastic improvement, so why did he feel the need to fight so hard for it?
Interestingly, C++ is going through the same process, with lots of great ideas being proposed, but the sum total of them being an even more complicated language (on top of what is probably the most complicated language already).
Python has been successful, IMHO, because Guido has made several brave, controversial calls. Python 3 breakage and async turned out to be prescient, fantastic decisions.
Async maybe. Python 3 breakage? Did you forgot the /s tag?
So yes some breaking change was indicated, but the particular change that was made was the wrong one.
Long and bitter experience has shown that people who think they can "simply ignore" the "issue" of encoding actually can't. That mindset is mostly a more polite way of saying "people who assume everything is ASCII all the time, or at most an encoding that always has one byte == one character". Those assumptions break sooner or later. I prefer having them break sooner, because I've been the person who had to clean up the mess at an unpleasant time when it was kicked to "later".
Which in turn means Python 3 made the right choice: text is text and bytes are bytes, and you should never ever pretend bytes are text no matter how much you think you'll never run into a case where the assumption fails.
Making people actually do the conversion has the advantage that when writing their string conversion code they might actually do something with the exception beyond maybe logging it and then pressing on anyway. It also gives you the opportunity to explicitly offer them alternatives like treating everything we can't encode as some sort of replacement character (works well for Unicode, not so much for ASCII), which is way too much to ask of every single function that takes bytes.
That's the ruby solution. I like the ruby API here. When ruby introduced it in 1.9, it did cause similar upgrade pain, since you weren't used to having your strings tagged with an encoding, and suddenly they all kind of need to be if you want to do anything with them as strings-not-bytes.
As someone else noted would be the result, indeed the result was lots of "incompatible encoding" exceptions.
I think ruby actually has a pretty reasonable API here, but several years on, there are still _plenty_ of developers who don't understand it.
Sorry if I wasn't clear. I meant to suggest that the byte-functions wouldn't know or do anything about encodings. They just work with bytes. It's the other functions, that take the encoding-annotated bytes (or optionally a "pure" unicode type), that would care about encodings.
If yoy truly don't care whether or not the text is decodable (which is sometimes the case), then don't read it as a string, read it as a byte array.
You can still have methods that are generic over byte arrays and text, since that is an orthogonal concern.