
Go is still not good

(blog.habets.se)
644 points ustad | 1 comments
0x000xca0xfe ◴[] No.44983984[source]
> If you stuff random binary data into a string, Go just steams along, as described in this post.

> Over the decades I have lost data to tools skipping non-UTF-8 filenames. I should not be blamed for having files that were named before UTF-8 existed.

Umm.. why blame Go for that?

replies(1): >>44984257 #
thomashabets2 ◴[] No.44984257[source]
Author here.

What I intended to say is that ignoring the problem of invalid UTF-8 (which could be valid ISO 8859-1), or the other way around, with no error handling has lost me data in the past.

Compare this to Rust, where a path name has a different type than a mere string. If you need to treat it like a string and you don't care if it's "a bit wrong" (because it's only being shown to the user), then you can call `.to_string_lossy()`. But it's harder to accidentally not handle that case when an exact name match does matter.

When exactness matters, `.to_str()` returns `Option<&str>`, so the caller is forced to deal with the situation that the file name may not be UTF-8.
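A minimal sketch of the two calls mentioned above, assuming a Unix-like platform (the `std::os::unix` import is only there to build a non-UTF-8 name from raw bytes). The helper names `display_name`, `exact_name`, and `latin1_sample` are made up for illustration.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: build an OsStr from raw bytes
use std::path::Path;

/// Hypothetical helper: a name that is only meant to be shown to a user.
/// Invalid sequences are replaced with U+FFFD, so the result may not
/// match the real file name byte for byte.
fn display_name(path: &Path) -> String {
    path.to_string_lossy().into_owned()
}

/// Hypothetical helper: the exact name as UTF-8, or None when the path
/// is not valid UTF-8. The caller has to handle the None case.
fn exact_name(path: &Path) -> Option<&str> {
    path.to_str()
}

/// Hypothetical sample: "café.txt" encoded as ISO 8859-1, where the
/// byte 0xE9 is not valid UTF-8 on its own.
fn latin1_sample() -> &'static Path {
    Path::new(OsStr::from_bytes(b"caf\xe9.txt"))
}
```

With the sample path, `exact_name` returns `None` while `display_name` returns the name with a replacement character in place of the `0xE9` byte, so the lossy form is fine for display but should not be used to reopen or match the file.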

Being sloppy with file name encodings is how data is lost. Go is sloppy with strings of all kinds, file names included.

replies(1): >>44985671 #
0x000xca0xfe ◴[] No.44985671[source]
Thanks for your reply. I understand that encoding the character set in the type system is more explicit and can help find bugs.

But forcing all strings to be UTF-8 does not magically help with the issue you described. In practice I've often seen the opposite: Now you have to write two code paths, one for UTF-8 and one for everything else. And the second one is ignored in practice because it is annoying to write. For example, I built the web server project in your other submission (very cool!) and gave it a tar file that has a non-UTF-8 name. There is no special handling happening, I simply get "error: invalid UTF-8 was detected in one or more arguments" and the application exits. It just refuses to work with non-UTF-8 files at all -- is this less sloppy?
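For what it's worth, the "second code path" does not have to be large in Rust. The sketch below is not taken from the web server project mentioned above; it only illustrates, under that caveat, that arguments can be kept as `OsString` and turned into paths without ever requiring UTF-8. The helper name `paths_from_args` is hypothetical.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical helper: turn raw command-line arguments into paths.
/// The first argument (the program name) is skipped. No UTF-8 check
/// happens here, so non-UTF-8 file names pass through unchanged.
fn paths_from_args<I>(args: I) -> Vec<PathBuf>
where
    I: IntoIterator<Item = OsString>,
{
    args.into_iter().skip(1).map(PathBuf::from).collect()
}
```

A program would call it as `paths_from_args(std::env::args_os())`. By contrast, `std::env::args()` yields `String` values and panics if an argument is not valid Unicode, which is the kind of all-or-nothing behavior the comment describes.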

Forcing UTF-8 does not "fix" compatibility in strange edge cases, it just breaks them all. The best approach is to treat data as opaque bytes unless there is a good reason not to. Which is what Go does, so I think it is unfair to blame Go for this particular reason instead of the backup applications.

replies(2): >>44986118 #>>44986549 #
1. thinkharderdev ◴[] No.44986549[source]
> It just refuses to work with non-UTF-8 files at all -- is this less sloppy?

You can debate whether it is sloppy, but I think an error is much better than silently corrupting data.

> The best approach is to treat data as opaque bytes unless there is a good reason not to

This doesn't seem like a good approach when dealing with strings, which are not just blobs of bytes. They have an encoding, and you generally want ways to, for instance, convert a string to upper- or lowercase.
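To make the case-conversion point concrete, here is a small sketch contrasting a byte-oriented conversion with an encoding-aware one. The function names `upper_bytes` and `upper_utf8` are hypothetical; the standard library calls they wrap are `[u8]::to_ascii_uppercase`, `std::str::from_utf8`, and `str::to_uppercase`.

```rust
/// Hypothetical helper: treats the input as opaque bytes.
/// Only ASCII letters a-z are changed; every other byte is left alone.
fn upper_bytes(data: &[u8]) -> Vec<u8> {
    data.to_ascii_uppercase()
}

/// Hypothetical helper: requires the input to be valid UTF-8 and then
/// applies Unicode uppercasing. Returns None when the bytes are not
/// valid UTF-8, because the encoding is then unknown.
fn upper_utf8(data: &[u8]) -> Option<String> {
    std::str::from_utf8(data).ok().map(|s| s.to_uppercase())
}
```

For the UTF-8 bytes of "café", `upper_bytes` leaves the "é" untouched while `upper_utf8` uppercases it as well. For the ISO 8859-1 bytes `b"caf\xe9"`, `upper_utf8` returns `None`, since uppercasing the last character correctly requires knowing which encoding the bytes are in.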