> Over the decades I have lost data to tools skipping non-UTF-8 filenames. I should not be blamed for having files that were named before UTF-8 existed.
Umm.. why blame Go for that?
What I intended to say with this is that ignoring the problem of invalid UTF-8 (which could be valid ISO-8859-1) with no error handling, or the other way around, has lost me data in the past.
Compare this to Rust, where a path name is a different type from a mere string. If you need to treat it like a string and you don't care if it's "a bit wrong" (because it's only being shown to the user), you can call `.to_string_lossy()`. But it's much harder to accidentally not handle that case when an exact name match does matter.
When exactness matters, `.to_str()` returns `Option<&str>`, so the caller is forced to deal with the possibility that the file name is not valid UTF-8.
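Roughly what those two calls look like in practice. This is a minimal sketch; the directory walk is just scaffolding, not code from any real project:

```rust
use std::path::Path;

fn main() -> std::io::Result<()> {
    for entry in std::fs::read_dir(".")? {
        let path = entry?.path();

        // For display only: the lossy conversion replaces any invalid bytes
        // with U+FFFD, which is fine because the result is only shown to a user.
        println!("found: {}", path.to_string_lossy());

        // For exact matching: to_str() returns None when the name is not valid
        // UTF-8, so the caller has to decide what to do with that case explicitly.
        match path.to_str() {
            Some(name) => println!("valid UTF-8 name: {name}"),
            None => eprintln!("name is not valid UTF-8, handling it explicitly here"),
        }
    }
    Ok(())
}
```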
Being sloppy with file name encodings is how data is lost. Go is sloppy with strings of all kinds, file names included.
But forcing all strings to be UTF-8 does not magically help with the issue you described. In practice I've often seen the opposite: now you have to write two code paths, one for UTF-8 and one for everything else, and the second one gets ignored because it is annoying to write. For example, I built the web server project in your other submission (very cool!) and gave it a tar file that has a non-UTF-8 name. There is no special handling; I simply get "error: invalid UTF-8 was detected in one or more arguments" and the application exits. It just refuses to work with non-UTF-8 files at all -- is this less sloppy?
Forcing UTF-8 does not "fix" compatibility in strange edge cases; it just breaks them all. The best approach is to treat data as opaque bytes unless there is a good reason not to, which is what Go does, so I think it is unfair to blame Go for this particular problem instead of the backup applications.
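To keep it in the same language as the example above, here is what "opaque bytes" handling can look like in Rust too. A minimal sketch, assuming a Unix target where file names are arbitrary bytes; the `same_name` helper is made up purely for illustration:

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

// Compare two file names as raw bytes: no decoding, so nothing is skipped or
// mangled regardless of whether the names are UTF-8, ISO-8859-1, or anything else.
fn same_name(a: &Path, b: &Path) -> bool {
    a.as_os_str().as_bytes() == b.as_os_str().as_bytes()
}

fn main() {
    let a = Path::new("caf\u{e9}.txt"); // "café.txt" encoded as UTF-8
    let b = Path::new(OsStr::from_bytes(b"caf\xe9.txt")); // "café.txt" encoded as ISO-8859-1
    // Different byte sequences, so these are (correctly) treated as different names.
    println!("{}", same_name(a, b)); // prints "false"
}
```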
You can debate whether it is sloppy, but I think an error is much better than silently corrupting data.
> The best approach is to treat data as opaque bytes unless there is a good reason not to
This doesn't seem like a good approach when dealing with strings, which are not just blobs of bytes. They have an encoding, and generally you want ways to, for instance, convert a string to upper/lowercase.