
Why is Windows so slow?

(games.greggman.com)
337 points by kristianp | 2 comments
evmar No.3368867
I don't know this poster, but I am pretty familiar with the problem he's encountering, as I am the person most responsible for the Chrome build for Linux.

I (and others) have put a lot of effort into making the Linux Chrome build fast. Some examples are multiple new implementations of the build system ( http://neugierig.org/software/chromium/notes/2011/02/ninja.h... ), experimentation with the gold linker (e.g. measuring and adjusting the still off-by-default thread flags https://groups.google.com/a/chromium.org/group/chromium-dev/... ) as well as digging into bugs in it, and other underdocumented things like 'thin' ar archives.

But it's also true that people who are more of Windows wizards than I am a Linux apprentice have worked on Chrome's Windows build. If you asked me the original question, I'd say the underlying problem is that on Windows all you have is what Microsoft gives you and you can't typically do better than that. For example, migrating the Chrome build off of Visual Studio would be a large undertaking, large enough that it's rarely considered. (Another way of phrasing this is it's the IDE problem: you get all of the IDE or you get nothing.)

When addressing the poor Windows performance, people first bought SSDs, something that never even occurred to me ("your system has enough RAM that the kernel cache of the file system should be in memory anyway!"). But for whatever reason, on the Linux side some Googlers saw fit to rewrite the Linux linker to make it twice as fast (this effort predated Chrome), and all Linux developers now get to benefit from that. Perhaps the difference is that when people write awesome tools for Windows or Mac they try to sell them rather than give them away.

Including new versions of Visual Studio, for that matter. I know that Chrome (and Firefox) use older versions of the Visual Studio suite (for technical reasons I don't quite understand, though I know people on the Chrome side have talked with Microsoft about the problems we've had with newer versions), and perhaps newer versions are better in some of these metrics.

But with all of that said, as best as I can tell Windows really is just really slow for file system operations, which especially kills file-system-heavy operations like recursive directory listings and git, even when you turn off all the AV crap. I don't know why; every time I look deeply into Windows I get more afraid ( http://neugierig.org/software/chromium/notes/2011/08/windows... ).

replies(10): >>3368892 #>>3368926 #>>3369043 #>>3369059 #>>3369102 #>>3369181 #>>3369566 #>>3369907 #>>3370579 #>>3372438 #
1. malkia No.3372438
On Windows, Sysinternals' RamMap is your friend. Also the Windows Internals book (5th edition, I think).

Every file on Windows keeps ~1024 bytes of info in the file cache. The more files you have, the more cache gets used.

A recent finding that sped up our systems: a timestamp check over 300,000+ files went from ~15s to ~3s when we moved from _stat to GetFileAttributesEx.
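
Roughly, a minimal sketch of that switch (assuming all you need is the last-write time; the helper name and the "0 on failure" convention are made up here, only GetFileAttributesEx and WIN32_FILE_ATTRIBUTE_DATA are actual Win32):

    // Sketch: get a file's last-write time via GetFileAttributesEx instead of _stat.
    #include <windows.h>
    #include <stdint.h>

    // Returns the last-write time as a 64-bit FILETIME tick count, or 0 on failure.
    // One call, no stat struct to fill, no time_t conversion.
    uint64_t LastWriteTime(const char* path) {
      WIN32_FILE_ATTRIBUTE_DATA attrs;
      if (!GetFileAttributesExA(path, GetFileExInfoStandard, &attrs))
        return 0;
      ULARGE_INTEGER t;
      t.LowPart  = attrs.ftLastWriteTime.dwLowDateTime;
      t.HighPart = attrs.ftLastWriteTime.dwHighDateTime;
      return t.QuadPart;
    }

It only hands back attributes, sizes and timestamps, which for a freshness check is all you need.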

One would not normally think of doing such things; after all, the C API is nice, and open, access, read, _stat are almost all there, but some of these functions do a lot of CPU-intensive work (and you are not aware of it until you do a little bit of disassembly).

For example, _stat does lots of divisions, a dozen kernel calls, strcmp, strchr, and a few other things. If you have Visual Studio (or even Express), the CRT source code is included, so you can see for yourself.

access(), for example, is relatively "fast", in the sense that there is just a mutex (I think; I'm on OSX right now and can't check) and then a call to GetFileAttributes.

And back to RamMap: it's very useful in that it shows you which files are in the cache and what portion of each, and also because it can flush the cache, so you can run several tests and optimize for both hot-cache and cold-cache situations.

A few months ago I came up with a scheme borrowing an idea from Mozilla, where they would simply pre-read, in a second thread, certain DLLs that would eventually get loaded.

I did the same for one of our tools. The tool is single threaded: it reads, operates, then writes, and it usually reads 10x more than it writes. The processing part could be made multi-threaded through OpenMP, but the reading could not, so instead I kept a list of what to read ahead in a second thread, so that when the first thread got around to reading a file it was coming from the cache. If the pre-fetcher fell behind, it skipped ahead. There was no need to even keep the contents in memory: just read them and throw them away.
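
A sketch of that read-ahead scheme (not the actual tool; the names and the single atomic skip-ahead index are illustrative, and it assumes the file list is known up front):

    // Second thread reads files only to warm the OS file cache; bytes are discarded.
    #include <atomic>
    #include <cstdio>
    #include <string>
    #include <thread>
    #include <vector>

    // Index of the next file the main (consumer) thread will read.
    static std::atomic<size_t> g_consumed{0};

    static void Prefetch(const std::vector<std::string>& files) {
      std::vector<char> buf(1 << 16);
      for (size_t i = 0; i < files.size(); ++i) {
        size_t consumed = g_consumed.load(std::memory_order_relaxed);
        if (consumed > i) i = consumed;   // fell behind: skip ahead of the consumer
        if (i >= files.size()) break;
        if (FILE* f = std::fopen(files[i].c_str(), "rb")) {
          while (std::fread(buf.data(), 1, buf.size(), f) == buf.size()) {}  // discard
          std::fclose(f);
        }
      }
    }

    int main() {
      std::vector<std::string> files = { /* the known read-ahead list */ };
      std::thread prefetcher([&files] { Prefetch(files); });
      for (size_t i = 0; i < files.size(); ++i) {
        // Real work, single threaded: read files[i] (hopefully now hot in the
        // cache), process it, write results.
        g_consumed.store(i + 1, std::memory_order_relaxed);
      }
      prefetcher.join();
    }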

For another tool, where the read patterns can't be predicted easily (a deep tree hierarchy), I did something else instead: saving to a binary file what was read before for the given set of command-line arguments (filtering some of them out). Later runs reused that list. It cut certain operations down by 25-50%.
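
A sketch of that record-and-replay idea (hashing the filtered command line into a cache-file name and using a plain text list are assumptions of mine; the original stored a binary file):

    // Remember which files a run touched, keyed by its filtered command line,
    // and prefetch that list on the next run.
    #include <fstream>
    #include <functional>
    #include <string>
    #include <vector>

    // Derive a cache-file name from the arguments that actually affect what gets
    // read (assumes irrelevant flags were already filtered out by the caller).
    static std::string CacheFileFor(const std::vector<std::string>& filtered_args) {
      std::string joined;
      for (const std::string& a : filtered_args) joined += a + '\0';
      return "readlist-" + std::to_string(std::hash<std::string>{}(joined)) + ".txt";
    }

    // During a run: append every path that actually gets read.
    static void RecordRead(const std::string& cache_file, const std::string& path) {
      std::ofstream(cache_file, std::ios::app) << path << '\n';
    }

    // At startup of the next run: load the previously recorded list so a second
    // thread can read those files ahead of the main one (as in the sketch above).
    static std::vector<std::string> LoadReadList(const std::string& cache_file) {
      std::vector<std::string> paths;
      std::ifstream in(cache_file);
      for (std::string line; std::getline(in, line);)
        if (!line.empty()) paths.push_back(line);
      return paths;
    }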

One lesson I've learned, though, is to let the I/O do its job from one thread... unless everyone has some proper OS, with some proper drivers, with some proper HW, with....

Tim Bray's Wide Finder and Wide Finder 2 competitions also had good information. The guy who won it has some good analysis of multi-threaded I/O on his site (I can't find it now)... but the gist was that it's too erratic: sometimes you get speedups, but sometimes you get slowdowns (especially with writes).

replies(1): >>3382939 #
2. evmar No.3382939
Thanks for the GetFileAttributesEx tip -- https://github.com/martine/ninja/commit/93c78361e30e33f950ee...