> To be fair I had no idea most sscanf implementations called strlen so I can’t blame the developer who wrote this.
Is this true? Is sscanf really O(N) on the size of the string? Why does it need to call strlen in the first place?
The musl C library's sscanf() does not do this, but it does call memchr() on limited substrings of the input string as it refills its input buffer, so it's not entirely free of this behaviour.
* https://git.musl-libc.org/cgit/musl/tree/src/stdio/vsscanf.c
The sscanf() in Microsoft's C library does this because it all passes through a __stdio_common_vsscanf() function which uses length-counted rather than NUL-terminated strings internally.
* https://github.com/tpn/winsdk-10/blob/master/Include/10.0.16...
* https://github.com/huangqinjin/ucrt/blob/master/inc/corecrt_...
The GNU C library does something similar, using a FILE structure alongside a special "operations" table, with a _rawmemchr() in the initialization.
* https://github.com/bminor/glibc/blob/master/libio/strops.c#L...
* https://github.com/bminor/glibc/blob/master/libio/strfile.h#...
The FreeBSD C library does not use a separate "operations" table.
* https://github.com/freebsd/freebsd-src/blob/main/lib/libc/st...
A glib summary is that sscanf() in these implementations has to set up state on every call that fscanf() has the luxury of keeping around over multiple calls in the FILE structure. They're setting up special nonce FILE objects for each sscanf() call, and that involves finding out how long the input string is every time.
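To make the cost concrete, here is a minimal sketch of the calling pattern that gets hurt. The function names are mine, purely for illustration. The first loop walks a string with repeated sscanf() calls and "%n"; on an implementation that measures the remaining string on every call, that is quadratic in the length of the input. The second does the same job with strtol(), which only looks at the characters it consumes.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sum whitespace-separated integers with one sscanf() call per number.
   If the library finds the end of the string on every call, the total
   work grows as O(N^2) in the string length. */
long sum_with_sscanf(const char *s)
{
    long total = 0;
    int value, used;

    while (sscanf(s, "%d%n", &value, &used) == 1) {
        total += value;
        s += used;              /* advance past what was consumed */
    }
    return total;
}

/* Same result with strtol(), which stops at the first character that
   cannot be part of the number and never needs the overall length. */
long sum_with_strtol(const char *s)
{
    long total = 0;
    char *end;

    for (;;) {
        long v = strtol(s, &end, 10);
        if (end == s)           /* no digits consumed: done */
            break;
        total += v;
        s = end;
    }
    return total;
}
```

Both return the same sum for well-formed input; only the hidden per-call setup cost differs.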
It is food for thought. How much could life be improved if these implementations exported the way to set up these nonce FILE structures from a string, and callers used fscanf() instead of sscanf()? How many applications are scanning long strings with lots of calls to sscanf()?
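On POSIX systems there is already something in this direction: fmemopen() builds a FILE over a memory buffer. The following is only a sketch of the idea, assuming POSIX.1-2008 is available; sum_with_fmemopen() is a made-up name, and I am not claiming this is what any of the libraries above do internally. The length is measured once, and every fscanf() call afterwards reuses the state kept in the FILE.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Sum whitespace-separated integers from a string, paying for strlen()
   once and then scanning through a long-lived FILE.
   Returns 0 on success, -1 if the stream could not be opened. */
int sum_with_fmemopen(const char *s, long *total)
{
    size_t len = strlen(s);     /* the only full-length scan */
    FILE *f;
    int value;

    *total = 0;
    if (len == 0)               /* some fmemopen()s reject a zero size */
        return 0;

    /* The buffer is opened read-only, so casting away const is safe. */
    f = fmemopen((void *)s, len, "r");
    if (f == NULL)
        return -1;

    while (fscanf(f, "%d", &value) == 1)
        *total += value;

    fclose(f);
    return 0;
}
```

The trade-off is that this is not Standard C, and the caller has to manage the FILE's lifetime itself.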
Neither P.J. Plauger's Standard C library nor mine (which I wrote in the 1990s and used for my 32-bit OS/2 programs) works this way. We both use simple callback functions that take "void*"s that are opaque to the common internals of *scanf() but that are cast to "FILE*" or "const char*" in the various callback functions.
OpenWatcom's C library does the same. Things don't get marshalled into nonce FILE objects on every call. Rather, the callback functions simply look at the next character to see whether it is NUL. They aren't even using memchr() calls to find a NUL in the first position of a string. (-:
* http://perforce.openwatcom.org:4000/@md=d&cd=//depot/V2/src/...
* https://groups.google.com/g/comp.lang.c/c/SPOnRZ3nEHk/m/dAoB...
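For anyone who wants to see the shape of the callback design, here is a rough sketch. It is not code from Plauger's, OpenWatcom's, or my library; every name in it (scan_source, scan_uint, and so on) is hypothetical, and it only handles an unsigned decimal conversion. The point is that the common core sees only a "void*" and two callbacks, and the string callbacks check the next character for NUL instead of measuring the string.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical input abstraction: the core never knows what ctx is. */
typedef struct {
    int  (*get)(void *ctx);           /* next character, or EOF */
    void (*unget)(int c, void *ctx);  /* push one character back */
    void *ctx;
} scan_source;

/* String callbacks: ctx is a "const char **" cursor. */
static int str_get(void *ctx)
{
    const char **p = ctx;
    if (**p == '\0')                  /* look at one character only */
        return EOF;
    return (unsigned char)*(*p)++;
}

static void str_unget(int c, void *ctx)
{
    const char **p = ctx;
    if (c != EOF)
        --*p;
}

/* File callbacks: ctx is a "FILE *". */
static int file_get(void *ctx) { return getc((FILE *)ctx); }

static void file_unget(int c, void *ctx)
{
    if (c != EOF)
        ungetc(c, (FILE *)ctx);
}

/* Common core: skip whitespace, then read an unsigned decimal number.
   Returns 1 on success, 0 if no digits were found, EOF at end of input. */
static int scan_uint(scan_source *src, unsigned *out)
{
    unsigned v = 0;
    int digits = 0;
    int c;

    do {
        c = src->get(src->ctx);
    } while (c != EOF && isspace(c));
    if (c == EOF)
        return EOF;

    while (c != EOF && isdigit(c)) {
        v = v * 10 + (unsigned)(c - '0');
        digits++;
        c = src->get(src->ctx);
    }
    src->unget(c, src->ctx);          /* leave the terminator unread */

    if (digits > 0)
        *out = v;
    return digits > 0;
}

int scan_uint_from_string(const char **cursor, unsigned *out)
{
    scan_source src = { str_get, str_unget, cursor };
    return scan_uint(&src, out);
}

int scan_uint_from_file(FILE *f, unsigned *out)
{
    scan_source src = { file_get, file_unget, f };
    return scan_uint(&src, out);
}
```

Calling scan_uint_from_string() repeatedly on the same cursor costs time proportional to the characters consumed, however long the rest of the string is.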