Making a StringBuffer in C, and questioning my sanity

(briandouglas.ie)

1. ranger_danger [18 Jul 25 20:46 UTC] No.44609646

>>44569819 (OP) #

You might be interested in https://github.com/antirez/sds

replies(1): >>44610559 #

2. improgrammer007 [18 Jul 25 21:17 UTC] No.44609938

>>44569819 (OP) #

I would rather focus on solving the main problem than reinvent the wheel. Just use C++ if perf is critical; it gives you all these things for free. In this day and age the reasons for using C as your main language should be almost zero.

3. o11c [18 Jul 25 21:22 UTC] No.44609974

>>44569819 (OP) #

Hm, this implementation seems allergic to passing types by value, even though doing so would eliminate half of the allocations. It also makes the mistake of being mutable-first, and provides some fundamentally inefficient operations.

The main mistake it shares with most string implementations is providing only a single type, rather than a series of mostly-compatible types that can be used generically in common contexts but differ in ways that sometimes matter: ownership, lifetime, representation, etc.
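To make the by-value point concrete: a non-owning view (pointer plus length) is small enough to pass and return by value, so taking a slice needs no allocation at all. This is a minimal sketch, not code from the article; `StrView`, `sv_from_cstr`, `sv_slice` and `sv_eq_cstr` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical non-owning view: borrows bytes it does not own. */
typedef struct {
    const char *ptr;
    size_t len;
} StrView;

/* Wrap a NUL-terminated C string without copying it. */
static StrView sv_from_cstr(const char *s) {
    assert(s != NULL);
    return (StrView){ s, strlen(s) };
}

/* Return a sub-view by value, clamping out-of-range arguments.
   No heap allocation happens here. */
static StrView sv_slice(StrView s, size_t start, size_t n) {
    if (start > s.len) start = s.len;
    if (n > s.len - start) n = s.len - start;
    return (StrView){ s.ptr + start, n };
}

/* Compare a view against a C string. */
static int sv_eq_cstr(StrView s, const char *c) {
    return strlen(c) == s.len && memcmp(s.ptr, c, s.len) == 0;
}
```

Because a view only borrows its bytes, the caller must keep the underlying string alive, which is exactly the kind of ownership/lifetime difference that separate types can make visible.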

replies(4): >>44610524 #>>44610702 #>>44611230 #>>44611895 #

4. remexre [18 Jul 25 22:34 UTC] No.44610524

>>44609974 #

How would you recommend doing that sort of "subtyping"? _Generic and macros?

replies(1): >>44611063 #

5. fsckboy [18 Jul 25 22:38 UTC] No.44610559

>>44609646 #

neat, i like it, has some of the same ideas i've used in my string packages

but i did see a place to shave a byte in the sds data struct. The null terminator is a wasted field, that byte (or int) should be used to store the amount of free space left in the buffer (as a proxy for strlen). When there is no space left in the buffer, the free space value will be.... a very convenient 0 heheh

hey, OP said he wants to be a better C programmer!

replies(1): >>44610703 #

6. amelius [18 Jul 25 22:56 UTC] No.44610702

>>44609974 #

I wonder how an LLM would rate this code.

7. ranger_danger [18 Jul 25 22:56 UTC] No.44610703{3}

>>44610559 #

> The null terminator is a wasted field

I think that would break its "Compatible with normal C string functions" feature.

replies(1): >>44610765 #

8. fsckboy [18 Jul 25 23:05 UTC] No.44610765{4}

>>44610703 #

nooooo you don't understand. when the buffer is not full, the string will be zero terminated "in buffer" (which is how it works as is anyway). when the buffer is full, the "free count" at the end will do double duty, both as a zero count and a zero terminator

replies(1): >>44610876 #

9. ranger_danger [18 Jul 25 23:16 UTC] No.44610876{5}

>>44610765 #

But "normal C string functions" don't know about the "free count" byte, right? So it wouldn't be updated... unless I'm misunderstanding something.

replies(2): >>44611146 #>>44611838 #

10. o11c [18 Jul 25 23:39 UTC] No.44611063{3}

>>44610524 #

Yup. It's a lot saner in C++, but people who refuse to use C++ for political reasons can do it the ugly way using C11 or GNU C.
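A rough sketch of that "ugly way" in C11, assuming two hypothetical types: an owning `StrBuf` and a borrowed `StrView`. A `_Generic` macro converts either one, or a plain `char *`, into a view, so a single implementation accepts all of them. Allocation for `StrBuf` is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical borrowed, read-only view. */
typedef struct { const char *ptr; size_t len; } StrView;

/* Hypothetical owned, growable buffer (allocation code omitted). */
typedef struct { char *data; size_t len; size_t cap; } StrBuf;

static StrView view_from_view(StrView v) { return v; }
static StrView view_from_buf(const StrBuf *b) { return (StrView){ b->data, b->len }; }
static StrView view_from_cstr(const char *s) {
    assert(s != NULL);
    return (StrView){ s, strlen(s) };
}

/* Pick a conversion by the static type of x (C11 _Generic).
   The controlling expression is not evaluated, so x is evaluated once. */
#define STR_VIEW(x) _Generic((x),          \
    StrView: view_from_view,               \
    StrBuf *: view_from_buf,               \
    const StrBuf *: view_from_buf,         \
    char *: view_from_cstr,                \
    const char *: view_from_cstr)(x)

/* One implementation serves every string-like type. */
static size_t str_count_char_impl(StrView s, char c) {
    size_t n = 0;
    for (size_t i = 0; i < s.len; i++)
        if (s.ptr[i] == c) n++;
    return n;
}
#define str_count_char(x, c) str_count_char_impl(STR_VIEW(x), (c))
```

Supporting another string-like type then means one more conversion function and one more `_Generic` line. In C++ the rough equivalent is letting each type convert to `std::string_view`.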

replies(1): >>44611135 #

11. improgrammer007 [18 Jul 25 23:51 UTC] No.44611135{4}

>>44611063 #

They even downvote people who suggest C++ :-). Doing this in C is such a colossal waste of time and energy, not to mention the bugs it'll introduce. Sigh!

replies(2): >>44611260 #>>44611802 #

12. fsckboy [18 Jul 25 23:53 UTC] No.44611146{6}

>>44610876 #

normal C string functions don't know about any of this package's improvements; I'm not sure you understand what the package does.

    +--------+-------------------------------+-----------+
    | Header | Binary safe C alike string... | Null term |
    +--------+-------------------------------+-----------+
             |
             `-> Pointer returned to the user.

his trick is to create a struct with fields in the header for extra information about the string, and then a string buffer also in the struct. but on instantiation, instead of returning the address of the struct/header, he returns the address of the string, so it could be passed to strlen and return the right answer, or passed to open and open the right file, all compatible-like.

but if you call "methods" on the package, they know that there is a header with struct fields below the string buffer, so they can read those fields and update them if need be.

He doesn't document that in more detail in the initial part of the spec/readme, but an obvious thing to add in the header would be a strlen, so you'd know where to append without counting through the string. But without doing something like that, there is no reason to have a header. Normal string functions can "handle" these strings, but they can't update the header information. I'm just extending that concept to the byte at the end also.
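A stripped-down sketch of the layout in that diagram: allocate the header and the bytes in one block, return a pointer to the bytes, and have the "methods" step back to the header. This is not sds's actual code (sds picks among several packed header sizes); `Hdr`, `hs_new`, `hs_len` and `hs_free` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header stored immediately before the string bytes. */
typedef struct {
    size_t len;   /* bytes in use, excluding the NUL */
    size_t cap;   /* bytes available, excluding the NUL */
    char buf[];   /* flexible array member: the string itself */
} Hdr;

/* Recover the header from the pointer handed to the user. */
static Hdr *hs_hdr(char *s) {
    return (Hdr *)(s - offsetof(Hdr, buf));
}

/* Create a string; the returned pointer is usable as a plain C string. */
static char *hs_new(const char *init) {
    assert(init != NULL);
    size_t n = strlen(init);
    Hdr *h = malloc(sizeof *h + n + 1);
    if (h == NULL) return NULL;
    h->len = n;
    h->cap = n;
    memcpy(h->buf, init, n + 1);  /* copy including the NUL */
    return h->buf;
}

/* O(1) length, read from the header instead of scanning. */
static size_t hs_len(char *s) { return hs_hdr(s)->len; }

static void hs_free(char *s) {
    if (s != NULL) free(hs_hdr(s));
}
```

Read-only calls such as `strlen` or `printf("%s", s)` work on the returned pointer as-is; anything that writes through it leaves `len` stale, which is the caveat argued over further down the thread.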

this type of thing falls into what the soulless ginger freaks call UB and want to eliminate.

(soulless ginger freaks? a combination of "rust colored" and https://www.youtube.com/watch?v=EY39fkmqKBM )

replies(1): >>44611261 #

13. zahlman [19 Jul 25 00:05 UTC] No.44611230

>>44609974 #

> It also makes the mistake of being mutable-first

Is mutability not part of the point of having a string buffer? Wouldn't the corresponding immutable type just be a string?

replies(1): >>44611638 #

14. zahlman [19 Jul 25 00:10 UTC] No.44611260{5}

>>44611135 #

Trolling about the choice of implementation language from a throwaway account is worth downvotes, yes. Doing a given task in a given language, simply for the sake of having it done in that language, is a legitimate endeavour, and having someone document (from personal experience) why it's difficult in that language is real content worth discussion. Choosing a better language is very much not a goal here.

> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.

15. ranger_danger [19 Jul 25 00:10 UTC] No.44611261{7}

>>44611146 #

> instead of returning the address of the struct

Yes I'm pretty sure I understand this part.

> an obvious thing to add in the header would be a strlen

The length is already in the header from what I can tell: https://github.com/antirez/sds/blob/master/sds.h#L64

But my point was that if something like your "free count" byte existed at the end, I would think it couldn't be relied upon, because functions such as s*printf, which might truncate, don't know about that field, and you don't want later "methods" to rely on a field that hasn't been updated and then run off the end.

And from what I can tell from the link above, there isn't actually a "free count" defined anywhere in the struct, the buffer appears to be at the end of the struct, with no extra fields after it.

Maybe I'm misunderstanding something?

replies(1): >>44611641 #

16. WalterBright [19 Jul 25 00:25 UTC] No.44611357

>>44569819 (OP) #

    new_capacity *= 2;

A better choice is to grow the capacity by a factor of 1.5:

https://stackoverflow.com/questions/1100311/what-is-the-idea...
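A minimal sketch of growing by about 1.5x instead of doubling; `grow_capacity` and the 16-byte minimum are arbitrary choices for illustration. The usual argument for a factor below the golden ratio (about 1.618) is that the blocks freed by earlier growth steps can eventually add up to enough space for a later request, whereas with 2x each new request is always larger than all previous blocks combined, so an allocator can never reuse them for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return a capacity >= needed, growing ~1.5x per step
   (new_cap += new_cap / 2) instead of doubling. Returns 0 on overflow. */
static size_t grow_capacity(size_t cap, size_t needed) {
    size_t new_cap = cap < 16 ? 16 : cap;  /* arbitrary minimum */
    while (new_cap < needed) {
        size_t step = new_cap / 2;
        if (new_cap > SIZE_MAX - step)
            return 0;  /* would overflow size_t */
        new_cap += step;
    }
    assert(new_cap >= needed);
    return new_cap;
}
```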

replies(2): >>44611645 #>>44612459 #

17. gblargg [19 Jul 25 00:26 UTC] No.44611359

>>44569819 (OP) #

It's odd how it has error reporting in some areas (alloc and split can return NULL if allocation fails), but not others (append and prepend have a void return type but might require allocation internally).
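One way to make `append` report failure the way `alloc` does: return a status code, and leave the buffer untouched if `realloc` fails. The struct fields are assumed for illustration and may not match the article's `StringBuffer`; overflow checks are omitted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout for illustration; the article's fields may differ. */
typedef struct {
    char *data;
    size_t length;    /* excluding the NUL */
    size_t capacity;  /* including room for the NUL */
} StringBuffer;

/* Returns 0 on success, -1 on allocation failure (buffer unchanged).
   Size-overflow checks are omitted in this sketch. */
static int sb_append(StringBuffer *sb, const char *s) {
    assert(sb != NULL && s != NULL);
    size_t n = strlen(s);
    size_t needed = sb->length + n + 1;
    if (needed > sb->capacity) {
        size_t new_cap = sb->capacity ? sb->capacity : 16;
        while (new_cap < needed) new_cap += new_cap / 2;
        char *p = realloc(sb->data, new_cap);
        if (p == NULL) return -1;  /* old data is still valid */
        sb->data = p;
        sb->capacity = new_cap;
    }
    memcpy(sb->data + sb->length, s, n + 1);  /* copy including the NUL */
    sb->length += n;
    return 0;
}
```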

18. o11c [19 Jul 25 01:13 UTC] No.44611638{3}

>>44611230 #

"Buffer" just means it is used between input and output. It does not imply mutability, and many buffers indeed only take their state at construction time and are not mutable.

In my experience, the only functions a mutable string buffer needs to provide are "append string (or to-string-able)" and "undo that append" (which mostly comes up in list-like contexts, e.g. to remove a final comma); for everything else you can convert to an immutable string first.

(theoretically there might be a "split and clobber" function like `strtok`, but in my experience it isn't that useful once your APIs actually take a buffer class).
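A small sketch of the append / "undo that append" pair, applied to the trailing-comma case: each append returns the previous length as a mark, and undoing is just truncating back to it. A fixed 64-byte buffer keeps the sketch short; `Buf`, `buf_append`, `buf_truncate` and `join` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-capacity buffer to keep the sketch short. */
typedef struct {
    char data[64];
    size_t len;
} Buf;

/* Append s and return the previous length (a "mark") for undo.
   Silently truncates if the buffer is full (sketch only). */
static size_t buf_append(Buf *b, const char *s) {
    size_t mark = b->len;
    size_t n = strlen(s);
    size_t room = sizeof b->data - 1 - b->len;
    if (n > room) n = room;
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return mark;
}

/* Undo: drop everything appended after the given mark. */
static void buf_truncate(Buf *b, size_t mark) {
    assert(mark <= b->len);
    b->len = mark;
    b->data[b->len] = '\0';
}

/* Join items with ", ": append a separator after every item,
   then undo the final one. */
static void join(Buf *b, const char *const *items, size_t count) {
    size_t last_sep = b->len;
    for (size_t i = 0; i < count; i++) {
        buf_append(b, items[i]);
        last_sep = buf_append(b, ", ");
    }
    if (count > 0) buf_truncate(b, last_sep);
}
```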

Considering the functions from this implementation, they can be divided as follows:

Lifetime methods:

  init
  free
  clear

Immutable methods:

  print
  index_of
  match_all
  split

Mutable methods:

  append
  prepend (inefficient!)
  remove
  replace

I've already mentioned `append`, and I suppose I can grant `prepend` for symmetry (note, though, that immutable strings do provide some sort of `concatenate`, but beware efficiency concerns). Immutable strings ubiquitously provide `replace` (and `remove` is just `replace` with an empty string), which are much safer/easier to use.
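A sketch of the immutable-style `replace` described above: it never modifies its input and returns a newly allocated string, and `remove` falls out as `replace` with an empty replacement. `str_replace` and `str_remove` are hypothetical helpers; counting matches first lets the result be allocated exactly once.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a new string with every non-overlapping
   occurrence of `from` replaced by `to`; the caller frees it.
   Returns NULL on allocation failure or if `from` is empty. */
static char *str_replace(const char *s, const char *from, const char *to) {
    assert(s && from && to);
    size_t from_len = strlen(from), to_len = strlen(to);
    if (from_len == 0) return NULL;

    /* Pass 1: count matches to size the result exactly. */
    size_t count = 0;
    for (const char *p = strstr(s, from); p; p = strstr(p + from_len, from))
        count++;

    size_t out_len = strlen(s) - count * from_len + count * to_len;
    char *out = malloc(out_len + 1);
    if (out == NULL) return NULL;

    /* Pass 2: copy the gaps between matches and the replacements. */
    char *w = out;
    const char *r = s;
    for (const char *p = strstr(r, from); p; p = strstr(r, from)) {
        memcpy(w, r, (size_t)(p - r));
        w += p - r;
        memcpy(w, to, to_len);
        w += to_len;
        r = p + from_len;
    }
    strcpy(w, r);  /* tail, including the NUL */
    return out;
}

/* remove is just replace with an empty string */
static char *str_remove(const char *s, const char *what) {
    return str_replace(s, what, "");
}
```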

There are also a lot of common operations not provided here. And the ones that are provided fail to work with `StringBuffer` input.

19. fsckboy [19 Jul 25 01:14 UTC] No.44611641{8}

>>44611261 #

you misunderstood what i said about the strlen field, but we agree, yes, it's in the header where it belongs.

I explained how returning the address of the string buffer instead of the address of the struct would give you a C compatible string that you could pass to other C library functions. If those functions are "readonly" wrt the string, everything is copacetic.

if those string functions update/write the c-string (which is in the buffer), the strlen in the header will now be wrong. That has nothing to do with my suggestion; it's already "broken" in the way you point out. My "string free bytes field" suggestion will also be broken by an operation like that, so my suggestion does not make this data structure worse than it already is wrt compatibility with C library functions.

However that strlen and free bytes problem can be managed (no worse than C standard strings themselves) and strlen and/or free bytes are useful features that make some other things easier so overall it's a win.

replies(1): >>44612140 #

20. emmelaich [19 Jul 25 01:15 UTC] No.44611645

>>44611357 #

I remember reading (decades ago) an extensive article in Software Practice and Experience reaching the same conclusion.

replies(1): >>44611788 #

21. manwe150 [19 Jul 25 01:44 UTC] No.44611788{3}

>>44611645 #

Or like Python shows there, 1.25+k which can be better (faster growth and less memory wasted) than both

replies(1): >>44612494 #

22. vlovich123 [19 Jul 25 01:46 UTC] No.44611802{5}

>>44611135 #

Following that argument, c++ is also a colossal waste of time and energy and bugs when compared with Rust :D.

23. dernett [19 Jul 25 01:51 UTC] No.44611838{6}

>>44610876 #

I'm assuming he's talking about this specific small string optimization: https://www.youtube.com/watch?v=kPR8h4-qZdk&t=409s

replies(1): >>44612173 #

24. [19 Jul 25 02:01 UTC] No.44611895

>>44609974 #

25. ranger_danger [19 Jul 25 02:52 UTC] No.44612140{9}

>>44611641 #

I was basing my response off of this:

> i did see a place to shave a byte in the sds data struct. The null terminator is a wasted field

I'm still not sure what byte in the struct you're talking about removing... because I don't see an actual null terminator field.

replies(1): >>44612527 #

26. fsckboy [19 Jul 25 02:59 UTC] No.44612173{7}

>>44611838 #

just watched, yes, that is the same optimization

27. burnt-resistor [19 Jul 25 03:55 UTC] No.44612458

>>44569819 (OP) #

It can be refactored into a buffer primitive of `void *buf`, `size_t capacity`, `size_t refcount`. Then the string can be implemented with copy-on-write (CoW) logic on top of a buffer and a `size_t length`. Read-only references to substrings become cheap, and copying is done whenever there's a modification or `realloc` can't grow the underlying buffer.
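A compressed sketch of that design, assuming a single-threaded program (the refcount is a plain `size_t`, not atomic). Unlike the two-field string described in the comment, this `Str` also carries an offset so substrings can share storage; all names are hypothetical, and growth via `realloc` is left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared, reference-counted storage (not thread-safe). */
typedef struct {
    size_t refcount;
    size_t capacity;
    char bytes[];
} Buffer;

/* Hypothetical string: a window into a Buffer; `off` enables substrings. */
typedef struct {
    Buffer *buf;
    size_t off, len;
} Str;

static Str str_new(const char *s) {
    size_t n = strlen(s);
    Buffer *b = malloc(sizeof *b + n);
    if (b == NULL) return (Str){ NULL, 0, 0 };
    b->refcount = 1;
    b->capacity = n;
    memcpy(b->bytes, s, n);
    return (Str){ b, 0, n };
}

/* O(1) substring: shares the buffer and bumps the refcount. */
static Str str_sub(Str s, size_t start, size_t n) {
    assert(start <= s.len && n <= s.len - start);
    s.buf->refcount++;
    return (Str){ s.buf, s.off + start, n };
}

static void str_release(Str s) {
    if (s.buf && --s.buf->refcount == 0) free(s.buf);
}

/* Copy-on-write: ensure *s is the sole owner before modifying. */
static int str_make_unique(Str *s) {
    if (s->buf->refcount == 1) return 0;
    Buffer *b = malloc(sizeof *b + s->len);
    if (b == NULL) return -1;
    b->refcount = 1;
    b->capacity = s->len;
    memcpy(b->bytes, s->buf->bytes + s->off, s->len);
    s->buf->refcount--;  /* drop our share of the old buffer */
    *s = (Str){ b, 0, s->len };
    return 0;
}

static int str_set_char(Str *s, size_t i, char c) {
    assert(i < s->len);
    if (str_make_unique(s) != 0) return -1;
    s->buf->bytes[s->off + i] = c;
    return 0;
}
```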

28. burnt-resistor [19 Jul 25 03:56 UTC] No.44612459

>>44611357 #

Yep. And probably use tcmalloc or jemalloc (deprecated?) too. Most OS sbrk/libc malloc implementations are better than they used to be, but certain profiled programs can gain performance by switching to and tuning one of the nonstandard allocators. YMMV. Test, profile, and experiment.

29. burnt-resistor [19 Jul 25 04:04 UTC] No.44612494{4}

>>44611788 #

1.25 of what? Do you mean 2.25*k == 9*k/4?

30. fsckboy [19 Jul 25 04:11 UTC] No.44612527{10}

>>44612140 #

the word "null term" appears in the ascii art diagram, that's where the null terminator is. the strlen field is in the portion labelled header.

the strlen field can be moved to where the word "null term" appears, except with a changed semantic of "bytes remaining" so it will go to zero at the right time. now you have a single entity "bytes remaining" instead of two entities, "strlen" and "null" giving a small storage saving (there is an additional null terminator most of the time, right at the end of the string; but this doesn't take up any storage because that storage is not used for anything else)

over and out.
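A minimal sketch of the trick for a small fixed-size inline buffer, which is the same idea as the small-string optimization dernett linked: the last byte holds the number of unused bytes, so when the string is full that count is 0 and doubles as the NUL terminator. `SmallStr` and the 16-byte size are assumptions; a one-byte counter limits this to capacities of at most 255.

```c
#include <assert.h>
#include <string.h>

#define SMALL_CAP 16  /* hypothetical: total bytes, including the counter byte */

/* Hypothetical inline string; data[SMALL_CAP - 1] holds the free count. */
typedef struct {
    char data[SMALL_CAP];
} SmallStr;

/* Length derived from the free count, no scanning needed. */
static size_t small_len(const SmallStr *s) {
    return (SMALL_CAP - 1) - (unsigned char)s->data[SMALL_CAP - 1];
}

/* Store src if it fits; returns 0 on success, -1 if too long. */
static int small_set(SmallStr *s, const char *src) {
    assert(src != NULL);
    size_t n = strlen(src);
    if (n > SMALL_CAP - 1) return -1;
    memcpy(s->data, src, n);
    if (n < SMALL_CAP - 1)
        s->data[n] = '\0';  /* ordinary in-buffer terminator */
    /* Free count; when n == SMALL_CAP - 1 this is 0, i.e. the terminator. */
    s->data[SMALL_CAP - 1] = (char)(SMALL_CAP - 1 - n);
    return 0;
}
```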

replies(1): >>44612579 #

31. ranger_danger [19 Jul 25 04:24 UTC] No.44612579{11}

>>44612527 #

> the word "null term" appears in the ascii art diagram

Yes but it does not appear anywhere in the struct that I can see... I would love to be proven wrong though.
