IMO, not a fair benchmark.
I can reproduce a roughly 10x improvement on an Intel(R) Xeon(R) Gold 6136 CPU @ 3.00GHz, but it drops to about 3x when I remove the to/from conversions that clone or collect Vecs and always allocate an 8K Vec instead of a ::Default for the writable buffer.
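For context, this is roughly the shape of the change on the protobuf side (EcommerceData, native, to_proto and proto_msg are my placeholder names, not the repo's; assumes `use prost::Message;` is in scope):

  // Before (hypothetical shape): every iteration converts the native struct into
  // the prost message (cloning/collecting its Vec fields) and encodes into a
  // Vec::default() that has to regrow from zero capacity.
  b.iter(|| {
      let msg = native.to_proto();            // clones / collects Vecs
      let mut buf = Vec::<u8>::default();     // capacity 0
      msg.encode(&mut buf).unwrap();
      std::hint::black_box(buf)
  });

  // After: encode a pre-built prost message into a pre-sized buffer, the same
  // way the Fory path benefits from its pooled writer.
  b.iter(|| {
      let mut buf = Vec::with_capacity(8 * 1024);
      proto_msg.encode(&mut buf).unwrap();
      std::hint::black_box(buf)
  });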
If anything, the benches should be restructured in a tower-service / codec-generics style, where other formats like protobuf don't go through any Fory-related code at all.
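Something along these lines is what I mean by a codec-generics layout (a sketch, not the repo's actual API; EcommerceData stands in for the prost-generated type), so the prost bench never touches a Fory type:

  use prost::Message;

  // Each format implements its own Codec; the bench harness is generic over it.
  trait Codec {
      type Msg;
      fn encode(&self, msg: &Self::Msg, buf: &mut Vec<u8>);
  }

  struct ProstCodec;
  impl Codec for ProstCodec {
      type Msg = EcommerceData; // placeholder prost-generated type
      fn encode(&self, msg: &Self::Msg, buf: &mut Vec<u8>) {
          msg.encode(buf).unwrap();
      }
  }

  fn bench_serialize<C: Codec>(c: &mut criterion::Criterion, name: &str, codec: C, msg: C::Msg) {
      c.bench_function(name, |b| {
          b.iter(|| {
              let mut buf = Vec::with_capacity(8 * 1024);
              codec.encode(&msg, &mut buf);
              std::hint::black_box(buf)
          })
      });
  }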
Note also that Fory uses a writer pool during these tests:
https://github.com/apache/fory/blob/fd1d53bd0fbbc5e0ce6d53ef...
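I haven't dug into how that pool works internally, but the effect is essentially buffer reuse across serialize calls, along the lines of:

  use std::cell::RefCell;

  thread_local! {
      // Reused write buffer: allocated once per thread and kept around, so
      // steady-state iterations never pay for allocation or Vec growth.
      static WRITE_BUF: RefCell<Vec<u8>> = RefCell::new(Vec::with_capacity(8 * 1024));
  }

The protobuf side of the bench gets no equivalent, which is part of what the 8K pre-allocation above compensates for.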
Original bench selection for Fory:
  Benchmarking ecommerce_data/fory_serialize/medium: Collecting 100 samples in estimated 5.0494 s (197k it
  ecommerce_data/fory_serialize/medium
          time:   [25.373 µs 25.605 µs 25.916 µs]
          change: [-2.0973% -0.9263% +0.2852%] (p = 0.15 > 0.05)
          No change in performance detected.
  Found 4 outliers among 100 measurements (4.00%)
    2 (2.00%) high mild
    2 (2.00%) high severe
Compared to the original bench for Protobuf/Prost:
  Benchmarking ecommerce_data/protobuf_serialize/medium: Collecting 100 samples in estimated 5.0419 s (20k
  ecommerce_data/protobuf_serialize/medium
          time:   [248.85 µs 251.04 µs 253.86 µs]
  Found 18 outliers among 100 measurements (18.00%)
    8 (8.00%) high mild
    10 (10.00%) high severe
However, after allocating 8K instead of ::Default and removing the to/from conversions, the updated protobuf bench gives:
  fair_ecommerce_data/protobuf_serialize/medium
          time:   [73.114 µs 73.885 µs 74.911 µs]
          change: [-1.8410% -0.6702% +0.5190%] (p = 0.30 > 0.05)
          No change in performance detected.
  Found 14 outliers among 100 measurements (14.00%)
    2 (2.00%) high mild
    12 (12.00%) high severe
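In other words, roughly 251.04 µs / 25.605 µs ≈ 9.8x on the original benches versus 73.885 µs / 25.605 µs ≈ 2.9x after the change, which is the 10x → 3x drop mentioned above.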