
156 points cpldcpu | 1 comments
malwrar No.41892377
Super interesting!

I wish TFA had found some way to measure the PMS150C implementation the headline brags about, but even the PFS154 version (2x memory, 3x price) is super neat! It's interesting to see how the net in particular is built at such a small scale. I also wish they had included performance numbers like they did in their linked CH32V003 post. I'm wondering how quick these MCUs are compared to each other and to, e.g., OP's PC, and how hot they get under sustained load.

cpldcpu No.41893406
There are no performance profiling mechanisms on these small devices, and the timers are rather coarse.

But it is easy to estimate the execution time:

- A multiply-accumulate (mulacc) of one weight takes 11 clock cycles.

- There are 1696 weights in the model, each one is only touched once.

- We can assume ~25%-50% overhead for loops and housekeeping (1:4 unrolled)

=> ~23000-28000 clock cycles per inference, which is less than 2ms at 16MHz
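The estimate can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the 11-cycle MAC cost, the 1696 weights and the 16 MHz clock are the figures quoted above, and the 25%-50% overhead band is an assumption, not a measurement.

```python
# Back-of-the-envelope inference-time estimate from the figures quoted above.
CYCLES_PER_MAC = 11      # cycles for one weight's multiply-accumulate
NUM_WEIGHTS = 1696       # weights in the model, each touched once per inference
CLOCK_HZ = 16_000_000    # 16 MHz system clock


def estimate_cycles(overhead: float) -> int:
    """Total cycles per inference, given fractional loop/housekeeping overhead (0.25 = 25%)."""
    base = NUM_WEIGHTS * CYCLES_PER_MAC
    return round(base * (1 + overhead))


for overhead in (0.25, 0.50):
    cycles = estimate_cycles(overhead)
    ms = cycles / CLOCK_HZ * 1e3
    print(f"{overhead:.0%} overhead: {cycles} cycles, {ms:.2f} ms")
```

With these inputs the loop reports 23320 cycles (about 1.46 ms) and 27984 cycles (about 1.75 ms), consistent with the ~23000-28000 cycle, under-2 ms figure.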

Since this is an MLP, the inference time scales directly with the number of weights. (This would be different for a CNN, where each weight is reused at many positions.)
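To illustrate the MLP/CNN difference, here is a hedged sketch that counts weights versus multiply-accumulates per inference for a dense layer and for a 2D convolution, ignoring biases. The helper names and the layer sizes in the example are hypothetical and not taken from the model in the article.

```python
def dense_macs(n_in: int, n_out: int) -> tuple[int, int]:
    """Return (weights, MACs) for a fully connected layer, biases ignored.

    Each weight is used exactly once per inference, so MACs == weights.
    """
    weights = n_in * n_out
    return weights, weights


def conv2d_macs(k: int, c_in: int, c_out: int, h_out: int, w_out: int) -> tuple[int, int]:
    """Return (weights, MACs) for a k x k 2D convolution, biases ignored.

    The same kernel weights are reused at every one of the h_out * w_out output positions.
    """
    weights = k * k * c_in * c_out
    return weights, weights * h_out * w_out


# Hypothetical layer sizes, chosen only for illustration.
print(dense_macs(64, 16))            # (1024, 1024)
print(conv2d_macs(3, 1, 8, 14, 14))  # (72, 14112)
```

A small conv layer can thus cost far more cycles than its weight count suggests, whereas for an MLP the weight count alone sets the MAC count.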

As for verifying on the PMS150C: I considered using an LED to indicate valid/invalid output, but iterating with OTP devices is quite tedious when you do not have an emulator. Since both devices are code compatible, we can assume that the code works on the smaller device, though.

wongarsu No.41895402
If flipping one of the output pins is fast enough, you could use that in combination with an oscilloscope as a coarse but very accurate profiling method.

Though I believe for most people "roughly 2ms" is good enough.
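For the pin-toggle approach, the host-side arithmetic is just pulse width times clock frequency. A minimal sketch, assuming a 16 MHz clock; `pulse_to_cycles` and its `toggle_cycles` calibration term are hypothetical, the idea being to time a toggle with nothing in between and subtract that cost.

```python
CLOCK_HZ = 16_000_000  # assumed 16 MHz system clock, as in the estimate above


def pulse_to_cycles(pulse_width_s: float, clock_hz: int = CLOCK_HZ, toggle_cycles: int = 0) -> int:
    """Convert a pin-high pulse width measured on an oscilloscope into CPU cycles.

    toggle_cycles is a hypothetical calibration term: the cycles taken by the
    pin set/clear instructions themselves, found by toggling the pin with no
    code in between and measuring that pulse.
    """
    return round(pulse_width_s * clock_hz) - toggle_cycles


print(pulse_to_cycles(1.75e-3))                   # 28000
print(pulse_to_cycles(1.75e-3, toggle_cycles=2))  # 27998
```

A 1.75 ms pulse would land at the top of the 23000-28000 cycle estimate. One cycle lasts 62.5 ns at 16 MHz, so the scope needs at least that time resolution to resolve single cycles.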