
170 points by judicious | 1 comment
zackmorris No.45407721
I need something like this for a switch() statement (technically a list of arbitrary functions). Sort of like up to N branches in one step.

The idea is to take some number of inputs A, B, C, ... and conceptually perform all of the possible functions simultaneously, then keep the one that's desired and throw the rest away. For any arbitrary logic. Ideally using fewer operations than all of that, but that's optional. Driven by one variable, it would look like:

  // branch version
  switch(var1)
  {
    case 1:
      var4 = var2 + var3;
      break;
    case 2:
      var4 = var2 - var3;
      break;
    case 3:
      var4 = var2 * var3;
      break;
    case 4:
      var4 = var2 / var3;
      break;
    // ...
    default:
      var4 = 0; // optional and arbitrary
      break;
  }
  
  // branchless version
  var4 = magic(var1, var2, var3);
I don't know how to do this outside of programmable hardware like an FPGA. The problem is that it's extremely challenging to write/solve the formulas that map ordinary arithmetic and bitwise functions to the larger set of functions.

So for now, I may just use a switch() statement in a shader and let the compiler figure out any reusable logic internally. I don't know if shaders have come far enough to handle that performantly, or if they end up calculating every path and throwing away the ones not taken, which would suggest the maximum speed scales as 1/N in the number of cases.

Does anyone know? Maybe truth tables? Or a SAT solver? I'm also not sure how this would work for floating point, but that's optional too.

Edit: I updated the cases to show how var1 controls the math performed on var2, var3 and var4.

1. zackmorris No.45463140
Thank you everyone for your answers, they got me thinking in new ways.

In case anyone ever finds this, I realized that a brute-force solution would be a lookup table of all input combinations. For three 8-bit variables, that's 2^24 = 16,777,216 one-byte entries, or 16 MB of EPROM.

Then we could apply truth table simplification techniques like Karnaugh maps, sum of products (SOP), products of sums (POS), etc:

https://workforce.libretexts.org/Bookshelves/Electronics_Tec...

https://tma.main.jp/logic/index_en.html

https://karnaughmapsolver.com/

I asked AI, and if the bits approximate a pseudorandom pattern, then we would need approximately log2(2^24) = 24 levels of 2-input boolean logic gates. For NAND or NOR gates, that's apparently something like 3×2^27 ≈ 400 million gates. If we could manage 24-input gates somehow, that might be reduced to ceil(log2(24)) = 5 levels, but internally they would occupy about the same die area. A hybrid of gates, multiplexers and EPROMs could yield a much smaller circuit at the cost of higher propagation delay (latency), which is the trade-off FPGAs make.

We could then permute the encoding of the control variable var1 across its 256 possible values to see if any ordering reduces the number of logic levels (by maximizing the size of Karnaugh map loops). I suspect that would only save a few levels at most, but even that could cut the number of gates by 50-90%.

Thankfully the control variable var1 might only have 16-64 possibilities, which lowers it to 4-6 bits, or just 2^20 = 1,048,576 to 2^22 = 4,194,304 entries (1-4 MB of EPROM), but still about 3×2^25 ≈ 100 million gates in the worst case.

For 16, 32 or 64 bit integers or floats, it might be better to just calculate all cases with dedicated hardware and select the right one.