wget https://github.com/beehive-lab/TornadoVM/releases/download/v...
unzip tornadovm-2.1.0-opencl-linux-amd64.zip
# Replace <path-to-sdk> manually with the absolute path of the extracted folder
export TORNADO_SDK="<path-to-sdk>/tornadovm-2.1.0-opencl"
export PATH=$TORNADO_SDK/bin:$PATH

tornado --devices
tornado --version

# Navigate to the project directory
cd GPULlama3.java

# Source the project-specific environment paths
source set_paths

# Build the project using Maven (skip tests for faster build)
# mvn clean package -DskipTests or just make
make

# Run the model (make sure you have downloaded the model file first - see below)
./llama-tornado --gpu --verbose-init --opencl --model beehive-llama-3.2-1b-instruct-fp16.gguf --prompt "tell me a joke"
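The steps above depend on TORNADO_SDK being exported and its bin/ folder being on PATH, and a wrong <path-to-sdk> is easy to miss until the build or run fails. Below is a minimal preflight sketch, assuming bash. The function name check_tornado_env is hypothetical and is not part of TornadoVM or GPULlama3.java. It only inspects the two variables set above and does not invoke the tornado launcher.

```shell
#!/usr/bin/env bash
# Preflight sketch (hypothetical helper, not shipped with TornadoVM or
# GPULlama3.java). It checks only what the setup steps export:
# TORNADO_SDK and the PATH entry $TORNADO_SDK/bin.

# Prints "ok" and returns 0 when the environment looks usable;
# otherwise prints the first problem found and returns 1.
check_tornado_env() {
  # TORNADO_SDK must be set and non-empty.
  if [ -z "${TORNADO_SDK:-}" ]; then
    echo "TORNADO_SDK is not set"
    return 1
  fi
  # The extracted SDK folder must contain a bin/ directory.
  if [ ! -d "$TORNADO_SDK/bin" ]; then
    echo "no bin/ directory under $TORNADO_SDK"
    return 1
  fi
  # That bin/ directory must appear as an entry in PATH.
  case ":$PATH:" in
    *":$TORNADO_SDK/bin:"*) ;;
    *)
      echo "$TORNADO_SDK/bin is not on PATH"
      return 1
      ;;
  esac
  echo "ok"
  return 0
}
```

One way to use it is to run check_tornado_env && tornado --devices right after the export lines, so the device listing is attempted only once both variables look right.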

1. mikepapadim ◴[11 Dec 25 15:59 UTC] No.46233015[source]▶

>>46233009 (OP) #

https://github.com/beehive-lab/GPULlama3.java

2. lostmsu ◴[12 Dec 25 01:56 UTC] No.46240000[source]▶

>>46233009 (OP) #

Does it support flash attention? Use tensor cores? Can I write custom kernels?

UPD. found no evidence that it supports tensor cores, so it's going to be many times slower than implementations that do.

replies(1): >>46242072 #

3. mikepapadim ◴[12 Dec 25 08:32 UTC] No.46242072[source]▶

>>46240000 #

Yes, when you use the PTX backend it supports Tensor Cores. It also has an implementation of flash attention. You can also write your own kernels; have a look here: https://github.com/beehive-lab/GPULlama3.java/blob/main/src/... https://github.com/beehive-lab/GPULlama3.java/blob/main/src/...

replies(1): >>46242644 #

4. lostmsu ◴[12 Dec 25 10:12 UTC] No.46242644{3}[source]▶

>>46242072 #

TornadoVM GitHub has no mentions of tensor cores or WMMA instructions. The only mention of tensor cores is in 2024 and states they are not used: https://github.com/beehive-lab/TornadoVM/discussions/393

replies(1): >>46243918 #

5. mikepapadim ◴[12 Dec 25 13:27 UTC] No.46243918{4}[source]▶

>>46242644 #

https://github.com/beehive-lab/TornadoVM/pull/732 https://github.com/beehive-lab/TornadoVM/pull/313

replies(1): >>46261711 #

6. lostmsu ◴[14 Dec 25 08:48 UTC] No.46261711{5}[source]▶

>>46243918 #

I believe these are SIMD. Tensor cores require the MMA family of instructions. Ask me how I know. :)

https://github.com/m4rs-mt/ILGPU/compare/master...lostmsu:IL...

Good article: https://alexarmbr.github.io/2024/08/10/How-To-Write-A-Fast-M...


Show HN: GPULlama3.java Llama Compiled to PTX/OpenCL Now Integrated in Quarkus