17 points Hashex129542 | 10 comments

I'm developing a programming language whose keywords and features are mostly based on Swift 5, with some additional features:

1. An async function can be called from a non-async function, with no async/await keyword. If you want to block the main thread, use the block_main() function: block_main() /* operations */ unblock_main()

2. A protocol can inherit from one or more other protocols, and a protocol can be constrained to a class, like in Swift.

3. No `let`, only `var`; the compiler can detect immutability and optimize accordingly.

4. Value lists in conditions: `if (a == (10 || 20 || 30) || b == a) && c { }` — here `a == (10 || 20 || 30)` is true if `a` equals any of the listed values.

5. The asterisk is replaced by the `x` operator for multiplication.
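A rough sketch of how the features above might read together (hypothetical syntax, inferred from the list — none of this is finalized):

```
// 1: calling an async function with no `await`; block the main thread explicitly
var data = fetchUser()      // async callee, no keyword at the call site
block_main()
render(data)                // operations that need the main thread blocked
unblock_main()

// 3 & 4: `var` only; value-list comparison
var a = 20
if (a == (10 || 20 || 30) || b == a) && c { print("matched") }

// 5: `x` instead of `*`
var area = width x height
```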

What features have you found useful, or wished for, in a programming language?

1. ActorNightly ◴[] No.42201001[source]
I'm going to save you time and describe the optimal programming language, the one anyone actually wants no matter what they say:

People want to be able to write either Python or JavaScript (i.e. the two most widely used languages) and have a compiler with a language model (it doesn't have to be large) on the back end that spits out optimal assembly code, or IR for LLVM.

It's already possible to do this with LLMs straight from the source code (although converting to C usually yields better results than going direct to assembly), but those models are overkill and slow for real compilation work. The actual compiler just needs a specifically trained model that reads in bytecode (or the output of the lexer) and does the conversion; it should be much smaller in size due to having a far smaller token space.

Not only do you get super easy adoption, since nobody has to learn a new language, you also get the advantage of all the existing libraries on PyPI/npm, which can be easily converted to optimal native code.

If you manage to get this working and make it modular, widespread use will inevitably lead the community to copy it for other languages. Then you can write in any language you want and have it all be fast in the end.

And, with transfer learning, the compiler will only get better. For example, it will start to recognize things like parallelizable work that it can offload to the GPU or implement with AVX instructions. It can also automatically make things memory safe without the user having to specify it manually.

replies(3): >>42201027 #>>42201547 #>>42205824 #
2. Hashex129542 ◴[] No.42201027[source]
Yes, it definitely saves a lot of time & effort. Great :)
3. blharr ◴[] No.42201547[source]
I'm confused by this approach: people want to write in interpreted languages and have the code compiled by an LLM?

How would you do things like dynamic code execution or reflection? Lots of properties are stripped during compilation that you wouldn't be able to refer back to.

Are you just saying write Python -> interpret it -> compile it -> convert to assembly? Because I believe that already exists, but it's difficult to do that all the time because of the compile step and having to convert to static typing.

replies(2): >>42202350 #>>42208500 #
4. rerdavies ◴[] No.42202350[source]
The same way C# used to do it. C# provided dynamic code generation at both the byte-code level and via AST/lambda implementations, and even provided an interactive C# "interpreter" that actually used dynamic code generation under the covers. All of which died with .NET Core. I rather suspect Microsoft decided that dynamic code generation was far too useful for writing cloaked viruses, and not quite generally useful enough to justify the effort.

You'd have to generate reflection data at compile time. And LLVM supports dynamic code generation, so that's not a problem either.

Not really sure why anyone would want to do an interpreted language though.

replies(1): >>42203833 #
5. neonsunset ◴[] No.42203833{3}[source]
Expression Trees and IQueryable<T> compilation did not die and remain fully supported features; for example, EF Core uses them for query compilation. `dynamic` did not die either, though it should rarely be used, because there are usually better constructs for this.
6. duped ◴[] No.42205824[source]
I find it dubious that an ML model would outperform existing compilers (AOT or JIT) for Python and JS, both of which exist and have many engineer-years invested in their design and testing.

I find it even more dubious that someone would want something that could hallucinate while generating machine code. The difficulty of optimizing compiler passes is not in writing code that appears to be "better" or "faster" but in guaranteeing that it is correct in all possible contexts.

replies(1): >>42208466 #
7. ActorNightly ◴[] No.42208466[source]
This would be a much different training task than training LLMs. The reference to it being possible with large LLMs is just proof that it can be done at all.

The reason it's different is that you are working with a finite set of token sequences, and you will be training the model on every value of that set, because it's fairly small. So hallucination won't be a problem.
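For a sense of how small that token space is, CPython's bytecode really is a closed, one-byte instruction set, which you can inspect with the standard `dis` module (my illustration, not part of the original comment):

```python
import dis

def average(xs):
    """A toy function to disassemble."""
    return sum(xs) / len(xs)

# The entire opcode vocabulary a bytecode-reading model would need:
# opcodes are single bytes, so there are necessarily fewer than 256.
print("opcode count:", len(dis.opmap))

# What a function looks like expressed in that vocabulary:
dis.dis(average)
```

The exact opcode set varies between CPython versions, but it always stays well under 256 entries, which is tiny compared with the token vocabulary of a general-purpose LLM.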

Even without ML, it's a lengthy but tractable task to build a Python-to-C translator. Once you unroll things like classes, list comprehensions, generators, etc., you end up with basically the same rough structure of code minus memory allocation. And for the latter, it's a process of semantic analysis to figure out how to allocate memory, which is very deterministic. Then you have your C compiler code as it already exists. Put the two together, and you basically have a much faster Python without any dynamic memory handling.
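As an illustration of that "unrolling" (my example, not the commenter's): a list comprehension is just sugar for a loop whose structure maps almost one-to-one onto C.

```python
# Sugared form, as a Python programmer writes it:
squares = [x * x for x in range(10) if x % 2 == 0]

# Unrolled form -- structurally what the equivalent C would look like,
# minus the malloc/realloc bookkeeping a translator would have to add:
unrolled = []
for x in range(10):
    if x % 2 == 0:
        unrolled.append(x * x)

assert squares == unrolled  # same result either way
```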

The advantage of doing it through ML is that once you do the initial setup of the training set and set up the pipeline to train the compiler, integrating any new pattern recognition into the compiler becomes trivial.

replies(1): >>42208686 #
8. ActorNightly ◴[] No.42208500[source]
>dynamic code execution

Run the code through the actual Python or Node.js runtime. Once you are happy with the result, compile it to native code.

>reflection.

Reflection can be "unrolled" to static values during compilation.
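A tiny sketch of that unrolling (my Python example of hypothetical compiler behavior, not something the comment specifies): when the attribute name is a compile-time constant, a reflective lookup can be rewritten into a direct one.

```python
class User:
    def __init__(self, name):
        self.name = name

u = User("ada")

# Reflective form the source might contain:
dynamic = getattr(u, "name")

# Statically "unrolled" form a compiler could emit,
# since "name" is a constant known at compile time:
static = u.name

assert dynamic == static  # identical result, no runtime lookup machinery needed
```

Truly dynamic cases, where the attribute name is only known at runtime, are exactly what the "run it through the real interpreter first" step above is for.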

>Are you just saying write python -> interpret it -> compile it -> convert to assembly? Because I believe that already exists,

It exists in the sense that you still have all the Python interpreter code for dynamic typing baked into the executable. This would remove all of that.

9. duped ◴[] No.42208686{3}[source]
> Once you unroll things like classes, list comprehensions, generators, e.t.c, you end up with basically the same rough structure of code minus memory allocation.

No, you don't, and that's why there are many engineer years invested into designing AoT and JIT compilers for JS and Python.

If you write C like Python you get Python but slower.

> The advantage of doing it through ML is that once you do the initial setup of the training set, and set up the pipeline to train the compiler, to integrate any pattern recognition into the compiler would be very trivial.

Except this has already been done, so what advantage does ML bring? Other than doing it again, but worse, and possibly incorrectly?

replies(1): >>42240992 #
10. ActorNightly ◴[] No.42240992{4}[source]
AOT/JIT compilers would literally be the training tools used to train the model, with additional optimizations along the way. The issue is that manually removing all the runtime stuff from the generated code is just cumbersome right now, but with ML it's just a matter of having enough examples.

The advantage is that you get native optimized code as if someone had written it in C directly, plus the ability to automatically generate code to be offloaded to the GPU as people start doing expanded training with higher-level pattern recognition.

I addressed the incorrectness part already: stochastic output doesn't matter when your domain is finite.