
204 points | JPLeRouzic | 2 comments

I polished a Markov chain generator and trained it on an article by Uri Alon et al. (https://pmc.ncbi.nlm.nih.gov/articles/PMC7963340/).

It generates text that seems to me at least on par with that of tiny LLMs, such as those demonstrated by NanoGPT. Here is an example:

  jplr@mypass:~/Documenti/2025/SimpleModels/v3_very_good$ ./SLM10b_train UriAlon.txt 3
  Training model with order 3...
  Skip-gram detection: DISABLED (order < 5)
  Pruning is disabled
  Calculating model size for JSON export...
  Will export 29832 model entries
  Exporting vocabulary (1727 entries)...
  Vocabulary export complete.
  Exporting model entries...
    Processed 12000 contexts, written 28765 entries (96.4%)...
  JSON export complete: 29832 entries written to model.json
  Model trained and saved to model.json
  Vocabulary size: 1727

  jplr@mypass:~/Documenti/2025/SimpleModels/v3_very_good$ ./SLM9_gen model.json
Aging cell model requires comprehensive incidence data. To obtain such a large medical database of the joints are risk factors. Therefore, the theory might be extended to describe the evolution of atherosclerosis and metabolic syndrome. For example, late‐stage type 2 diabetes is associated with collapse of beta‐cell function. This collapse has two parameters: the fraction of the senescent cells are predicted to affect disease threshold . For each individual, one simulates senescent‐cell abundance using the SR model has an approximately exponential incidence curve with a decline at old ages In this section, we simulated a wide range of age‐related incidence curves. The next sections provide examples of classes of diseases, which show improvement upon senolytic treatment tends to qualitatively support such a prediction. model different disease thresholds as values of the disease occurs when a physiological parameter ϕ increases due to the disease. Increasing susceptibility parameter s, which varies about 3‐fold between BMI below 25 (male) and 54 (female) are at least mildly age‐related and 25 (male) and 28 (female) are strongly age‐related, as defined above. Of these, we find that 66 are well described by the model as a wide range of feedback mechanisms that can provide homeostasis to a half‐life of days in young mice, but their removal rate slows down in old mice to a given type of cancer have strong risk factors should increase the removal rates of the joint that bears the most common biological process of aging that governs the onset of pathology in the records of at least 104 people, totaling 877 disease category codes (See SI section 9), increasing the range of 6–8% per year. The two‐parameter model describes well the strongly age‐related ICD9 codes: 90% of the codes show R 2 > 0.9) (Figure 4c). This agreement is similar to that of the previously proposed IMII model for cancer, major fibrotic diseases, and hundreds of other age‐related disease states obtained from 10−4 to lower cancer incidence. A better fit is achieved when allowing to exceed its threshold mechanism for classes of disease, providing putative etiologies for diseases with unknown origin, such as bone marrow and skin. Thus, the sudden collapse of the alveoli at the outer parts of the immune removal capacity of cancer. For example, NK cells remove senescent cells also to other forms of age‐related damage and decline contribute (De Bourcy et al., 2017). There may be described as a first‐passage‐time problem, asking when mutated, impair particle removal by the bronchi and increase damage to alveolar cells (Yang et al., 2019; Xu et al., 2018), and immune therapy that causes T cells to target senescent cells (Amor et al., 2020). Since these treatments are predicted to have an exponential incidence curve that slows at very old ages. Interestingly, the main effects are opposite to the case of cancer growth rate to removal rate We next consider the case of frontline tissues discussed above.
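
For anyone curious what the technique looks like in code, here is a generic sketch of an order-N word-level Markov generator in Python. This is only an illustration of the idea, not the SLM code above; the filename matches the post but the function names are made up:

    # Generic sketch of an order-N word-level Markov generator
    # (an illustration of the technique, not the SLM tool above).
    import random
    from collections import defaultdict

    def train(words, order=3):
        model = defaultdict(list)
        for i in range(len(words) - order):
            model[tuple(words[i:i + order])].append(words[i + order])
        return model

    def generate(model, order=3, length=120):
        out = list(random.choice(list(model.keys())))
        for _ in range(length):
            followers = model.get(tuple(out[-order:]))
            if not followers:  # context never seen in training: dead end
                break
            out.append(random.choice(followers))
        return " ".join(out)

    words = open("UriAlon.txt").read().split()
    print(generate(train(words), order=3))
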
Sohcahtoa82 ◴[] No.45995897[source]
A Markov Chain trained on only a single article of text will very likely just regurgitate entire sentences straight from the source material. There just isn't enough variation in the sentences.

But then, Markov Chains fall apart when the source material is very large. Try training a chain on Wikipedia: you'll find that the resulting output becomes incoherent garbage. Increasing the context length may increase coherence, but at the cost of turning the output into simple regurgitation.
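
One way to make the regurgitation point concrete (a rough sketch; the function name is made up): count what fraction of the generated n-grams occur verbatim in the training text. Trained on a single article, a low-order chain already scores high, and raising the order pushes the fraction toward 1.0:

    # Sketch: fraction of generated n-grams that appear verbatim
    # in the source text (function name is invented for illustration).
    def ngram_overlap(source_words, generated_words, n=8):
        src = {tuple(source_words[i:i + n])
               for i in range(len(source_words) - n + 1)}
        gen = [tuple(generated_words[i:i + n])
               for i in range(len(generated_words) - n + 1)]
        return sum(g in src for g in gen) / max(len(gen), 1)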

In addition to the "attention" mechanism that another commenter mentioned, it's important to note that Markov Chains are discrete in their next-token prediction while an LLM is fuzzier. LLMs have a latent space in which the meaning of a word exists as a vector. LLMs will generate token sequences that didn't exist in the source material, whereas Markov Chains will ONLY generate sequences that existed in the source.
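
A toy contrast of the two, with invented numbers and names: the Markov model's next-token distribution is a lookup keyed on the exact context, while an LLM scores a continuous context vector against the whole vocabulary, so even continuations never seen in training get probability mass:

    import numpy as np

    # Markov: discrete lookup keyed on the exact context. An unseen
    # context has no entry, and unseen followers have zero mass.
    transition = {("the", "senescent", "cells"): {"are": 5, "accumulate": 2}}

    # LLM (toy version): a continuous context vector is scored against
    # the whole vocabulary, so every token gets nonzero probability.
    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    vocab = ["are", "accumulate", "dance"]     # toy vocabulary
    context_vec = np.array([0.3, -1.1, 0.7])   # toy latent representation
    W = np.random.randn(len(vocab), 3)         # toy output projection
    probs = softmax(W @ context_vec)           # nonzero for every token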

This is why it's impossible to create a digital assistant, or really anything useful, via a Markov Chain. The fact that they only generate sequences that existed in the source means that they will never come up with anything creative.

replies(12): >>45995946 #>>45996109 #>>45996662 #>>45996887 #>>45996937 #>>45998252 #>>45999650 #>>46000705 #>>46002052 #>>46002754 #>>46004144 #>>46021459 #
johnisgood ◴[] No.45995946[source]
> The fact that they only generate sequences that existed in the source mean that it will never come up with anything creative.

I have seen the argument that LLMs can only give you what they have been trained on, i.e. that they will not be "creative" or "revolutionary", that they will not output anything "new", but "only what is in their corpus".

I am quite confused right now. Could you please help me with this?

Somewhat related: I like the work of David Hume, and he explains quite well how we can imagine various creatures, say, a pig with a dragon head, even if we have not seen one ANYWHERE. It is because we can take multiple ideas and combine them together. We know what dragons typically look like, and we know what a pig looks like, and so we can imagine (through our creativity and the combination of these two ideas) what a pig with a dragon head would look like. I wonder how this applies to LLMs, if it applies at all.

Edit: to clarify further as to what I want to know: people have been telling me that LLMs cannot solve problems that are not already in their training data. Is this really true or not?

replies(16): >>45996256 #>>45996266 #>>45996274 #>>45996313 #>>45996484 #>>45996757 #>>45997088 #>>45997100 #>>45997291 #>>45997366 #>>45999327 #>>45999540 #>>46001856 #>>46001954 #>>46007347 #>>46017836 #
thaumasiotes ◴[] No.45996266[source]
>> The fact that they only generate sequences that existed in the source

> I am quite confused right now. Could you please help me with this?

This is pretty straightforward. Sohcahtoa82 doesn't know what he's saying.

replies(1): >>45996332 #
Sohcahtoa82 ◴[] No.45996332[source]
I'm fully open to being corrected. Just telling me I'm wrong without elaborating does absolutely nothing to foster understanding and learning.
replies(1): >>45996354 #
thaumasiotes ◴[] No.45996354[source]
If you still think there's something left to explain, I recommend you read your other responses. Being restricted to the training data is not a property of Markov output. You'd have to be very, very badly confused to think that it was. (And it should be noted that a Markov chain itself doesn't contain any training data, as is also true of an LLM.)

More generally, since an LLM is a Markov chain, it doesn't make sense to try to answer the question "what's the difference between an LLM and a Markov chain?" Here, the question is "what's the difference between a tiny LLM and a Markov chain?", and assuming "tiny" refers to window size, and the Markov chain has a similarly tiny window size, they are the same thing.
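
The claim in code form (a sketch; the llm object and its method are hypothetical): with a fixed window of K tokens, both models map the last K tokens to a distribution over the next token, which is exactly the Markov property. What differs is how that distribution is computed:

    K = 3  # context window size, e.g. the order of the chain

    def markov_next(table, context):
        # discrete lookup: a context never seen in training has no entry
        return table.get(tuple(context[-K:]), {})

    def llm_next(llm, context):
        # hypothetical model object scoring the whole vocabulary from a
        # learned representation of the same K-token window
        return llm.distribution_over_vocab(context[-K:])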

replies(3): >>45996464 #>>45996469 #>>45999574 #
johnisgood ◴[] No.45996464[source]
He said LLMs are creative, yet people have been telling me that LLMs cannot solve problems that are not in their training data. I want this to be clarified or elaborated on.
replies(1): >>45996772 #
shagie ◴[] No.45996772[source]
Make up a fanciful problem and ask it to solve it. For example, https://chatgpt.com/s/t_691f6c260d38819193de0374f090925a is unlikely to be found in the training data - I just made it up. Another example of wizards and witches and warriors and summoning... https://chatgpt.com/share/691f6cfe-cfc8-8011-b8ca-70e2c22d36... - I doubt that was in the training data either.

Make up puzzles of your own and see if it is able to solve them or not.

The blanket claim of "cannot solve problems that are not in its training data" seems to be something that can be disproven by making up a puzzle from your own human creativity and seeing if it can solve it - or for that matter, how it attempts to solve it.

It appears that there is some ability for it to reason about new things. I believe that much of this "an LLM can't do X" or "an LLM is parroting tokens that it was trained on" comes from trying to claim that all the material an LLM creates was created before by a human, and that any use of an LLM is therefore stealing from some human and unethical.

( ... and maybe if my block world or wizards and warriors and witches puzzle was in the training data somewhere, I'm unconsciously copying something somewhere else and my own use of it is unethical. )

replies(2): >>45997533 #>>45999185 #
wadadadad ◴[] No.45997533[source]
This is an interesting idea, but as you stated, it's all logic; it's hard to come up with an idea that you don't have to explain concepts for, yet is still dissimilar enough not to be in the training data.

In your second example with the wizards- did you notice that it failed to follow the rules? Step 3, the witch was summoned by the wizard. I'm curious as to why you didn't comment either way on this.

On a related note, instead of puzzles, what about presenting riddles? I would argue that riddles are creative, pulling bits and pieces of meaning from words to create an answer. If an AI can solve riddles it has not seen before, would that count as being creative rather than solving problems already in its dataset?

Here's one I created and presented (the first incorrect answer I got was Escape Room; I gave it 10 attempts and it didn't get the answer I was thinking of):

---

Solve the riddle:

Chaos erupts around

The shape moot

The goal is key

replies(1): >>45999534 #
shagie ◴[] No.45999534[source]
The challenge is: for someone who is convinced that an LLM is only presenting material that they've seen before that was created by some human, how do you show them something that hasn't been seen before?

(Digging through old chats, this one from 2024 was a fun one - an epic rap battle between Paul Graham and Commander Taco: https://chatgpt.com/share/af1c12d5-dfeb-4c76-a74f-f03f48ce3b...)

Many people seem to believe that an LLM is not much more than a collage of words that it stole from other places, and likewise that images are a collage of fragments stolen from other people's pictures. (I've had people on reddit, which tends to be rather AI-hostile outside of specific AI subs, downvote me for explaining how to use an LLM as an editor for your own writing, or for pointing out that some generative image systems are built on top of libraries where the company had rights to all the images, e.g. stock photography.)

With the wizards, I'm not necessarily interested in the correct solution, but rather in how it did it and what the representation of the response was. I selected everything with 'W' to see how it handled identifying the different things.

As to riddles... that's really a question of mind reading. Your riddle isn't one that I can solve. Maybe if you told me the answer I'd understand how you got from the answer to the question, but I've got no idea how to go from the hint to a possible answer (does that make me an LLM?)

I feel it's a question much more along the lines of some other classic riddles...

    “What have I got in my pocket?" he said aloud. He was talking to himself, but Gollum thought it was a riddle, and he was frightfully upset. "Not fair! not fair!" he hissed. "It isn't fair, my precious, is it, to ask us what it's got in its nassty little pocketsess?”
What do I have in my pocket? (and then a bit of "what would it do with that prompt?") https://chatgpt.com/s/t_691fa7e9b49081918a4ef8bdc6accb97

At this point, I'm much more of the opinion that some people are on "team anti-AI" and that it has become part of their identity to be against anything that makes use of AI to augment what a human can do unaided. Attempting to show that it's not a stochastic parrot or a next-token predictor (any more than humans are), or that it can do things that help people (when used responsibly by the human), gets met with hostility.

I believe that this comes from group identity and some familiar patterns of group dynamics. https://gwern.net/doc/technology/2005-shirky-agroupisitsownw...

> The second basic pattern that Bion detailed is the identification and vilification of external enemies. This is a very common pattern. Anyone who was around the open source movement in the mid-1990s could see this all the time. If you cared about Linux on the desktop, there was a big list of jobs to do. But you could always instead get a conversation going about Microsoft and Bill Gates. And people would start bleeding from their ears, they would get so mad.

> ...

> Nothing causes a group to galvanize like an external enemy. So even if someone isn’t really your enemy, identifying them as an enemy can cause a pleasant sense of group cohesion. And groups often gravitate toward members who are the most paranoid and make them leaders, because those are the people who are best at identifying external enemies.

replies(1): >>46005273 #
wadadadad ◴[] No.46005273[source]
I don't think riddles are necessarily 'solvable' in the sense that there's only one right answer; they're open to interpretation, but when you get the 'right' answer it (hopefully) makes sense. So whether an AI/LLM can answer such a nebulous thing correctly - that's more the angle I was going for.

Regarding the wizards example, I'm a bit confused; I was thinking that the best way to judge answers for problem solving/creativity was correctness. I'll think more on whether the 'thought process' counts in and of itself.

The answer to my riddle is 'ball'.

replies(2): >>46006653 #>>46007854 #
johnisgood ◴[] No.46006653[source]
How did you get "ball" from your riddle? I read it and I have no idea! :(
replies(1): >>46008532 #
shagie ◴[] No.46008532[source]
In my sibling comment, I linked the chat session where I prompted ChatGPT for possible answers and reasoning.

https://chatgpt.com/share/6920b9e2-764c-8011-a14a-012e97573f...

    Given the following riddle, identify the object to which it refers.
    #
    Chaos erupts around
    The shape moot
    The goal is key
    #
    Identify 10 different possible answers and identify the reasoning behind each guess and why it may or may not be correct.
The second item in the possible answers:

    Soccer ball
    Why it fits:
        “Chaos erupts around”: Players cluster and scramble around the ball; wherever it goes, chaos follows.
        “The shape moot”: Modern footballs vary in panel design and surface texture, but they must all be broadly spherical; to the game itself, variations in cosmetic shape are mostly irrelevant.
        “The goal is key”: Everyone’s objective is to get the ball into the goal.
    Why it might not be correct:
        The third line emphasizes the goal, which points more strongly to the scoring structure or concept of scoring rather than the ball.
replies(1): >>46035477 #
wadadadad ◴[] No.46035477[source]
Interesting that here ChatGPT was able to generally get the correct idea! Two points:

The answer is not specifically 'soccer ball', but just 'ball'. I don't think I would deem that acceptable, though it's certainly very close! Maybe others would disagree, haha, and as I stated above, I do think riddles are open to interpretation.

Second, as to why my own prompting didn't get it: I didn't specify 'identify the object'. I wonder if prompting that it wasn't necessarily a physical thing was enough to get it significantly closer (still funny that the first answer I received was 'escape room').

As to GP:

- In sports with balls, there is 'chaos'. I was aiming more at the audience: in some of the larger arenas of professional sports, there's a complete ruckus on certain actions.

- The shape is moot; there are many different kinds of 'balls'. Compare football to soccer to tennis.

- Balls all have an objective, a goal, usually to get the ball to a specific location ('goal' in the typical sense, though the vagueness could imply general use as well). This was mostly to imply a sense of purpose and use for the riddle's answer.

Again, not saying this is the best riddle ever, just trying to make a point.