States like "human_activity" are not objectively measurable.
To be fair, PGMs and causal models are not the same, but this way of thinking about state variables is an incredibly good filter.
Even more importantly, the endpoints of each such causative arrow are also complex, fuzzy things, and are best represented as vectors. I.e.: diseases aren't just simple labels like "Influenza". There are thousands of ever-changing variants of just the Flu out there!
A proper representation of a "disease" would be a vector also, which would likely have interesting correlations with the specific genome of the causative agent. [1]
The next thing is that you want to consider the "vector product" between the disease and the thing it infects, to cater for susceptibility, previous immunity, etc...
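A minimal sketch of that idea in Python (all names and numbers here are invented for illustration; in a real system the embeddings would be learned, not hand-written):

    import numpy as np

    # Hypothetical embeddings -- dimensions and values are arbitrary.
    flu_variant = np.array([0.9, 0.1, 0.4, 0.0])   # "disease" vector for one variant
    host_state  = np.array([0.8, 0.0, 0.5, 0.3])   # prior immunity, age, etc.

    # A dot product (or a learned bilinear form) gives a graded
    # "susceptibility" score instead of a yes/no edge in a graph.
    susceptibility = float(flu_variant @ host_state)
    print(susceptibility)  # -> 0.92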
A hop, skip, and a small step and you have... Transformers, as seen in large language models. This is why they work so well, because they encode the complex nuances of reality in a high-dimensional probabilistic causal framework that they can use to process information, answer questions, etc...
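And the "vector product" intuition above is essentially the scaled dot-product attention at the core of a transformer. A bare-bones numpy sketch (not any particular model's implementation):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Each row of Q asks 'which rows of K are relevant to me?',
        then returns a weighted mix of the corresponding rows of V."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                 # pairwise dot products
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        return weights @ V

    # Three "tokens" in a 4-dimensional embedding space, made up for illustration.
    X = np.random.randn(3, 4)
    out = scaled_dot_product_attention(X, X, X)         # self-attention
    print(out.shape)  # (3, 4)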
Trying to manually encode a modern LLM's embeddings and weights (about a terabyte!) is futile beyond belief. But that's what it would take to make a useful "classical logic" model that could have practical applications.
Notably, expert systems, which use this kind of approach, were worked on for decades and were almost total failures in the wider market because they were mostly useless.
[1] Not all diseases are caused by biological agents! That's a whole other rabbit hole to go down.
We have quite a good understanding that a system cannot be both sound and complete; regardless, people went straight in to make a single model of the world.
What's perhaps different is that the machine, via LLM's, can also have an 'opinion' on meaning or correctness.
Going full circle, I wonder what would happen if you got LLMs to define the ontology....
So, by design, it's pretty useless for finding new, true causes. But maybe it's useful for something else, such as teaching a model what a causal claim is in a deeper sense? Or mapping out causal claims which are related somehow? Or conflicting? Either way, it's about humans, not about ontological truth.
Which is directly usable knowledge if you are building out a causal graph.
In the meantime, a cause and effect representation isn't limited to only listing one possible effect. A list of alternate disjoint effects, linked to a cause, is also directly usable.
Just as an effect may be linked to different causes; if you only know the effect in a given situation and are trying to identify the cause, that's the same problem run in reverse.
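In code, that's just two adjacency maps built over the same pairs. A hypothetical sketch (the pairs are toy examples, not from any actual dataset):

    from collections import defaultdict

    pairs = [("influenza", "fever"), ("influenza", "cough"),
             ("covid", "fever"), ("smoking", "lung_cancer")]

    effects_of = defaultdict(set)   # cause -> its alternate, possibly disjoint effects
    causes_of  = defaultdict(set)   # effect -> its candidate causes (the reverse problem)
    for cause, effect in pairs:
        effects_of[cause].add(effect)
        causes_of[effect].add(cause)

    print(effects_of["influenza"])  # {'fever', 'cough'} (order may vary)
    print(causes_of["fever"])       # {'influenza', 'covid'} -- candidates to discriminate between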
A coronavirus isn't "claimed" to cause SARS. Rather, SARS is the name given to the disease caused by a certain coronavirus. Or alternatively, SARS-CoV-1 is the name given to the virus which causes SARS. Whichever way you want to see it.
For a more obvious example, saying "influenza virus causes influenza" is a tautology, not a causal relationship. If influenza virus doesn't cause influenza disease, then there is no such thing as an influenza virus.
One quibble, and I really mean only one:
> a high-dimensional probabilistic causal framework
Deep learning models, aka neural-network-type models, are not probabilistic frameworks. While we can measure on the outside a probability of correct answers across the whole training set, or any data set, there is no probabilistic model.
Like a Pachinko game, you can measure statistics about it, but the game itself is topological. As you point out very clearly, these models perform topological transforms, not probabilistic estimations.
This becomes clear when you test them with different subsets of data. It quickly becomes apparent that the probabilities of the training set are only that. Probabilities of the exact training set only. There is no probabilistic carry over to any subset, or for generalization to any new values.
They are estimators, approximators, function/relationship fitters, etc. In contrast to symbolic, hard numerical or logical models. But they are not probabilistic models.
Even when trained to minimize a probabilistic performance function, their internal need to represent things topologically creates a profoundly "opinionated" form of solution, as opposed to being unbiased with respect to the probability measure. The measure never gets internalized.
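One concrete way to see this (a sketch, not a claim about any specific model; the numbers below are invented): compare a classifier's softmax "confidence" with its empirical accuracy on different slices of data. The two routinely diverge, which is what you'd expect if the probability measure is never internalized.

    import numpy as np

    def calibration_gap(confidences, correct):
        """Mean softmax confidence minus empirical accuracy on a slice.
        Zero would mean the scores behave like calibrated probabilities."""
        return float(np.mean(confidences) - np.mean(correct))

    # Pretend outputs from some trained classifier on two disjoint slices.
    train_conf, train_correct = np.array([0.9, 0.8, 0.95]), np.array([1, 1, 1])
    new_conf,   new_correct   = np.array([0.9, 0.85, 0.9]), np.array([1, 0, 0])

    print(calibration_gap(train_conf, train_correct))  # ~ -0.12: near-calibrated on the training slice
    print(calibration_gap(new_conf, new_correct))      # ~ +0.55: badly over-confident on a new slice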
https://deepsense.ai/resource/ontology-driven-knowledge-grap...
>hammering out an ontology for a particular area just results in a common understanding between those who wrote the ontology and a large gulf between them and the people they want to use it
This is the other side of the bitter lesson, which is just the empirical observation of a phenomenon that was to be expected from first principles (algorithmic information theory): a program of minimal length must get longer if the reality it models becomes more complex.
For ontologists, the complexity of the task increases as the generality is maintained while model precision is increased (top down approach), or conversely, when precision is maintained the "glue" one must add to build up a bigger and bigger whole while keeping it coherent becomes more and more complex (bottom up approach).
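The algorithmic-information-theory claim here is essentially the definition of Kolmogorov complexity; in standard notation (U a fixed universal machine, |p| the length of program p):

    K_U(x) \;=\; \min\{\, |p| \;:\; U(p) = x \,\}

A more complex reality corresponds to a description x with a larger K_U(x), so the minimal program (here, the ontology plus its glue) cannot stay short.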
Honestly, I don’t understand how these so-called ontologies have persisted. Who is investing in this space, and why?
But this description->explanation thing, whatever the reason, is just another error people make. It's not that different from errors like "vaccines cause autism". Any dataset collecting causal claims people make is going to contain a lot of nonsense.
Huh, what do you mean by this? There are many sound and complete systems – propositional logic, first-order logic, Presburger arithmetic, the list goes on. These are the basic properties you want from a logical or typing system. (Though, of course, you may compromise if you have other priorities.)
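For reference, the standard definitions (for a proof system with respect to a semantics), which may be where the disagreement is coming from:

    \text{Soundness: }    \quad T \vdash \varphi \;\Rightarrow\; T \models \varphi
    \text{Completeness: } \quad T \models \varphi \;\Rightarrow\; T \vdash \varphi

Gödel's completeness theorem says first-order logic has both properties; his incompleteness theorems concern a different notion, namely whether a particular arithmetic theory proves or refutes every sentence.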
Not so sure one should take stories about who said something in ancient times at face value ;)
But when a doctor tells the lawyer that they operated on a person, the lawyer can reasonably say "huh" - the concept of a person has shifted with the context.
Virgil.
[0] https://en.m.wikipedia.org/wiki/Felix,_qui_potuit_rerum_cogn...
The conclusion may be wrong, but a "bigger system" can be larger than the sum of its constituents. So a system can have functions, and give rise to complexity, that none of its subsystems feature. An example would be the thinking brain, which is made of neurons/cells incapable of thought, which are made of molecules incapable of reproduction, which are made of atoms incapable of catalyzing certain chemical reactions, and so on.
I am very curious about this. In particular, if you are able to split systems into formalized and non-formalized, then I think there is quite some praise, and a central spot in all future history books, in store for you!
It's also worth noting that, technically speaking, the parameter estimators (for the weights and biases) are random variables, and in that sense the setup can be considered probabilistic. The parameter estimates themselves are not random variables, to state the obvious. The estimates are simply numbers.
This is in contrast to just one system that attempts to be sound and complete.
This happens over and over with the relatively new popularization of a theory: the theory is proposed to be the solution to every missing thing in the same rough conceptual vector.
It takes a lot more than just pointing in the general direction of complexity to propose the creation of a complete system, something which with present systems of understanding appears to be impossible.
I meant that colloquial philosophies and general ontology are not the subject of Gödel's work. I think the foregoing expansion is similar to finding evidence for telepathy in pop-sci descriptions of quantum entanglement. Gödel's theorems cover axiomatic, formal systems in mathematics. To apply them to whatever, you first have to formalize that whatever. Otherwise it's an intuition/speculation, not sound reasoning. At least, that's my understanding.
Further reading: https://en.wikipedia.org/wiki/G%C3%B6del's_incompleteness_th...
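To make the formalization requirement explicit, the first theorem's standard statement (in LaTeX shorthand, nothing added) already carries all the preconditions:

    \text{If } T \text{ is consistent, effectively axiomatizable, and interprets enough arithmetic, then}
    \exists\, G_T:\; T \nvdash G_T \;\text{ and }\; T \nvdash \neg G_T.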
Random behavior in inputs, or in operations, results in random behavior in the outputs. But there is no statistical expression or characterization that can predict the distribution of one from the other.
You can't say, I want this much distribution in the outputs, so I will add this much distribution to the inputs, weights or other operational details.
Even if you create an exhaustive profile of "temperature" and output distributions across the training set, it will only be true for exactly that training set, on exactly that model, for exactly those random conditions. And will vary significantly and unpredictably across subsets of that data, or new data, and different random numbers injected (even with the same random distribution!).
Statistics are a very specific way to represent a very narrow kind of variation, or for a system to produce variation. But lots of systems with variation, such as complex chaotic systems, or complex nonlinear systems (as in neural models!) can defy robust or meaningful statistical representations or analysis.
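A toy illustration of that (purely a sketch, with an invented two-layer network): push the same input-noise distribution through a nonlinear model at two different operating points, and the output spread differs in a way that nothing short of running the network tells you.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, W2 = rng.standard_normal((8, 2)), rng.standard_normal((1, 8))

    def tiny_relu_net(x):
        # One hidden ReLU layer with fixed random weights -- a stand-in, not a trained model.
        return W2 @ np.maximum(W1 @ x, 0.0)

    def output_std(center, noise_std=0.1, n=10_000):
        """Empirical output spread when Gaussian noise is added to a fixed input."""
        xs = center[:, None] + noise_std * rng.standard_normal((2, n))
        return float(tiny_relu_net(xs).std())

    # Same input-noise distribution, two operating points, different output spread.
    print(output_std(np.array([0.0, 0.0])))
    print(output_std(np.array([2.0, -1.0])))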
(Another way to put this, is you can measure logical properties about any system. Such as if an output is greater than some threshold, or if two outputs are equal. The logical measurements can be useful, but that doesn't mean it is a logical system.
Any system with any kind of variation can have potentially useful statistical type measurements done on it. Any deterministic system can have randomness injected to create randomly varying output. But neither of those situations and measurements makes the system a statistically based system.)
> Ontologies and all that have been tried and have always been found to be too brittle.
I'd invite you to look at ontologies as nothing more than representations of things we know in some text-based format. If you've ever written an if statement, used OOP, trained a decision tree, or sketched an ER diagram, you've also represented known things in a particular text-based format.
We probably can agree that all these things are ubiquitous and provide value. It's just that those representations are not serialized as OWL/RDF, claim less about being accurate models of real-world things, and are often coupled with other things (e.g., functions).
This may seem reductionist in the sense of "we're all made of atoms", but I think it's important to understand why ontologies as a concept stick: they provide atomic components for expressing any knowledge in a dedicated place, and reasoning about it. Maybe the serializations, engines, results or creators suck, or maybe codebase + database is enough for most needs, but it's hard to not see the value of having some deterministic knowledge about a domain.
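To make the "same knowledge, different serialization" point concrete, here's a hypothetical toy example (everything in it is invented): one fact, three representations.

    # 1. As a triple, the way an ontology/knowledge graph would store it:
    triples = [("smoking", "causes", "lung_cancer")]

    # 2. As an if statement buried in application code:
    def risk_flag(patient):
        if patient.get("smoker"):
            return "elevated lung cancer risk"
        return "baseline"

    # 3. As a lookup that both the code and the triple store could be generated from:
    CAUSES = {"smoking": ["lung_cancer"]}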
If you take _ontology_ to mean OWL/RDF, this paper wouldn't qualify, so I'm assuming you took the broader meaning (i.e., _semantic triples_).
> Take the examples from the front page (which I expect to be among the best in their set)
Most scientific work will be in-progress, not WordNet-level (which also needs a lot of funding to get there). You ideally want to show a very simple example, and then provide representative examples that signal the level of quality that other contributors/scientists can expect.
Here, they're explicit about creating triples of whatever causal statements they found on Wikipedia. I wouldn't expect it to be immediately useful to me, unless I dedicate time to prune and iron out things of interest.
> human_activity => climate change. Those are such broad concepts that it's practically useless.
Disagree. If you had one metric that aggregated different measurements of climate change-inducing human activity, and one metric that did the same for climate change, you could create some predictions about N-order effects from climate change. Statistical analysis anyway requires you to make an assumption about the causal relationship behind what you're investigating.
So, if this is the level of detail you need, this helps you potentially find new hypotheses just based on Nth-order causal relations in Wikipedia text. It's also valuable to show where there is not enough detail.
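And since the dataset is just (cause, effect) pairs (see downthread), pulling out those Nth-order relations is a few lines. A hypothetical sketch with invented pairs in the dataset's style:

    from collections import defaultdict

    pairs = [("human_activity", "climate_change"),
             ("climate_change", "sea_level_rise"),
             ("sea_level_rise", "coastal_flooding"),
             ("coastal_flooding", "displacement")]

    graph = defaultdict(list)
    for cause, effect in pairs:
        graph[cause].append(effect)

    def chains(start, length):
        """Enumerate causal chains of the given length starting at `start`."""
        if length == 0:
            yield [start]
            return
        for nxt in graph[start]:
            for rest in chains(nxt, length - 1):
                yield [start] + rest

    print(list(chains("human_activity", 3)))
    # [['human_activity', 'climate_change', 'sea_level_rise', 'coastal_flooding']]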
> Or disease => death. There's no nuance at all.
Aside from my point above - haven't looked at the source data, but I doubt it stops at that level. But even if it does, it's 11 million things with provenance you can play with or add detail to.
Or you can also show that your method or choice of source data gets more conceptual/causal detail out of Wikipedia, or that their approach isn't replicable, or that they did a bad job, etc. These are all very useful contributions.
That's because we know how to interpret the concepts used in these representations, in relation to each other. It's just a syntactic change.
You might have a point if it's used as a kind of search engine: "show me wikipedia articles where X causes Y?" (although there is at least one source besides wikipedia, but you get my drift).
> Aside from my point above - haven't looked at the source data, but I doubt it stops at that level.
It does. It isn't even a triple, it's a pair: (cause, effect). There's no other relation than "causes". And if I skimmed the article correctly, they just take noun phrases and slap an underscore between the words and call it a concept. There's no meaning attached to the labels.
But the higher-order causations you mention are going to be pretty useless if there's no way to interpret them. It'll only work for highly specialized, unambiguous concepts, like myxomatosis (which is akin to encoding knowledge in the labels themselves), and the broad nature of many of the concepts will lead to quickly decaying usefulness as the length of the path increases. Here are some random examples (length 4 and 8, no posterior selection) from their "precision" set (197k pairs):
['mistake', 'deaths', 'riots', 'violence']
['higher_operating_income', 'increase_in_operating_income', 'increase_in_net_income', 'increase']
['mail_delivery', 'delays', 'decline_in_revenue', 'decrease']
['wastewater', 'environmental_problems', 'problems', 'treatment']
['sensor', 'alarm', 'alarm', 'alarm']
['thatch', 'problems', 'cost_overruns', 'project_delays']
['smoking_pot', 'lung_cancer', 'shortness_of_breath', 'conditions']
['older_medications', 'side_effects', 'physical_damage', 'loss']
['less_fat', 'weight_loss', 'death', 'uncertainties']
['diesel_particles', 'cancer', 'damages', 'injuries']
['malfunction_in_the_heating_unit', 'fire', 'fire_damage', 'claims']
['drug-resistant_malaria', 'deaths', 'violence', 'extreme_poverty']
['fairness_in_circumstances', 'stress', 'backache', 'aching_muscles']
['curved_spine', 'back_pain', 'difficulties', 'stress', 'difficulties', 'delay', 'problem', 'serious_complications']
['obama', 'high_gas_prices', 'recession', 'hardship', 'happiness', 'success', 'promotions', 'bonuses']
['financial_devastation', 'bankruptcy', 'stigma', 'homelessness', 'health_problems', 'deaths', 'pain', 'quality_of_life']
['methylmercury', 'neurological_damage', 'seizures', 'changes', 'crisis', 'growth', 'problems', 'birth_defects']
The latter is probably correct, but the chain of reasoning is false... This one is cherry-picked, but I found it too funny to omit:
['agnosticism', 'despair', 'feelings', 'aggression', 'action', 'riot', 'arrest', 'embarrassment', 'problems', 'black_holes']
First-order logic is sound, but not complete (i.e., I can express a set of strings you cannot recognize in first-order logic).