Just like here you could get a timeline of key events, a graph of connected entities, links to original documents.
Newsrooms might already do this internally idk.
This code might work as a foundation. I love that it's RDF.
These general data models start to become useful and interesting at around a trillion edges, give or take an order of magnitude. A mature graph model would be at least a few orders of magnitude larger, even if you aggressively curated what went into it. This is a simple consequence of the cardinality of the different kinds of entities that are included in most useful models.
No system described in open source can get anywhere close to even the base case of a trillion edges. They will suffer serious scaling and performance issues long before they get to that point. It is a famously non-trivial computer science problem and much of the serious R&D was not done in public historically.
This is why you only see toy or narrowly focused graph data models instead of a giant graph of All The Things. It would be cool to have something like this but that entails some hardcore deep tech R&D.
That is a wild claim. Perhaps for some very specific definition of "useful and interesting"? This dataset is already interesting (hard to say whether it's useful) at a much tinier scale.
Almost every non-trivial graph data model about the world is a graph of human relationships in the population, if not directly then by proxy. Population-scale human relationship graphs commonly pencil out at roughly 1T edges, a function of the population size. People are also typically the highest-cardinality entity. Even if the purpose isn’t a human relationship graph, they all tend to have one tacitly embedded, with the scale that implies.
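A back-of-envelope sketch of how that pencils out. The population and per-person relationship count below are round figures assumed for illustration, not numbers from this thread, and the function name is hypothetical:

```python
def estimated_edges(population: int, mean_degree: float) -> float:
    """Undirected edge count: every relationship is shared by two people."""
    return population * mean_degree / 2


# Assumed inputs, chosen only to illustrate the order of magnitude.
WORLD_POPULATION = 8_000_000_000  # roughly 8 billion people
MEAN_DEGREE = 250                 # hypothetical average relationships per person

edges = estimated_edges(WORLD_POPULATION, MEAN_DEGREE)
print(f"{edges:.1e} edges")  # 1.0e+12 edges
```

Moving the assumed degree up or down by 10x moves the result by the same factor, which is the "give or take an order of magnitude" part.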
If you restrict the set of human entities, you either end up with big holes in the graph or it is a graph that is not generally interesting (like one limited to company employees).
The OP was talking about generalizing this to a graph of people, places, events, and organizations, which always has this property.
It is similar to the phenomenon that a vast number of seemingly unrelated statistics are almost perfectly correlated with GDP.