Most active commenters
  • richhickey(6)
  • alankay(5)

←back to thread

1401 points alankay | 11 comments | | HN request time: 1.327s | source | bottom

This request originated via recent discussions on HN, and the forming of HARC! at YC Research. I'll be around for most of the day today (though the early evening).
Show context
wdanilo ◴[] No.11941656[source]
Hi Alan! I've got some assumptions regarding the upcoming big paradigm shift (and I believe it will happen sooner than later):

1. focus on data processing rather than imperative way of thinking (esp. functional programming)

2. abstraction over parallelism and distributed systems

3. interactive collaboration between developers

4. development accessible to a much broader audience, especially to domain experts, without sacrificing power users

In fact the startup I'm working in aims exactly in this direction. We have created a purely functional visual<->textual language Luna ( http://www.luna-lang.org ).

By visual<->textual I mean that you can always switch between code, graph and vice versa.

What do you think about these assumptions?

replies(2): >>11942789 #>>11945722 #
alankay ◴[] No.11945722[source]
What if "data" is a really bad idea?
replies(3): >>11945869 #>>11956981 #>>11984719 #
richhickey ◴[] No.11945869[source]
Data like that sentence? Or all of the other sentences in this chat? I find 'data' hard to consider a bad idea in and of itself, i.e. if data == information, records of things known/uttered at a point in time. Could you talk more about data being a bad idea?
replies(2): >>11946532 #>>11948698 #
alankay ◴[] No.11946532[source]
What is "data" without an interpreter (and when we send "data" somewhere, how can we send it so its meaning is preserved?)
replies(3): >>11946764 #>>11957966 #>>11959640 #
richhickey ◴[] No.11946764[source]
Data without an interpreter is certainly subject to (multiple) interpretation :) For instance, the implications of your sentence weren't clear to me, in spite of it being in English (evidently, not indicated otherwise). Some metadata indicated to me that you said it (should I trust that?), and when. But these seem to be questions of quality of representation/conveyance/provenance (agreed, important) rather than critiques of data as an idea. Yes, there is a notion of sufficiency ('42' isn't data).

Data is an old and fundamental idea. Machine interpretation of un- or under-structured data is fueling a ton of utility for society. None of the inputs to our sensory systems are accompanied by explanations of their meaning. Data - something given, seems the raw material of pretty much everything else interesting, and interpreters are secondary, and perhaps essentially, varied.

replies(2): >>11946935 #>>11946989 #
alankay ◴[] No.11946935[source]
There are lots of "old and fundamental" ideas that are not good anymore, if they ever were.

The point here is that you were able to find the interpreter of the sentence and ask a question, but the two were still separated. For important negotiations we don't send telegrams, we send ambassadors.

This is what objects are all about, and it continues to be amazing to me that the real necessities and practical necessities are still not at all understood. Bundling an interpreter for messages doesn't prevent the message from being submitted for other possible interpretations, but there simply has to be a process that can extract signal from noise.

This is particularly germane to your last paragraph. Please think especially hard about what you are taking for granted in your last sentence.

replies(5): >>11947046 #>>11947082 #>>11947341 #>>11947809 #>>11958689 #
richhickey ◴[] No.11947809[source]
Without the 'idea' of data we couldn't even have a conversation about what interpreters interpret. How could it be a "really bad" idea? Data needn't be accompanied by an interpreter. I'm not saying that interpreters are unimportant/uninteresting, but they are separate. Nor have I said or implied that data is inherently meaningful.

Take a stream of data from a seismometer. The seismometer might just record a stream of numbers. It might put them on a disk. Completely separate from that, some person or process, given the numbers and the provenance alone (these numbers are from a seismometer), might declare "there is an earthquake coming". But no object sent an "earthquake coming" "message". The seismometer doesn't "know" an earthquake is coming (nor does the earth, the source of the 'messages' it records), so it can't send a "message" incorporating that "meaning". There is no negotiation or direct connection between the source and the interpretation.

We will soon be drowning in a world of IoT sensors sending context-or-provenance-tagged but otherwise semantic-free data (necessarily, due to constraints, without accompanying interpreters) whose implications will only be determined by downstream statistical processing, aggregation etc, not semantic-rich messaging.

If you meant to convey "data alone makes for weak messages/ambassadors", well ok. But richer messages will just bottom out at more data (context metadata, semantic tagging, all more data) Ditto, as someone else said, any accompanying interpreter (e.g. bytecode? - more data needing interpretation/execution). Data remains a perfectly useful and more fundamental idea than "message". In any case, I thought we were talking about data, not objects. I don't think there is a conflict between these ideas.

replies(1): >>11948729 #
1. alankay ◴[] No.11948729[source]
2nd Paragraph: How do they know they are even bits? How do they know the bits are supposed to be numbers? What kind of numbers? Relating to what?

Etc

replies(2): >>11949133 #>>11956853 #
2. richhickey ◴[] No.11949133[source]
It contravenes the common and historical use of the word 'data' to imply undifferentiated bits/scribbles. It means facts/observations/measurements/information and you must at least grant it sufficient formatting and metadata to satisfy that definition. The fact that most data requires some human involvement for interpretation (e.g. pointing the right program at the right data) in no way negates its utility (we've learned a lot about the universe by recording data and analyzing it over the centuries), even though it may be insufficient for some bootstrapping system you envision.
replies(3): >>11957323 #>>11957898 #>>11958852 #
3. ◴[] No.11956853[source]
4. jonathanlocke ◴[] No.11957323[source]
I think in the Science of Process that is being related as a desirable goal, everything would necessarily be a dynamic object (or perhaps something similar to this but fuzzier or more relational or different in some other way, but definitely dynamic) because data by itself is static while the world itself is not.
5. mmiller ◴[] No.11957898[source]
I think what Alan was getting at is that what you see as "data" is in fact, at its basis, just signal, and only signal; a wave pattern, for example, but even calling it a "wave pattern" suggests interpretation. What I think he's trying to get across is there is a phenomenon being generated by something, but it requires something else--an interpreter--to even consider it "data" in the first place. As you said, there are multiple ways to interpret that phenomenon, but considering "data" as irreducible misses that point, because the concept of data requires an interpreter to even call it that. Its very existence as a concept from a signal presupposes an interpretation. And I think what he might have been getting at is, "Let's make that relationship explicit." Don't impose a single interpretation on signal by making "data" irreducible. Expose the interpretation by making it explicit, along with the signal, in how one might design a system that persists, processes, and transmits data.
replies(1): >>11962116 #
6. jsprogrammer ◴[] No.11958852[source]
Your selection of data is arbitrary.

Not only is your perception based on an interpreter, but how can you be sure that you were even given all of the relevant bits? Or, even what the bits really meant/are?

replies(1): >>11978703 #
7. richhickey ◴[] No.11962116{3}[source]
If we can't agree on what words mean we can't communicate. This discussion is undermined by differing meanings for "data", to no purpose. You can of course instead send me a program that (better?) explains yourself, but I don't trust you enough to run it :)

The defining aspect of data is that it reflects a recording of some facts/observations of the universe at some point in time (this is what 'data' means, and meant long before programmers existed and started applying it to any random updatable bits they put on disk). A second critical aspect of data is that it doesn't and can't do anything, i.e. have effects. A third aspect is that it does not change. That static nature is essential, and what makes data a "good idea", where a "good idea" is an abstraction that correlates with reality - people record observations and those recordings (of the past) are data. Other than in this conversation apparently, if you say you have some data, I know what you mean (some recorded observations). Interpretation of those observations is completely orthogonal.

Nothing about the idea of 'data' implies a lack of formatting/labeling/use of common language to convey the facts/observations, in fact it requires it. Data is not merely a signal and that is why we have two different ideas/words. '42' is not, itself, a fact (datum). What constitutes minimal sufficiency of 'data' is a useful and interesting question. E.g. should data always incorporate time, what are the tradeoffs of labeling being in- or out-of-band, per datom or dataset, how to handle provenance etc. That has nothing to do with data as an idea and everything to do with representing data well.

But equating any such labeling with more general interpretation is a mistake. For instance, putting facts behind a dynamic interpreter (one that could answer the same question differently at different times, mix facts with opinions/derivations or have effects) certainly exceeds (and breaks) the idea of data. Which is precisely why we need the idea of data, so we can differentiate and talk about when that is and is not happening - am I dealing with facts, an immutable observation of the past ("the king is dead") or just a temporary (derived) opinions ("there may be a revolt"). Consider the difference between a calculation involving (several times) a fact (date-of-birth) vs a live-updated derivation (age). The latter can produce results that don't add up. 'date-of-birth' is data and 'age' (unless temporally-qualified, 'as-of') is not.

When interacting with an ambassador one may or may not get the facts, and may get different answers at different times. And one must always fear that some question you ask will start a war. Science couldn't have happened if consuming and reasoning about data had that irreproducibility and risk.

'Data' is not a universal idea, i.e. a single primordial idea that encompasses all things. But the idea that dynamic objects/ambassadors (whatever their other utility) can substitute for facts (data) is a bad idea (does not correspond to reality). Facts are things that have happened, and things that have happened have happened (are not opinions), cannot change and cannot introduce new effects. Data/facts are not in any way dynamic (they are accreting, that's all). Sometimes we want the facts, and other times we want someone to discuss them with. That's why there is more than one good idea.

Data is as bad an idea as numbers, facts and record keeping. These are all great ideas that can be realized more or less well. I would certainly agree that data (the maintenance of facts) has been bungled badly in programming thus far, and lay no small part of the blame on object- and place-oriented programming.

replies(1): >>11963216 #
8. pefimo ◴[] No.11963216{4}[source]
Why do you limit the meaning of 'data' to facts and/or observations?
replies(1): >>11963502 #
9. richhickey ◴[] No.11963502{5}[source]
"datum" means "a thing given" - a fact or presumed fact.

http://www.dictionary.com/browse/datum

10. madmax96 ◴[] No.11978703{3}[source]
Of course the selection of data is arbitrary -- but Rich gives us a definition, which he makes abundantly clear and uses consistently. All definitions can be considered arbitrary. He's not making any claim that we have all the relevant bits of data or that we can be sure what the data really means or represents.

But we can expound on this problem in general. In any experiment where we gather data, how can we be sure we have collected a sufficient quantity to justify conclusions (and even if we are using statistical methods that our underlying assumptions are indeed consistent with reality) and that we have accrued all the necessary components? What you're really getting at is an __epistemological__ problem.

My school of thought is that the only way to proceed is to do our best with the data we have. We'll make mistakes, but that's better than the alternative (not extrapolating on data at all.)

replies(1): >>11984230 #
11. jsprogrammer ◴[] No.11984230{4}[source]
I hope we can do our best, I'm just not sure there is really a satisfactory way to define/measure/judge that we have actually done so....