82 points mfiguiere | 33 comments
1. elahieh ◴[] No.41084133[source]
Section header: "Who's data is it?"

Couldn't they have had an LLM proof-read they're paper?

replies(3): >>41084248 #>>41084339 #>>41084378 #
2. ziofill ◴[] No.41084248[source]
They’re?
replies(1): >>41084470 #
3. ◴[] No.41084262[source]
4. batch12 ◴[] No.41084339[source]
I've been seeing this a lot lately. Everywhere from screen printed signs to news tickers. I'm not sure if this is because it's new or if I'm just now seeing it.
replies(1): >>41084445 #
5. thierrydamiba ◴[] No.41084378[source]
I think it’s a play on “who’s line is it”
replies(3): >>41084409 #>>41084415 #>>41084763 #
6. ◴[] No.41084409{3}[source]
7. dools ◴[] No.41084415{3}[source]
But whose line is it anyway is spelled whose because it’s a possessive pronoun. Who’s means who is or who has.

Who’s up for some donuts?

Whose turn is it to go to the shops?

Who’s seen my car keys?

replies(1): >>41104221 #
8. elahieh ◴[] No.41084445{3}[source]
Yeah, call me snobby or OCD but I lost confidence in the authors and reviewers when I saw that slipped through.

Then I started wondering if this is going to become the new anti-AI marker. AI-written papers use "delves", "underscores" and "showcasing" too much. Avoid those words, throw in some errors and readers will think your paper was written by humans.

replies(3): >>41084453 #>>41085045 #>>41096457 #
9. kolinko ◴[] No.41084453{4}[source]
As for the authors, perhaps English is their second language? Regular spellcheckers don’t check grammar well enough.
replies(1): >>41084467 #
10. rjurney ◴[] No.41084467{5}[source]
You fucks, I just learned about this. I feel so stupid!
11. marktam264 ◴[] No.41084470{3}[source]
I know, right? Their grammar is, as if, they’re not there?
replies(1): >>41084475 #
12. albert_e ◴[] No.41084495[source]
Possibly stupid question, in general: why don't these papers have any dates? How do I know when it was published and whether it is up to date and still relevant?
replies(2): >>41084551 #>>41084592 #
13. eftychis ◴[] No.41084551[source]
Not a stupid question at all. I agree with you. But that is part of the style of VLDB -- the front matter has the publication date.

In any case this is part of the 2022-2023 volume: https://www.vldb.org/pvldb/volumes/16/

(in particular https://www.vldb.org/pvldb/vol16/FrontMatterVol16No11.pdf -- July 2023)

replies(1): >>41084591 #
14. cgio ◴[] No.41084589[source]
This felt very shallow, which may be because of the breadth of topics the authors tried to address. Data management has been building for years on the pillars of data quality, with a passion that is at times counterproductive (looking at you, single ideal schema of truth), and I feel this failed to put enough emphasis on the gradual transition to new mechanics of trust as a counterweight to probabilistic answers. We are falling into the trap of imagining robotic brooms instead of vacuum cleaners. I don't see LLMs perfecting approaches with a singular focus on precision, but I do see them introducing new ones with a focus on convenience (is that not what we are witnessing with the evolution of search?).
replies(1): >>41084612 #
15. simonw ◴[] No.41084591{3}[source]
Oh so this is a year old? That makes sense, some of the examples (like the SQL generation from human language) felt a little unexciting by today's standards but would have been more interesting last July.
16. zeehio ◴[] No.41084592[source]
I agree they should have that information.

The nice part is that this information is standardized through the digital object identifier (doi).

Ex: 10.14778/3611479.3611527

This doi takes you to:

https://dl.acm.org/doi/10.14778/3611479.3611527

In general if you have a DOI and you want a URL you can go to https://dx.doi.org/ to resolve it.
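
A minimal shell sketch of that lookup, assuming all you want is the resolver URL for a given DOI. The function name doi_to_url is made up for illustration, and the optional "doi:" prefix handling is an added convenience, not something the resolver requires.

```shell
# Build a resolver URL for a DOI (hypothetical helper, not part of any tool).
doi_to_url() {
  # Strip an optional leading "doi:" and prepend the resolver mentioned above.
  printf 'https://dx.doi.org/%s\n' "${1#doi:}"
}

doi_to_url 10.14778/3611479.3611527
```

Opening the printed URL in a browser should redirect to the publisher's landing page for the paper.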

replies(1): >>41085174 #
17. crooked-v ◴[] No.41084612[source]
> trust as a counterweight to probabilistic answers

The basic problem with that is that even with 5 9s of good results, that remainder still gets you very public cases of "put glue on your pizza". And we're not even at 5 9s yet.
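
A back-of-the-envelope sketch of that remainder, assuming "5 9s" means 99.999% good answers, so 1 bad answer per 100,000. The function name and the query volume of one billion are illustrative assumptions, not figures from the paper or this thread.

```shell
# Expected bad answers at five nines: 1 failure per 100,000 queries (integer math).
expected_bad_answers() {
  echo $(( $1 / 100000 ))
}

# Assumed volume: one billion queries.
expected_bad_answers 1000000000
```

At that assumed volume the remainder is still ten thousand bad answers, any one of which can end up as a screenshot.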

replies(1): >>41085700 #
18. lolive ◴[] No.41084734[source]
Nonsense. *BLOCKCHAIN* will disrupt data management !

#humor

19. serial_dev ◴[] No.41084763{3}[source]
Its still whose, sorry to tell you, your misstaken
replies(2): >>41085212 #>>41088968 #
20. serial_dev ◴[] No.41084786[source]
I work at a company that manages docs, chat, and tasks (among many others), and of course we use our product internally. AI search (ChatGPT-like: you ask a question, it answers) was added a while ago.

My experience has been that it really is a huge improvement: you don't need to guess which words were used to describe the issue. You just describe your issue, tell the system what you want, and the results are there. Chat's been busy and you remember an issue was raised in a thread a week ago? With traditional search, good luck finding it. Now, I just write "Tom raised this week an issue in this chat about something not working with reminders??" and the results are there.

Someone at the company uses it to manage Dungeons and Dragons play nights. They document their world, their plays, etc. He has written probably a small book's worth of content, and can ask the AI what happened 4 months ago with a character instead of trying to search.

replies(1): >>41086162 #
21. 6510 ◴[] No.41085045{4}[source]
"unsafe" language is your friend.
22. sulandor ◴[] No.41085174{3}[source]
doi is indeed very nice

one can even make a get_pdf_by_doi() by employing curl and sci-hub

replies(1): >>41086710 #
23. ◴[] No.41085212{4}[source]
24. zcw100 ◴[] No.41085700{3}[source]
People assume this is a completely absurd answer but there are times when it is correct to put glue on a pizza. The LLM picked this up because advertisers put glue on pizza to get that cheese pull shot. The next question is whether LLMs will ever develop enough understanding to distinguish these two cases. Personally I think LLMs will figure this out long before a bunch of ontologists stop bickering about it.
replies(2): >>41086513 #>>41086550 #
25. CaptainFever ◴[] No.41086162[source]
I wish something like that came to Discord. Their search is terrible and I could never find something if I didn't know the exact words, and the servers aren't indexed by external search engines either.
26. Obscurity4340 ◴[] No.41086513{4}[source]
> when ontologists stop bickering about it

When does the LLM think that will happen? Please ask it, I'm curious.

27. dartos ◴[] No.41086550{4}[source]
Google’s AI pulled that answer almost word for word from a joke on Reddit.

Nothing about filming food commercials. Why white knight an undercooked product?

28. abrichr ◴[] No.41086710{4}[source]
https://gist.github.com/abrichr/455f0e569bf1bd104c696a7ad9e6...
replies(1): >>41098565 #
29. ohthatsnotright ◴[] No.41088968{4}[source]
Their mistaken what?
30. treebeard901 ◴[] No.41096457{4}[source]
Their probably just unaware of the affect they're words wood have on us. Its no big deal and we should just except it. I wouldn't altar a single word. They've been served there just deserts, and I would of maid the same mistake. Let sleeping dogs lay. They probably never past English class anyway, and your far two picky about these things.
31. sulandor ◴[] No.41098565{5}[source]
you got the idea

   get_pdf_by_doi(){ wget --recursive --span-hosts --no-directories --accept '*.pdf' --quiet --execute robots=off "https://sci-hub.tw/${1}" ;}
32. Asraelite ◴[] No.41104221{4}[source]
In this context it's a possessive determiner, not a possessive pronoun.