
DeepSeek OCR

(github.com)
990 points pierre | 3 comments
x______________ ◴[] No.45640883[source]

  >先天下之忧而忧
How is this an example of a prompt?

Google translated this to "Worry about the world first" while Bing says "Worry before the worries of the world."

Can anyone shed some light on this saying or why it's in the article?

replies(5): >>45640938 #>>45640942 #>>45640978 #>>45641032 #>>45641879 #
gudzpoz ◴[] No.45641032[source]
This clause is usually used together with the next sentence in the original poem:

> 先天下之忧而忧,后天下之乐而乐

> (put the world's worries before yours, and put your happiness after the world's)

edit: this translation is wrong, and raincole has a definitely better translation

Since the model is a language model, they probably use this to demonstrate the model's language capabilities – the model should be able to complete the whole sentence pair. The paper also mentions this:

> To ensure the model’s language capabilities, we introduced 10% of in-house text-only pretrain data.

So I believe it is just a text-only demonstration.

replies(1): >>45641332 #
jdthedisciple ◴[] No.45641332[source]
Sibling comment has the second part as

後天下之樂而樂

Which one is correct?

replies(1): >>45642160 #
numpad0 ◴[] No.45642160[source]

  a) 后天下之乐而乐
  b) 後天下之樂而樂
  c) 後天下之楽而楽
a) is clearly Simplified Chinese from a sibling comment, b) is Traditional copied from your comment, and c) is what I just typed in my own language. Unicode Hanzi/Kanji are a mess: characters can be the same or different, in appearance or in their encoded bytes, depending on intended variants, languages, fonts, systems, keyboard, distance between Earth and Alpha Centauri, etc.
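
To make the "different in binary" point concrete, here is a minimal Python sketch that prints the code points of the three strings above and shows where they differ. The helper names `code_points` and `differing_positions` are made up for illustration, and the NFKC check is only there to show that standard Unicode normalization does not fold these particular variants together, as far as I know.

```python
import unicodedata

# The three spellings listed above (labels match the list).
variants = {
    "a": "后天下之乐而乐",  # Simplified Chinese
    "b": "後天下之樂而樂",  # Traditional Chinese
    "c": "後天下之楽而楽",  # as typed on a Japanese input method
}


def code_points(text):
    """Return the Unicode code point of each character as 'U+XXXX'."""
    return [f"U+{ord(ch):04X}" for ch in text]


def differing_positions(x, y):
    """Return the indexes at which two equal-length strings differ."""
    return [i for i, (p, q) in enumerate(zip(x, y)) if p != q]


for label, text in variants.items():
    print(label, " ".join(code_points(text)))

# a vs b differ at 后/後 and at both 乐/樂
print(differing_positions(variants["a"], variants["b"]))  # [0, 4, 6]
# b vs c differ only at 樂/楽
print(differing_positions(variants["b"], variants["c"]))  # [4, 6]

# Compatibility normalization leaves all three strings distinct,
# because these variants are separate code points with no decomposition.
normalized = {unicodedata.normalize("NFKC", t) for t in variants.values()}
print(len(normalized))  # 3
```

So a plain string comparison or search treats the three as unrelated text. Matching across them needs an explicit variant mapping (for example, data derived from the Unihan database), which the standard library does not provide.
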
replies(2): >>45643372 #>>45652329 #
1. jdthedisciple ◴[] No.45643372[source]
Fascinating! That's exactly why I asked, so thank you.

Do people usually recognize all variants as valid and legible? Or does any particular set of letters/symbols prevail in practice?

replies(2): >>45643936 #>>45644871 #
2. hank2000 ◴[] No.45643936[source]
Very location dependent. But when you learn to write the characters you understand the variants differently. They look like random strokes to an untrained eye. But they’re not. I’m not sure if that makes sense.

Take a lowercase a in English for example. This font writes it differently than a child. Or in cursive. Or probably than you would write it. But you recognize all of them and don’t really think about it.

3. numpad0 ◴[] No.45644871[source]
Traditional kinds are usually recognizable, but I'd be unsure or straight up wrong about most Simplified versions. Overall proportions and small details often feel "wrong" for both as well due to cultures converging at different points.