DeepSeek OCR

(github.com)
990 points pierre | 16 comments
1. x______________ ◴[] No.45640883[source]

  >先天下之忧而忧
How is this an example of a prompt?

Google translated this to "Worry about the world first" while Bing says "Worry before the worries of the world."

Can anyone shed some light on this saying or why it's in the article?

replies(5): >>45640938 #>>45640942 #>>45640978 #>>45641032 #>>45641879 #
2. fspeech ◴[] No.45640938[source]
Google is closer. This is from a famous essay expressing the author's desire to bear the burden for the world. The essay is 岳阳楼记 by 范仲淹 (Fan Zhongyan), written in 1046: https://zh.wikisource.org/zh-hans/%E5%B2%B3%E9%99%BD%E6%A8%9...
3. SequoiaHope ◴[] No.45640942[source]
Ask a language model - ChatGPT says it’s a line from a famous poem “Memorial to Yueyang Tower” which expresses the Confucian ideal of selfless concern for people and society.
4. raincole ◴[] No.45640978[source]
It's a very famous (classical) Chinese phrase.

Neither translation captures the meaning well, though. It means: "worry before the rest of the world (notice that they have something to) worry." The next part is 後天下之樂而樂 ("be happy only after the rest of the world is happy").

I don't know why it's a prompt example.

replies(1): >>45641328 #
5. gudzpoz ◴[] No.45641032[source]
This clause is usually used together with the next sentence in the original poem:

> 先天下之忧而忧,后天下之乐而乐

> (put the world's worries before yours, and put your happiness after the world's)

Edit: this translation is wrong, and raincole definitely has a better translation.

Since the model is a language model, they probably use this to demonstrate the model's language capabilities – the model should be able to complete the whole sentence pair. The paper also mentions this:

> To ensure the model’s language capabilities, we introduced 10% of in-house text-only pretrain data.

So I believe it is just a text-only demonstration.

replies(1): >>45641332 #
6. jdthedisciple ◴[] No.45641328[source]
Sibling comment has the second part as

后天下之乐而乐

Which one is correct?

replies(2): >>45641361 #>>45642155 #
7. jdthedisciple ◴[] No.45641332[source]
Sibling comment has the second part as

後天下之樂而樂

Which one is correct?

replies(1): >>45642160 #
8. raincole ◴[] No.45641361{3}[source]
Traditional vs Simplified Chinese.

There are two (modern) "spellings" of written Chinese. Basically colour vs color.

9. ◴[] No.45641879[source]
10. Y_Y ◴[] No.45642155{3}[source]
It depends on who you think is the rightful successor to the Qing dynasty
replies(1): >>45652403 #
11. numpad0 ◴[] No.45642160{3}[source]

  a) 后天下之乐而乐
  b) 後天下之樂而樂
  c) 後天下之楽而楽
a) is clearly Simplified Chinese from a sibling comment, b) is Traditional copied from your comment, and c) is what I just typed in my own language. Unicode Hanzi/Kanji are a mess: characters can be the same or different, in appearance or in binary, depending on intended variants, languages, fonts, systems, keyboard, distance between Earth and Alpha Centauri, etc.
replies(2): >>45643372 #>>45652329 #
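
The point above about characters being different "in binary" can be checked directly. Below is a minimal Python sketch, assuming the three strings are typed with the ordinary unified ideographs; the helper names `code_points` and `differing_positions` are made up for illustration. It prints each variant's code points and UTF-8 bytes, and shows that Unicode normalization does not fold the Simplified, Traditional and Japanese forms together.

```python
import unicodedata

# The three spellings of the second clause quoted in this thread.
variants = {
    "simplified": "后天下之乐而乐",
    "traditional": "後天下之樂而樂",
    "japanese": "後天下之楽而楽",
}


def code_points(text):
    """Return the Unicode code point of each character as 'U+XXXX'."""
    return [f"U+{ord(ch):04X}" for ch in text]


def differing_positions(a, b):
    """Return the indices at which two equal-length strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


for name, text in variants.items():
    # Same phrase, different code points, hence different bytes on disk.
    print(name, " ".join(code_points(text)), text.encode("utf-8").hex())

# Normalization (here NFKC) leaves each of these strings unchanged,
# so a plain string comparison treats the variants as unrelated text.
normalized = {unicodedata.normalize("NFKC", t) for t in variants.values()}
print(len(normalized), "distinct strings after NFKC")
```

In other words, matching these variants as "the same text" needs an explicit Simplified/Traditional/Japanese mapping table; nothing in the standard library's normalization does it.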
12. jdthedisciple ◴[] No.45643372{4}[source]
Fascinating! That's exactly why I asked, so thank you.

Do people usually recognize all variants as valid and legible? Or does any particular set of letters/symbols prevail in practice?

replies(2): >>45643936 #>>45644871 #
13. hank2000 ◴[] No.45643936{5}[source]
Very location dependent. But when you learn to write the characters you understand the variants differently. They look like random strokes to an untrained eye. But they’re not. I’m not sure if that makes sense.

Take a lowercase a in English for example. This font writes it differently than a child. Or in cursive. Or probably than you would write it. But you recognize all of them and don’t really think about it.

14. numpad0 ◴[] No.45644871{5}[source]
Traditional kinds are usually recognizable, but I'd be unsure or straight up wrong about most Simplified versions. Overall proportions and small details often feel "wrong" for both as well due to cultures converging at different points.
15. emptyhandeddev ◴[] No.45652329{4}[source]
a) Simplified Chinese

b) Traditional Chinese

c) 楽 is a variation of 樂, which is now widely used in Japanese Kanji but deprecated in Traditional Chinese.

Note:

A variation here means that some people wrote 樂 as 楽 in ancient China, but the form was not widely adopted.

Kanji is a Japanese word meaning "Chinese character".

16. emptyhandeddev ◴[] No.45652403{4}[source]
Wrong. It merely depends on whether local policy makers before the computer age prioritized reducing illiteracy and convenience over other considerations.

Macau, HK and Taiwan use traditional Chinese characters.

Mainland China, Singapore and Malaysia use simplified Chinese characters.

Japan uses its own version, some simplified, some traditional, and also invented over 100 Japanese-made kanji following the same logic by which Chinese characters are formed.

As a matter of fact, simplification of Chinese characters started when the KMT/Republic of China was in control of the whole of China. Politics got in the way later, and the RoC stopped the simplification process while the PRC kept it going. Macau and HK were not involved, since the Portuguese and British colonial governments didn't care. Singapore and Malaysia picked the simplified version out of convenience.