524 points noperator | 32 comments
1. saeedesmaili ◴[] No.44490550[source]
After reading this I realized I also have an archive of my pocket account (4200 items), so tried the same prompt with o3, gemini 2.5 pro, and opus 4:

- chatgpt UI didn't allow me to submit the input, saying it's too large, although it was around 80k tokens, less than o3's 200k context size.

- gemini 2.5 pro: worked fine for the personality and interest related parts of the profile, but it failed on age range, job role, location, and parental status, with incorrect predictions.

- opus 4: nailed it and did a more impressive job, accurately predicted my base city (amsterdam), age range, relationship status, but didn't include anything about whether I'm a parent.

Both gemini and opus failed in predicting my role, probably understandably. Although I'm a data scientist, I read a lot about software engineering practices because I like writing software and since I don't have the opportunity at work to do this kind of work, I code for personal projects, so I need to learn a lot about system design, etc. Both models thought I'm a software engineer.

Overall it was a nice experiment. Something I noticed is both models mentioned photography as my main hobby, but if they had access to my youtube watch history, they'd confidently say it's tennis. For topics and interests that we usually watch videos about rather than read articles about, it would be interesting to combine the youtube watch history with this pocket archive data (although it would be challenging to get that data).

replies(9): >>44490818 #>>44490825 #>>44491013 #>>44491019 #>>44492764 #>>44493027 #>>44495207 #>>44499820 #>>44501925 #
2. greenavocado ◴[] No.44490818[source]
You need to use an iterative refinement pyramid of prompts. Use a cheap model to condense the majority of the raw data in chunks, then increasingly stronger and more expensive models over increasingly larger sets of those chunks until you are able to reach the level of summarization you desire.
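
A minimal sketch of that pyramid in Python, to make the idea concrete. Everything named here is hypothetical: `pyramid_summarize`, `cheap_model` and `strong_model` are stand-ins, not any vendor's API, and the fixed `fan_in` (how many texts get merged per call) is a simplification of "increasingly larger sets". The stand-in models only concatenate strings so the sketch runs offline; in practice each would be a call to a real model.

```python
from typing import Callable, List

# A summarizer takes a list of texts and returns one condensed text.
Summarizer = Callable[[List[str]], str]


def chunked(items: List[str], size: int) -> List[List[str]]:
    """Split items into consecutive groups of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def pyramid_summarize(items: List[str], tiers: List[Summarizer], fan_in: int) -> str:
    """Condense `items` level by level.

    tiers[0] is assumed to be the cheapest model and handles the raw data;
    each later tier is assumed to be stronger. The last tier is reused if
    more levels are needed, and it always produces the final summary.
    """
    if not items:
        raise ValueError("nothing to summarize")
    if not tiers:
        raise ValueError("need at least one summarizer")
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2 so each level shrinks")

    level = 0
    current = items
    while len(current) > fan_in:
        summarize = tiers[min(level, len(tiers) - 1)]
        current = [summarize(group) for group in chunked(current, fan_in)]
        level += 1
    # Final pass: the strongest model merges whatever is left.
    return tiers[-1](current)


# Hypothetical stand-in "models"; replace with real model calls.
def cheap_model(texts: List[str]) -> str:
    return "cheap(" + "+".join(texts) + ")"


def strong_model(texts: List[str]) -> str:
    return "strong(" + "+".join(texts) + ")"


if __name__ == "__main__":
    titles = [f"t{i}" for i in range(8)]
    print(pyramid_summarize(titles, [cheap_model, strong_model], fan_in=4))
```

With 4200 saved items and a `fan_in` of, say, 50, the cheap tier would see the raw titles in 84 chunks and the stronger tiers would only ever see summaries, which keeps each individual request well under the context limit.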
3. tgtweak ◴[] No.44490825[source]
I think a reasoning/thinking-heavy model would do better at piecing together the various data points than an agentic model. Would be interested to see how o3 does with the context summarized.
replies(1): >>44493222 #
4. tehlike ◴[] No.44491013[source]
You should take this as a sign, and shoot for SWE jobs - given your interest.

What you do at work today doesn't mean you can't switch to a related ladder.

replies(2): >>44491635 #>>44493295 #
5. juliendorra ◴[] No.44491019[source]
You should be able to use Google Takeout to get all of your YouTube data, including your watch history.

This article is a nice example of someone using it:

> When I downloaded all my YouTube data, I’ve noticed an interesting file included. That file was named watch-history and it contained a list of all the videos I’ve ever watched.

https://blog.viktomas.com/posts/youtube-usage/

Of course, for Europeans it's a legal obligation for companies to give you access to your data, but I think Google Takeout works worldwide?

replies(3): >>44491293 #>>44498307 #>>44499423 #
6. jazzyjackson ◴[] No.44491293[source]
Yes, I've done this in the USA. Pretty neat. I have it on my todo list to parse over it and find all the music videos I've watched 3 or more times to archive them.
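
A rough Python sketch of the "watched 3 or more times" part. It assumes the history was exported from Takeout as JSON rather than HTML, and that each entry carries `title` and `titleUrl` fields; treat both field names as assumptions to check against your own export. The function name `rewatched` is made up for this example, and nothing here tells music videos apart from other videos.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple


def rewatched(entries: List[Dict], min_views: int = 3) -> List[Tuple[str, str, int]]:
    """Return (url, title, views) for every video seen at least `min_views` times."""
    counts: Counter = Counter()
    titles: Dict[str, str] = {}
    for entry in entries:
        url = entry.get("titleUrl")  # assumed field name; some entries may lack it
        if not url:
            continue
        counts[url] += 1
        # Remember the first title seen for this URL.
        titles.setdefault(url, entry.get("title", ""))
    return [
        (url, titles[url], n)
        for url, n in counts.most_common()
        if n >= min_views
    ]


if __name__ == "__main__":
    # Tiny inline sample in the assumed shape. For real data, something like:
    #   entries = json.load(open("watch-history.json", encoding="utf-8"))
    sample = json.loads("""[
      {"title": "Watched Song A", "titleUrl": "https://www.youtube.com/watch?v=aaa"},
      {"title": "Watched Talk B", "titleUrl": "https://www.youtube.com/watch?v=bbb"},
      {"title": "Watched Song A", "titleUrl": "https://www.youtube.com/watch?v=aaa"},
      {"title": "Watched Song A", "titleUrl": "https://www.youtube.com/watch?v=aaa"}
    ]""")
    for url, title, views in rewatched(sample):
        print(views, url, title)
```

Writing just the URLs out one per line would give a list that a downloader could work through in a batch.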
replies(1): >>44491975 #
7. justusthane ◴[] No.44491635[source]
Sometimes it’s nice for hobbies to remain hobbies
replies(4): >>44493877 #>>44495336 #>>44495561 #>>44497804 #
8. toomuchtodo ◴[] No.44491975{3}[source]
https://archive.zhimingwang.org/blog/2014-11-05-list-youtube... might be of use along with https://github.com/yt-dlp/yt-dlp, might just grab it all and prune later due to rot and availability issues over time within YT.
9. LoganDark ◴[] No.44492764[source]
> Both models thought I'm a software engineer.

You probably still are, even if that's not your career path :)

10. larve ◴[] No.44493027[source]
re o3: you can zip the file, upload it, and it will use python and grep and the shell to inspect it. I have yet to try using it with a sqlite db, but that's how i do things locally with agents.
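
For the sqlite route, a small sketch of what loading an export into a database could look like. The column names (`title`, `url`, `time_added` as a Unix timestamp) and the table name `items` are assumptions for illustration; adjust them to whatever the export actually contains. It uses only the Python standard library.

```python
import csv
import io
import sqlite3
from typing import Iterable, List, Tuple


def load_items(conn: sqlite3.Connection, csv_file: Iterable[str]) -> int:
    """Create an `items` table, fill it from CSV rows, and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (title TEXT, url TEXT, time_added INTEGER)"
    )
    rows = [
        (r["title"], r["url"], int(r["time_added"]))
        for r in csv.DictReader(csv_file)
    ]
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def saves_per_year(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Count saved items per calendar year (UTC), oldest year first."""
    return conn.execute(
        "SELECT strftime('%Y', time_added, 'unixepoch') AS year, COUNT(*) "
        "FROM items GROUP BY year ORDER BY year"
    ).fetchall()


if __name__ == "__main__":
    # Inline sample in the assumed shape; for a real file, pass
    # open("export.csv", newline="", encoding="utf-8") instead.
    sample = io.StringIO(
        "title,url,time_added\n"
        "Post one,https://a.example/1,1625097600\n"
        "Post two,https://b.example/2,1625097600\n"
        "Post three,https://c.example/3,1656633600\n"
    )
    conn = sqlite3.connect(":memory:")
    print(load_items(conn, sample), "rows loaded")
    print(saves_per_year(conn))
```

An agent with shell access can then answer narrow questions with small SQL queries instead of reading the whole archive at once.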
replies(1): >>44493330 #
11. saeedesmaili ◴[] No.44493222[source]
Agreed, that's why I used reasoning models (gemini 2.5 pro and opus 4 with extended thinking enabled).
12. smt88 ◴[] No.44493295[source]
I love reading about cooking but I'd hate to become a cook
13. saeedesmaili ◴[] No.44493330[source]
The author mentions that doing that didn't give them a high quality response. Adding the text to the model's context makes all the information available for it to use.
14. formerphotoj ◴[] No.44493877{3}[source]
Exactly this. The need to make money from a thing may well eliminate the value one derives from the thing, and even add negatives such as stress, etc.
replies(1): >>44512461 #
15. datpuz ◴[] No.44495207[source]
Reading 80k tokens requires more than 80k tokens due to overhead
16. cortesoft ◴[] No.44495336{3}[source]
I believed this, which is what made me avoid computer science in college; I wanted to avoid ruining my favorite hobby.

After a few years post graduation, where I wasn't sure what I wanted to do and I floundered to find a career, I decided to give software development a try, and risk ruining my favorite hobby.

Definitely the best decision I could have made. Now people pay me a lot of money to do the thing I love to do the most... what's not to love? 20 years later, it is still my favorite hobby, and they keep paying me to do it.

replies(4): >>44495559 #>>44495618 #>>44495730 #>>44512595 #
17. p1necone ◴[] No.44495559{4}[source]
I think it heavily depends on who you're working for.

If they get out of the way and let you do the thing you love how you want to do it you'll get good results for you and them.

If they treat you like a cog in a machine and assume they need to carrot and stick you into doing things because you might not really want to be there, you'll be miserable.

replies(1): >>44503509 #
18. sea-gold ◴[] No.44495561{3}[source]
https://english.stackexchange.com/questions/25225/ways-to-ru...
19. justusthane ◴[] No.44495618{4}[source]
Sure, of course. Sometimes it works out to follow your passion into a career. I was objecting to the apparent premise that that’s _always_ what you should do.
20. 8n4vidtmkvmk ◴[] No.44495730{4}[source]
My first software job I enjoyed. My 2nd/current job I enjoy everything except the actual work. Too much bureaucracy, but it hasn't ruined my love for the craft yet. Oh well, I'm building some other skills I didn't know I had in me.
21. abrookewood ◴[] No.44497804{3}[source]
100%. I am absolutely certain that I do not have a viable career as a professional surfer ... no matter how much I wish it wasn't true.
replies(1): >>44512579 #
22. viraptor ◴[] No.44498307[source]
It is available and it can be surprisingly large. I've somehow accumulated multiple GB of data from YT alone. Which feels a bit absurd - there's bound to be lots of waste there.
23. yubblegum ◴[] No.44499423[source]
This can give a false sense of what Google (Alphabet) actually knows about you. That above is Google playing the game of 'ok, here is what we know of your activities on youtube when logged in!'

But Google and the rest of the "advertising" (euphemism for surveillance) industry track and create "profiles" based on a basket of data points, from ip/MAC address to the rest of their bag of tricks.

replies(1): >>44500827 #
24. alexnorton ◴[] No.44499820[source]
I was able to give this a try on every YouTube video I've ever watched by exporting the history from Google Takeout:

https://takeout.google.com/settings/takeout/custom/youtube?p...

And then a combination of pup and jq to parse the video titles from the HTML file:

  cat watch-history.html \
    | pup '.outer-cell .mdl-grid .content-cell:nth-child(2) json{}' \
    | jq -r '.[] .children[0] | select(.tag != "br") | select(.text | startswith("https://www.youtube.com/watch?v=") | not) | .text' \
    > videos.txt
25. dietr1ch ◴[] No.44500827{3}[source]
Internally at Google, a toy tool to peek into your own personal advertisement profile was released and taken down within a week or two because it was creepily knowledgeable about you.
replies(1): >>44503885 #
26. UrineSqueegee ◴[] No.44501925[source]
o3 on the webui has a tiny context as do all the models
27. cortesoft ◴[] No.44503509{5}[source]
I have worked at a few places in many different positions over an 18 year career so far.

I have enjoyed the programming part of all the jobs. I don’t really care what the problem is, I just like using computers to solve problems.

28. ariwilson ◴[] No.44503885{4}[source]
when?
replies(1): >>44510840 #
29. dietr1ch ◴[] No.44510840{5}[source]
Probably sometime around 2018 or 2019, I don't recall, but it was before the covid lockdown
30. tehlike ◴[] No.44512461{4}[source]
Not really. I do software both as a hobby, and as a career.
31. tehlike ◴[] No.44512579{4}[source]
Eh. Software engineers are in demand, and surfers decidedly are not.
32. tehlike ◴[] No.44512595{4}[source]
It was my hobby. Then I did computer science, and now I'm at a faang, make more money in a year than my parents in their lifetime probably.