798 points bertman | 18 comments
embedding-shape ◴[] No.45900337[source]
Seems it's already in Arch's repositories and seems to work; just add another flag to the invocation:

    yt-dlp --cookies-from-browser firefox --remote-components ejs:github -f "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best" 'https://www.youtube.com/watch?v=XXX'
It downloads a solver at runtime, which took maybe half a second in total, and downloads seem to start much faster than before.

    [youtube] [jsc:deno] Solving JS challenges using deno
    [youtube] [jsc:deno] Downloading challenge solver lib script from  https://github.com/yt-dlp/ejs/releases/download/0.3.1/yt.solver.lib.min.js
It would be great if we could download the solver manually with a separate command before running the download command. I'm probably not alone in running yt-dlp in a restricted environment, and being able to package it together with the solver ahead of time would let me avoid loosening the restrictions for that environment. Not a huge issue though; in general I'm happy that downloads seem to start much faster now.
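
A rough sketch of the "prep it beforehand" idea, in Python: fetch the release asset once on a machine that is allowed to reach GitHub, record its SHA-256 after review, and refuse to package the file if the bytes ever differ from the pinned digest. This only covers the fetch-and-pin half. How yt-dlp is then pointed at a local copy is not shown, and the function names are made up for illustration. The URL is the one from the log output above.

```python
import hashlib
import urllib.request

# URL taken from the yt-dlp log output above.
SOLVER_URL = "https://github.com/yt-dlp/ejs/releases/download/0.3.1/yt.solver.lib.min.js"


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download url and return the raw bytes (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def verify_pinned(data: bytes, expected_sha256: str) -> bytes:
    """Return data unchanged if it matches the pinned digest, else raise."""
    actual = sha256_hex(data)
    if actual != expected_sha256.lower():
        raise ValueError(f"digest mismatch: expected {expected_sha256}, got {actual}")
    return data
```

On the machine with GitHub access you would call `fetch(SOLVER_URL)`, review the file, and store `sha256_hex(...)` next to it; the packaging step then runs `verify_pinned` before copying the file into the restricted image.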
replies(8): >>45900780 #>>45901047 #>>45901146 #>>45901292 #>>45902245 #>>45902494 #>>45903312 #>>45906436 #
1. Wowfunhappy ◴[] No.45901292[source]
What environment are you using that:
- Has access to YouTube
- Can run Python code
- Can’t run JS code

If the concern is security, it sounds like the team went to great lengths to ensure the JS was sandboxed (as long as you’re using Deno).

If you’re using some sort of weird OS or architecture that Deno/Node doesn’t support, you might consider QuickJS, which is written in pure C and should work on anything. (Although it will be a lot slower, I’m not clear just how slow.) Admittedly, you then lose the sandboxing, although IMO it seems like it should be safe to trust code being served by Google on the official YouTube domain. (You don’t have to trust Google in general to trust that they won’t serve you actual malware.)
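
To check which of these runtimes a given box actually has, something like the following Python sketch would do. It is not how yt-dlp itself picks a runtime, and the candidate binary names (in particular `qjs` for QuickJS) are assumptions that may differ by distro.

```python
import shutil
from typing import Callable, Optional, Sequence

# Candidate binary names are assumptions; adjust for how your distro packages them.
CANDIDATES: Sequence[str] = ("deno", "node", "qjs")


def find_js_runtime(
    candidates: Sequence[str] = CANDIDATES,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = which(name)
        if path is not None:
            return path
    return None
```

The order of `CANDIDATES` encodes the preference: the sandboxed option first, the slower fallback last.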

replies(5): >>45902060 #>>45902074 #>>45902207 #>>45902465 #>>45907442 #
2. dspillett ◴[] No.45902060[source]
> What environment are you using that: - Has access to Youtube - Can run Python code - Can’t run JS code

They didn't say “can't run JS code”, only that the solver currently could not be downloaded from that location. It could be that it is an IPv6-only environment (IIRC YouTube supports IPv6 but GitHub does not), or just that all external sites must be assessed before being whitelisted (I'm not sure why YouTube would be whitelisted but not GitHub, but it is certainly possible).

replies(2): >>45902260 #>>45903515 #
3. ◴[] No.45902074[source]
4. embedding-shape ◴[] No.45902207[source]
> What environment are you using that: - Has access to Youtube - Can run Python code - Can’t run JS code

Nothing specific, I just tend to run tools in restricted VMs where things are whitelisted and it's pretty much as locked down as it can be. It can run whatever I want it to run, including JS, and as the logs in my previous comment show, it is in fact running both Python and JS, and has access to YouTube, otherwise it wouldn't have worked :)

I tend to follow the rule of "least possible privileges", so most stuff I run like that has to be "prepped" basically, especially things that sometimes make network requests (updating the solver in this case). It's just a matter of packaging it before I run it, so it's not the end of the world.

No weird OS or architecture here, just good ol' Linux.

> IMO it seems like it should be safe to trust code being served by Google on the official YouTube domain

The JS script being downloaded is from the yt-dlp GitHub organization (https://github.com/yt-dlp/ejs/releases/download/0.3.1/yt.sol...), not from Google or any websites, FWIW.

replies(2): >>45902813 #>>45904097 #
5. embedding-shape ◴[] No.45902260[source]
It's just me being paranoid after seeing npm/pypi supply chain attacks, and since then I basically run most software touching the internet in a VM one way or another.

I think in this case, my own laziness is what makes it worse than it has to be. Currently I'm whitelisting by domain, so youtube.com for the yt-dlp runner is obviously OK, and I'd want to avoid whitelisting github.com for that, since it's just downloading one JS file.
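
For illustration, the per-domain check amounts to something like this Python sketch. The allowlist contents and the function name are hypothetical (a real allowlist would likely need more entries), and the actual enforcement lives in the VM config rather than in Python.

```python
from urllib.parse import urlsplit

ALLOWED_DOMAINS = {"youtube.com"}  # hypothetical allowlist for the yt-dlp runner


def host_allowed(url: str, allowed=ALLOWED_DOMAINS) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed)
```

With this list, `https://www.youtube.com/watch?v=XXX` passes while the solver download from github.com is refused, which is exactly the situation described here. Matching on a leading dot keeps look-alike hosts such as `notyoutube.com` out.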

For now manually copying the config file into my SCM or just whitelisting GitHub for initial download does the trick. I guess I just had to squeeze in one complaint in my previous comment so I could get the HN stamp of approval, can't be too positive.

replies(2): >>45902825 #>>45902886 #
6. ivankra ◴[] No.45902465[source]
> Although it will be a lot slower, I’m not clear just how slow.

Around 30-50x slower than V8 (node/deno).

I've been recently benchmarking a lot of different engines: https://ivankra.github.io/javascript-zoo/

replies(1): >>45902624 #
7. ranger_danger ◴[] No.45902624[source]
> Around 30-50x slower than V8 (node/deno).

A solver running at 50ms instead of 1ms I would say is practically imperceptible to most users, but I don't know what time span you are measuring with those numbers.

replies(1): >>45903051 #
8. j45 ◴[] No.45902813[source]
This is the way. Leaving so many packages with unfettered access to your system is only so secure.
9. j45 ◴[] No.45902825{3}[source]
It's not paranoid, it's just more attack surface that doesn't need to exist.

Happy to read and learn more about the setups you've found helpful to do this.

10. dspillett ◴[] No.45902886{3}[source]
You could serve the files yourself from a server that you populate by pulling them from GitHub after review. You'd need to either issue a certificate for the domain from your own CA that the host running yt-dlp trusts, or patch yt-dlp to use a different server name, but neither of those steps should be too onerous.
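
A minimal Python sketch of the serving half, using only the standard library. It serves a directory of already-reviewed files over plain HTTP on localhost; the TLS/own-CA part and the yt-dlp patching are deliberately left out, and `start_mirror` is a made-up name.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_mirror(directory: str, host: str = "127.0.0.1", port: int = 0) -> ThreadingHTTPServer:
    """Serve reviewed files from directory over plain HTTP in a background thread.

    port=0 lets the OS pick a free port; read it back from server.server_address.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer((host, port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `server.shutdown()` and `server.server_close()` when done. For anything beyond a quick experiment a proper web server with a certificate from your own CA would be the more sensible choice.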
11. ivankra ◴[] No.45903051{3}[source]
My page is about generic JS benchmarks. Just did a quick run with a sample javascript challenge I got via yt-dlp (https://raw.githubusercontent.com/ivankra/javascript-zoo/ref...):

  $ time ./v8 /bench/yt-dlp.js | md5sum -
  a730e32029941bf1f60f9587a6d9554f  -
  real 0m0.252s
  user 0m0.386s
  sys 0m0.074s

  $ time ./quickjs /bench/yt-dlp.js | md5sum -
  a730e32029941bf1f60f9587a6d9554f  -
  real 0m2.280s
  user 0m2.507s
  sys 0m0.031s
So about 10x slower for the current flavor of YouTube challenges: 0.2s -> 2.2s.

A few more results on same input:

  spidermonkey 0.334s
  v8_jitless 1.096s => about the limit for JIT-less interpreters like quickjs
  graaljs 2.396s
  escargot 3.344s
  libjs 4.501s
  brimstone 6.328s
  modernc-quickjs 12.767s (pure Go port of quickjs)
  fastschema-qjs 1m22.801s (Wasm port of quickjs)
  boa 1m28.070s
  quickjs-ng 2m49.202s
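
For anyone who wants the ratios rather than raw times, here is a small Python sketch that parses the `time`-style durations above and divides each by the v8 wall-clock time. The numbers are copied from this comment; the helper names are made up.

```python
import re

# Wall-clock ("real") times copied from the results above.
TIMINGS = {
    "v8": "0m0.252s",
    "spidermonkey": "0.334s",
    "v8_jitless": "1.096s",
    "quickjs": "0m2.280s",
    "graaljs": "2.396s",
    "escargot": "3.344s",
    "libjs": "4.501s",
    "brimstone": "6.328s",
    "modernc-quickjs": "12.767s",
    "fastschema-qjs": "1m22.801s",
    "boa": "1m28.070s",
    "quickjs-ng": "2m49.202s",
}

_DURATION = re.compile(r"^(?:(\d+)m)?(\d+(?:\.\d+)?)s$")


def to_seconds(text: str) -> float:
    """Parse a `time`-style duration such as '1m22.801s' or '0.334s'."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + float(seconds)


def slowdowns(timings: dict, baseline: str = "v8") -> dict:
    """Return each engine's wall time divided by the baseline engine's."""
    base = to_seconds(timings[baseline])
    return {name: to_seconds(value) / base for name, value in timings.items()}


if __name__ == "__main__":
    for name, ratio in sorted(slowdowns(TIMINGS).items(), key=lambda kv: kv[1]):
        print(f"{name:16s} {ratio:7.1f}x")
```

By this measure quickjs comes out at about 9x v8 on this input, and v8_jitless at about 4.3x.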
replies(1): >>45904858 #
12. rwmj ◴[] No.45903515[source]
I've just hit the IPv6 problem. I routinely use yt-dlp -6 to cycle through my (basically infinite) set of IPv6 addresses. However when you do this, it tries the github EJS download over IPv6, which fails as github doesn't support IPv6 (because it's still the year 2000 over there).

Actually I think this is kind of a yt-dlp bug, since it doesn't need to use IPv6 for the github download.
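
The fallback I have in mind would look roughly like this Python sketch: only force IPv6 for hosts that actually resolve to an IPv6 address, and use IPv4 otherwise. This is not yt-dlp's code, and the function names are made up; the resolver is a parameter so the logic can be exercised without network access.

```python
import socket
from typing import Callable


def has_ipv6(host: str, port: int = 443, resolver: Callable = socket.getaddrinfo) -> bool:
    """True if host resolves to at least one IPv6 address."""
    try:
        results = resolver(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return False
    return len(results) > 0


def pick_family(host: str, prefer_ipv6: bool, resolver: Callable = socket.getaddrinfo) -> int:
    """Use IPv6 only when requested and available; otherwise fall back to IPv4."""
    if prefer_ipv6 and has_ipv6(host, resolver=resolver):
        return socket.AF_INET6
    return socket.AF_INET
```

With a check like this, `-6` would still apply to the video host while the solver download quietly falls back to IPv4.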

13. Wowfunhappy ◴[] No.45904097[source]
> The JS script being downloaded is from the yt-dlp GitHub organization

I meant the challenge that is the reason they need the Javascript in the first place.

You can’t very well run yt-dlp without trusting yt-dlp code.

replies(1): >>45905265 #
14. rdtsc ◴[] No.45904858{4}[source]
Thanks for the benchmark!

I tried it on my slower laptop. I get:

   node(v8)  : 1.25s user 0.12s system 154% cpu 0.892 total
   quickjs   : 6.54s user 0.11s system 99% cpu 6.671 total
   quickjs-ng: 545.55s user 202.67s system 99% cpu 12:32.28 total
A roughly 5x slowdown in CPU time (about 7.5x in wall-clock time) for an interpreted C JS engine is pretty good I think, compared to all the time, code and effort put into v8 over the years!
15. embedding-shape ◴[] No.45905265{3}[source]
The original point was this:

> > IMO it seems like it should be safe to trust code being served by Google on the official YouTube domain

Which came from a misunderstanding about where the downloadable solver script comes from, as it doesn't come from youtube.com, it comes from github.com (yt-dlp org), I was just correcting that misunderstanding.

> You can’t very well run yt-dlp without trusting yt-dlp code.

That makes a ton of sense and I agree! I'm not sure how that is related to anything though? I download yt-dlp from Arch repositories, so yes I'm trusting Arch maintainers and of course yt-dlp developers. Then I'm adding a manifest which controls what this application can actually access, which is basically a VM config, where I define that it can access youtube.com (and a bunch of other sites I mirror/archive). This is the part that shouldn't have github.com/* access.

Again as mentioned, not a big issue, plenty of workarounds, so not the end of the world.

replies(1): >>45906859 #
16. Wowfunhappy ◴[] No.45906859{4}[source]
> Which came from a misunderstanding about where the downloadable solver script comes from, as it doesn't come from youtube.com, it comes from github.com (yt-dlp org), I was just correcting that misunderstanding.

But that script is ultimately running a JS challenge from Youtube, right? That’s why we actually needed a JS runtime in the first place.

replies(1): >>45907927 #
17. mike-cardwell ◴[] No.45907442[source]
This works for me:

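    # Minimal image: yt-dlp plus ffmpeg and the Deno runtime used for the JS challenges
    FROM python:3-slim
    # Install yt-dlp with its `default` extras
    RUN python3 -m pip --no-cache-dir install 'yt-dlp[default]'
    # Install ffmpeg, install Deno via its install script, then clean up caches and temp files
    RUN apt-get update \
     && DEBIAN_FRONTEND=noninteractive apt-get install --no-install-recommends -q -y ffmpeg curl unzip \
     && curl -fsSL https://deno.land/install.sh -o /tmp/deno.sh \
     && sh /tmp/deno.sh -y \
     && mv /root/.deno/bin/deno /usr/local/bin/ \
     && rm --force --recursive /var/lib/apt/lists/* /tmp/* /var/tmp/*
    ENTRYPOINT ["yt-dlp"]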
18. embedding-shape ◴[] No.45907927{5}[source]
Correct, the data needed to solve the challenge comes from YouTube.