684 points | prettyblocks | 31 comments

I mean anything in the 0.5B-3B range that's available on Ollama (for example). Have you built any cool tooling that uses these models as part of your workflow?
1. antonok ◴[] No.42786841[source]
I've been using Llama models to identify cookie notices on websites, for the purpose of adding filter rules to block them in EasyList Cookie. Otherwise, this is normally done by, essentially, manual volunteer reporting.

Most cookie notices turn out to be pretty similar, HTML/CSS-wise, and then you can grab their `innerText` and filter out false positives with a small LLM. I've found the 3B models have decent performance on this task, given enough prompt engineering. They do fall apart slightly around edge cases like less common languages or combined cookie notice + age restriction banners. 7B has a negligible false-positive rate without much extra cost. Either way these things are really fast and it's amazing to see reports streaming in during a crawl with no human effort required.

Code is at https://github.com/brave/cookiemonster. You can see the prompt at https://github.com/brave/cookiemonster/blob/main/src/text-cl....
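
A minimal sketch of the filtering step described above, not the code from the linked repository. It assumes a local Ollama server at its default address, and the model tag and prompt are hypothetical placeholders; the real prompt is in the linked file. The `generate` parameter is injectable so the classifier can be exercised without a running model.

```python
import json
import urllib.request
from typing import Callable

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: any small instruction-tuned model that has been pulled locally.
MODEL = "llama3.2:3b"

# Hypothetical prompt for illustration; not the prompt used by cookiemonster.
PROMPT_TEMPLATE = (
    "Below is text extracted from an element on a web page.\n"
    "Answer with exactly one word, YES or NO: is this a cookie consent notice?\n\n"
    "Text:\n{text}\n\nAnswer:"
)


def ollama_generate(prompt: str) -> str:
    """Send one non-streaming completion request to Ollama and return the text."""
    payload = json.dumps(
        {"model": MODEL, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read())["response"]


def is_cookie_notice(
    inner_text: str, generate: Callable[[str], str] = ollama_generate
) -> bool:
    """Return True if the model's answer starts with YES."""
    # Truncate long text so the prompt stays small for a 3B model.
    prompt = PROMPT_TEMPLATE.format(text=inner_text.strip()[:2000])
    answer = generate(prompt).strip().upper()
    return answer.startswith("YES")
```

In a crawl, `is_cookie_notice(element_inner_text)` would be called only on elements that already matched the common HTML/CSS patterns, so the model acts purely as a false-positive filter.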

replies(4): >>42786891 #>>42786896 #>>42793119 #>>42793157 #
2. binarysneaker ◴[] No.42786891[source]
Maybe it could also send automated petitions to the EU to undo cookie consent legislation, and reverse some of the enshittification.
replies(3): >>42786953 #>>42787244 #>>42788894 #
3. bazmattaz ◴[] No.42786896[source]
This is so cool, thanks for sharing. I can imagine it’s not technically possible (yet?) but it would be cool if this could simply be run as a browser extension rather than running a Docker container
replies(3): >>42786919 #>>42788804 #>>42789894 #
4. antonok ◴[] No.42786919[source]
I did actually make a rough proof-of-concept of this! One of my long-term visions is to have it running natively in-browser, and able to automatically fix site issues caused by adblocking whenever they happen.

The PoC is a bit outdated but it's here: https://github.com/brave/cookiemonster/tree/webext

5. antonok ◴[] No.42786953[source]
Ha, I'm not sure the EU is prepared to handle the deluge of petitions that would ensue.

On a more serious note, this must be the first time we can quantitatively measure the impact of cookie consent legislation across the web, so maybe there's something to be explored there.

replies(1): >>42790710 #
6. K0balt ◴[] No.42787244[source]
I think there is real potential here, for smart browsing. Have the LLM get the page, replace all the ads with kittens, find non-paywall versions if possible and needed, spoof fingerprint data, detect and highlight AI generated drivel, etc. The site would have no way of knowing that it wasn’t touching eyeballs. We might be able to rake back a bit of the web this way.
replies(1): >>42787340 #
7. antonok ◴[] No.42787340{3}[source]
You probably wouldn't want to run this in real-time on every site as it'll significantly increase the load on your browser, but as long as it's possible to generate adblock filter rules, the fixes can scale to a pretty large audience.
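
A rough sketch of the "generate filter rules" step, assuming each confirmed detection is a (hostname, CSS selector) pair. The function names are hypothetical; the output uses the standard `domain##selector` element-hiding syntax.

```python
def cosmetic_filter_rule(hostname: str, css_selector: str) -> str:
    """Build an element-hiding rule in Adblock Plus syntax: domain##selector."""
    hostname = hostname.strip().lower()
    # Design choice: drop a leading "www." so one rule covers the bare domain
    # and its subdomains.
    if hostname.startswith("www."):
        hostname = hostname[4:]
    css_selector = css_selector.strip()
    if not hostname or not css_selector:
        raise ValueError("hostname and selector are both required")
    return f"{hostname}##{css_selector}"


def rules_from_detections(detections):
    """Deduplicate (hostname, selector) pairs and return sorted rules."""
    return sorted({cosmetic_filter_rule(host, sel) for host, sel in detections})
```

The model only has to run once per site during the crawl; everyone who subscribes to the resulting list gets the fix without running any LLM themselves.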
replies(2): >>42788192 #>>42794640 #
8. K0balt ◴[] No.42788192{4}[source]
I was thinking of running it on my home lab server as a proxy, but yeah, scaling it to the browser would require some pretty strong hardware. Still, maybe in a couple of years it could be mainstream.
9. throwup238 ◴[] No.42788804[source]
It should be possible using native messaging [1], which can call out to an external binary. The 1Password extensions use that to communicate with the password manager binary.

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web...
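
For illustration, a minimal sketch of what the external binary's side of native messaging could look like. The wire format (a 32-bit length in native byte order followed by UTF-8 JSON, over stdin/stdout) is the documented protocol; the message fields and the `classify` placeholder are assumptions standing in for a call to a local model.

```python
import json
import struct
import sys


def encode_message(message) -> bytes:
    """Frame a JSON message: 4-byte native-order length, then UTF-8 body."""
    body = json.dumps(message).encode("utf-8")
    return struct.pack("=I", len(body)) + body


def read_message(stream):
    """Read one framed message, or return None when the stream is closed."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack("=I", header)
    return json.loads(stream.read(length).decode("utf-8"))


def classify(text: str) -> bool:
    # Placeholder heuristic; a real host would query a local LLM here.
    return "cookie" in text.lower()


def serve(stdin=None, stdout=None):
    """Answer each {"text": ...} request with {"isCookieNotice": bool}."""
    if stdin is None:
        stdin = sys.stdin.buffer
    if stdout is None:
        stdout = sys.stdout.buffer
    while (message := read_message(stdin)) is not None:
        reply = {"isCookieNotice": classify(message.get("text", ""))}
        stdout.write(encode_message(reply))
        stdout.flush()
```

The extension would still need a native messaging manifest registered with the browser so it is allowed to launch this host; that part is browser-specific and not shown.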

10. sebastiennight ◴[] No.42788894[source]
To me this take is like smokers complaining that the evil government is forcing the good tobacco companies to degrade the experience by adding pictures of cancer patients on cigarette packs.
replies(1): >>42790227 #
11. MarioMan ◴[] No.42789894[source]
There are a couple of WebGPU LLM platforms available that form the building blocks to accomplish this right from the browser, especially since the models are so small.

https://github.com/mlc-ai/web-llm

https://huggingface.co/docs/transformers.js/en/index

You do have to worry about WebGPU compatibility in browsers though.

https://caniuse.com/webgpu

12. kortilla ◴[] No.42790227{3}[source]
Those don’t really work: https://jamanetwork.com/journals/jamanetworkopen/fullarticle...
replies(1): >>42792167 #
13. pk-protect-ai ◴[] No.42790710{3}[source]
Why don't you spam the companies who want your data instead? The sites can simply stop gathering your data, and then they will not be required to ask for consent.
replies(2): >>42791064 #>>42791197 #
14. frail_figure ◴[] No.42791064{4}[source]
It’s the same comments on HN as always. They think the EU setting up rules is somehow worse than companies breaking them. We see how the US is turning out without pesky EU restrictions :)
replies(1): >>42793142 #
15. whywhywhywhy ◴[] No.42791197{4}[source]
Because they have no reason to care about what you think or feel or they wouldn't be doing it in the first place.

Cookie notices just gave them another weapon in the end.

16. shiftingleft ◴[] No.42792167{4}[source]
Do they help deter people from becoming smokers in the first place?
replies(1): >>42800812 #
17. GardenLetter27 ◴[] No.42793119[source]
It's funny that this is even necessary though - that great EU innovation at work.
replies(3): >>42794055 #>>42795154 #>>42796348 #
18. GardenLetter27 ◴[] No.42793142{5}[source]
The US has 3x higher salaries, larger houses and a much higher quality of life?

I work as a senior engineer in Europe and make barely $4k net per month... and that's considered a "good" salary!

replies(2): >>42793619 #>>42803540 #
19. rpastuszak ◴[] No.42793157[source]
Tangentially related, I worked on something similar, using LLMs to find and skip sponsored content in YT videos:

https://butter.sonnet.io/

20. Lutger ◴[] No.42793619{6}[source]
It has higher salaries for privileged people like senior engineers. Try making ends meet in a lower class job.

And you have (almost) free and universal healthcare in Europe, good food available everywhere, drinking water that doesn't poison you, walkable cities, good public transport, somewhat decent police and a functioning legal system. The list goes on. Does this not impact your quality of life? Do you not care about these things?

How can you have a higher quality of life as a society with a higher murder rate, much lower life expectancy, so many people in jail, in debt, etc.?

replies(1): >>42793866 #
21. macinjosh ◴[] No.42793866{7}[source]
Touch grass. The US is a big place and is nothing like you seem to think it is.

Europe, on the other hand, can't even manage to defend itself and relies on the US for its sheer existence.

replies(2): >>42794951 #>>42803533 #
22. kalaksi ◴[] No.42794055[source]
Tracking, tracking cookies, banners etc. are a choice made by the website. There are browser addons for making it simpler, though.

The transparency requirements and consent for collecting all kinds of PII (this is the regulation) actually is a great innovation.

replies(1): >>42794440 #
23. docmars ◴[] No.42794440{3}[source]
I think I'd rather see cookie notices handled by a browser API with a common UI, where the default is always "No." Provide that common UI in a popover accessed in the address bar, or a side pane in the browser itself.

If a user logs in or does something requiring cookies that would otherwise prevent normal functionality, prompt them with a Permissions box if they haven't already accepted it in the usual (optional) UI.

replies(2): >>42794593 #>>42797274 #
24. kalaksi ◴[] No.42794593{4}[source]
Cookies for normal functionality don't require consent anyway.

But yes, I think just about everybody would like the UX you described. But the entities that track you don't want to make it that easy. You probably know of the do-not-track header too.

25. Tepix ◴[] No.42794640{4}[source]
Depends on your machine and on the LLM. Could be doable.
26. pona-a ◴[] No.42794951{8}[source]
Can you enlighten me about a state where none of parent's points apply? I'd be glad to be educated.
27. pornel ◴[] No.42795154[source]
The legislation has been watered down by lobbying of the trillion-dollar tracking industry.

The industry knows ~nobody wants to be tracked, so they don't want tracking preferences to be easy to express. They want cookie notices to be annoying, to make people associate privacy with bureaucratic nonsense and stop demanding privacy.

There was the P3P spec in 2002: https://www.w3.org/TR/P3P/

It even got a decent implementation in Internet Explorer, but Google has been deliberately sending a junk P3P header to bypass it.

It was tried again with the very simple DNT spec. Support for it (which barely existed anyway) collapsed after Microsoft decided to turn Do-Not-Track on by default in Internet Explorer 10.
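
To show how little the DNT signal asks of a site, here is a minimal server-side sketch. The function name is hypothetical; the only thing taken from the spec is that a browser with the preference enabled sends the request header `DNT: 1`.

```python
from typing import Mapping


def opted_out_of_tracking(headers: Mapping[str, str]) -> bool:
    """Return True if the request carries the header DNT: 1."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") == "1"
```

A site honoring the signal would check this before loading any tracker and skip the banner entirely for those visitors.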

28. vvillena ◴[] No.42796348[source]
Bear in mind, those arcane cookie forms are probably not compliant with EU laws. If there's not a "reject" button next to the "accept" button, the form is almost definitely not to spec.
29. YetAnotherNick ◴[] No.42797274{4}[source]
There isn't any way the EU didn't know this was possible and a better choice. There was already a DNT header that they could have regulated. It also knew the harm to the ad industry.
replies(1): >>42797544 #
30. Fraaaank ◴[] No.42797544{5}[source]
There isn't any rule that requires websites to use a cookie banner. You're required to obtain explicit consent before reading/setting any cookies that aren't strictly necessary. The web came up with the cookie banner.

Google could've implemented a consent API in Chrome, but they didn't. Guess why.
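
The rule described above (consent only for cookies that aren't strictly necessary) can be sketched as a simple gate. The `Cookie` type and the purpose labels are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Cookie:
    name: str
    # Hypothetical purpose labels: "necessary", "analytics", "advertising".
    purpose: str


def cookies_allowed(
    cookies: Iterable[Cookie], consented_purposes: Set[str]
) -> List[Cookie]:
    """Keep strictly necessary cookies plus those with explicit consent."""
    return [
        cookie
        for cookie in cookies
        if cookie.purpose == "necessary" or cookie.purpose in consented_purposes
    ]
```

With an empty consent set, only the necessary cookies pass through, which is why a site that sets nothing else needs no banner at all.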

31. kortilla ◴[] No.42800812{5}[source]
Not sure if much serious research has been put into it. I would be suspicious of it deterring them because a lot of initial smoking happens in social situations where friends pass out individual cigarettes.

By the time someone buys their own pack they are probably hooked.

I suspect the obscene taxes pricing out young folks are one of the most effective strategies.