For context, GPT-2-small is 0.124B params (w/ 1024-token context).
Maybe someone should write a plugin for it (open source):
1. Put all your work-related questions into the plugin; a local LLM turns each one into an abstracted question for you to preview and send.
2. You then get the answer back with all of your data restored.
E.g. df["cookie_company_name"] becomes df["a"] and back.
https://m.youtube.com/watch?v=M2o4f_2L0No
Spend the 45 minutes watching this talk. It is a delight. If you are unsure, wait until the speaker picks up the guitar.
Last week I tried to get an LLM (one of the recent Llama models running through Groq, 70B I believe) to produce randomly generated prompts in a variety of styles, and it kept producing cyberpunk sci-fi stuff. When I told it to stop doing cyberpunk sci-fi, it switched completely to wild west.
The local models do things ranging from cleaning up OCR, to summarizing meetings, to estimating the user's current goals and activity, to predicting search terms, to predicting queries and actions that, if run, would help the user accomplish their current task.
The capabilities of these tiny models have really surged recently. Even small vision models are becoming useful, especially if fine-tuned.
Edit: No, the retrieval is Formula-Formula; neither the model (nor, I believe, the tokenizer) handles English.
I’ve seen a number of “DIY GPT-2” tutorials that target this sweet spot. You won’t get amazing results unless you’re willing to leave a personal computer running for hours or days and you have solid data to train on locally, but fine-tuning should be within the realm of normal hobbyist patience.
That said, this is also not helped by the fact that all of the default interfaces lack many essential features, so you have to build the interface yourself. Neither "clear the context on every attempt" nor "reuse the context repeatedly" gives good results, but having one context produce just one-line summaries, then fresh contexts expand each one, does slightly less badly.
(If you actually want the LLM to do something useful, there are many more things that need to be added beyond this)
A more difficult problem we foresee is turning it into a real-time (online) firewall (for calls, for example).
[1] https://chat.deepseek.com/a/chat/s/d5aeeda1-fefe-4fc6-8c90-2...
[1] MediaPipe in particular makes it simple to prototype around Gemma2 on Android: https://ai.google.dev/edge/mediapipe/solutions/genai/llm_inf...
[2] Intend to open source it once we get it working for anything other than SMSes
1. Create several different personas
2. Generate a ton of variation using a high temperature
3. Compare the variations head-to-head using the LLM to get a win/loss ratio
The best ones can be quite good.
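A minimal sketch of steps 2 and 3, assuming a local Ollama server on its default port; the model name and prompt strings are placeholders, not the actual pipeline:

    // Sketch only: generate variants at high temperature, then have the model judge pairs.
    async function generate(prompt: string, temperature: number): Promise<string> {
      const res = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        body: JSON.stringify({ model: "llama3.2:3b", prompt, stream: false, options: { temperature } }),
      });
      return (await res.json()).response as string;
    }

    async function pickBest(persona: string, task: string, n = 8): Promise<string> {
      // Step 2: generate a ton of variation using a high temperature.
      const variants: string[] = [];
      for (let i = 0; i < n; i++) {
        variants.push(await generate(`You are ${persona}. ${task}`, 1.2));
      }
      // Step 3: head-to-head comparisons, tallying wins per variant.
      const wins = variants.map(() => 0);
      for (let a = 0; a < variants.length; a++) {
        for (let b = a + 1; b < variants.length; b++) {
          const verdict = await generate(
            `Which reply is better, A or B? Answer with one letter.\nA: ${variants[a]}\nB: ${variants[b]}`,
            0,
          );
          if (verdict.trim().toUpperCase().startsWith("A")) wins[a]++;
          else wins[b]++;
        }
      }
      return variants[wins.indexOf(Math.max(...wins))];
    }

Pairwise judging at temperature 0 keeps the comparison step deterministic while the generation step stays diverse.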
I think this would have some additional benefits of not confusing the larger model with facts it doesn't need to know about. By erasing information, you can allow its attention heads to focus on the pieces that matter.
Requires further study.
For that short of a run, you'll spend more time waiting for the node to come up, downloading the dataset, and compiling the model, though.
I'm fine with and prefer specialist models in most cases.
Recently deployed in Home Assistant's fully local-capable Alexa replacement. https://www.home-assistant.io/voice_control/about_wake_word/
Check out the demo they have below.
FORTUNE=$(fortune) && echo "$FORTUNE" && echo "Convert the following output of the Unix 'fortune' command into a small screenplay in the style of Shakespeare: \n\n $FORTUNE" | ollama run phi4
My needs are narrow and limited but I want a bit of flexibility.
It also does RAG on apps there, like the music player, contacts app and to-do app. I can ask it to recommend similar artists to listen to based on my music library for example or ask it to quiz me on my PDF papers.
Bergamot is already used inside Firefox, but I wanted translation outside the browser as well.
[0]: bergamot https://github.com/browsermt/bergamot-translator
Here's the script: https://github.com/nozzlegear/dotfiles/blob/master/fish-func...
And for this change [1] it generated these messages:
1. `fix: change from printf to echo for handling git diff input`
2. `refactor: update codeblock syntax in commit message generator`
3. `style: improve readability by adjusting prompt formatting`
[1] https://github.com/nozzlegear/dotfiles/commit/0db65054524d0d...
The general guidance I've used is that to train a model, you need an amount of RAM (or VRAM) equal to 8x the number of parameters, so a 0.125B model would need 1 GB of RAM to train.
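A quick sanity check of that rule of thumb (the 8x factor is a heuristic covering weights, gradients, and optimizer state, not a hard number):

    // Rule-of-thumb check: ~8 bytes per parameter to cover weights, gradients, and optimizer state.
    const params = 0.125e9;                          // 0.125B parameters, roughly GPT-2-small
    const bytesPerParam = 8;                         // the 8x factor from the comment above
    const gib = (params * bytesPerParam) / 1024 ** 3;
    console.log(`~${gib.toFixed(2)} GiB to train`);  // ≈ 0.93 GiB, i.e. about 1 GB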
Most cookie notices turn out to be pretty similar, HTML/CSS-wise, and then you can grab their `innerText` and filter out false positives with a small LLM. I've found the 3B models have decent performance on this task, given enough prompt engineering. They do fall apart slightly around edge cases like less common languages or combined cookie notice + age restriction banners. 7B has a negligible false-positive rate without much extra cost. Either way these things are really fast and it's amazing to see reports streaming in during a crawl with no human effort required.
Code is at https://github.com/brave/cookiemonster. You can see the prompt at https://github.com/brave/cookiemonster/blob/main/src/text-cl....
The PoC is a bit outdated but it's here: https://github.com/brave/cookiemonster/tree/webext
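For a rough idea of what that classification step looks like, here's an illustrative sketch; it is not the actual cookiemonster code, and the model name, prompt, and local Ollama endpoint are stand-ins:

    // Illustrative only: ask a small local model whether an element's text is a cookie notice.
    async function looksLikeCookieNotice(el: HTMLElement): Promise<boolean> {
      const text = el.innerText.slice(0, 2000); // keep the prompt small for a ~3B model
      const res = await fetch("http://localhost:11434/api/generate", {
        method: "POST",
        body: JSON.stringify({
          model: "llama3.2:3b", // placeholder; any ~3B instruct model
          prompt: `Does the following text come from a cookie consent notice? Answer YES or NO.\n\n${text}`,
          stream: false,
        }),
      });
      const answer: string = (await res.json()).response;
      return answer.trim().toUpperCase().startsWith("YES");
    }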
On a more serious note, this must be the first time we can quantitatively measure the impact of cookie consent legislation across the web, so maybe there's something to be explored there.
U.S. FIRST STRIKE WINNER: NONE
USSR FIRST STRIKE WINNER: NONE
NATO / WARSAW PACT WINNER: NONE
FAR EAST STRATEGY WINNER: NONE
US USSR ESCALATION WINNER: NONE
MIDDLE EAST WAR WINNER: NONE
USSR CHINA ATTACK WINNER: NONE
INDIA PAKISTAN WAR WINNER: NONE
MEDITERRANEAN WAR WINNER: NONE
HONGKONG VARIANT WINNER: NONE
Strange game. The only winning move is not to play
some of the situations get pretty wild, for the office :)
Its effective context window is pretty small, but I have a much more robust statistical model that handles thematic extraction. The LLM is essentially just rewriting ~5-10 sentences into a single paragraph.
I’ve found the less you need the language model to actually do, the less the size/quality of the model actually matters.
I’m tired of the bad playlists I get from algorithms, so I made a specific playlist with a Llama 2 model based on several songs I like. I started with 50, removed any I didn’t like, and added more to fill in the spaces. The small models were pretty good at this. Now I have a decent fixed playlist. It does get “tired” after a few weeks and I need to add more to it. I’ve never been able to do this myself with more than a dozen songs.
Again, no clue if this is true, but it seems plausible.
There are stories of people replying STOP to spam, then never getting a legit SMS because the number was re-used by another service. That's because it's being blocked between the spammer and the phone.
We actually just threw a relationship-curative app online in 17 hours around Thanksgiving, so they "owe" me, as it were.
I'm one of those people that can do anything practical with tech and the like, but I have no imagination for it - so when someone mentions something that I think would be beneficial for my fellow humans I get this immense desire to at least cheer on if not ask to help.
I had a similar project a few years back that used OSX automations and Shortcuts and Python to send a message everyday to a friend. It required you to be signed in to iMessage on your MacBook.
That was a send operation; reading replies is not something I implemented, but I know there is a file somewhere that holds a history of your recent iMessages. So you would have to parse it on file update, and that should give you the read operation so you can have a conversation.
Very doable in a few hours unless something dramatic changed with how the messages apps works within the last few years.
I don't have a pre-trained model to share but you can make one yourself from the git repo, assuming you have an apple silicon mac.
I was thinking of hooking them into RPGs with text-based dialogue, so that a character will say something slightly different every time you speak to them.
They linked to an interactive explorer that nicely shows the diversity of the dataset, and the HF repo links to the GitHub repo that has the code that generated the stories: https://github.com/lennart-finke/simple_stories_generate
So, it seems there are ways to get varied stories.
Is that design 3D printable? Or is that for paid users only?
https://idinsight.github.io/tech-blog/blog/enhancing_materna...
FLAME seems like a fun little model, and 60M is truly tiny compared to other LLMs, but I have no idea how good it is in today's context, and it doesn't seem like they ever released it.
[1] https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web...
I haven't benchmarked it yet but I'd be happy to hear opinions on it. It's written in C++ (specifically not python), and is designed to be a self-contained microservice based around llama.cpp.
> Last week I tried to get an LLM (one of the recent Llama models running through Groq, it was 70B I believe) to produce randomly generated prompts in a variety of styles and it kept producing cyberpunk scifi stuff.
100% relevant: "Someday" <https://en.wikipedia.org/wiki/Someday_(short_story)> by Isaac Asimov, 1956
With just three lines of code, you can run small LLMs inside the browser. We feel this unlocks a ton of potential for businesses: they can introduce AI without fear of cost and can personalize the experience using AI.
Would love your thoughts and what we can do more or better!
Here is a demo.
* https://i.imgur.com/pAuTeAf.jpeg
Using this script:
https://github.com/mlc-ai/web-llm
https://huggingface.co/docs/transformers.js/en/index
You do have to worry about WebGPU compatibility in browsers though.
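For example, the transformers.js route really is only a few lines. A sketch, assuming the @xenova/transformers package and a small hosted model (the WebGPU caveat above mostly applies to web-llm; transformers.js can fall back to WASM):

    // Sketch: load and run a tiny model entirely in the browser (the model name is illustrative).
    import { pipeline } from "@xenova/transformers";

    const generator = await pipeline("text-generation", "Xenova/distilgpt2");
    const out = await generator("Tiny models are useful for", { max_new_tokens: 30 });
    console.log(out); // [{ generated_text: "..." }]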
The nice thing is that she can copy/paste the titles and abstracts in to two columns and write e.g. "=PROMPT(A1:B1, "If the paper studies diabetic neuropathy and stroke, return 'Include', otherwise return 'Exclude'")" and then drag down the formula across 7000 rows to bulk process the data on her own because it's just Excel. There is a gif on the readme on the Github repo that shows it.
And then claude replies
fetch(url, {pw:mycleartrxtpw}).then(writething)
And then the local llm converts the placeholder mycleartrxtpw into hunter123 using its access to the real code
For instance, in your first example, why was that change needed? It was a fix, but for what issue?
In the second message: why was that a desirable change?
It's a lightweight tool that summarizes Hacker News articles. For example, here’s what it outputs for this very post, "Ask HN: Is anyone doing anything cool with tiny language models?":
"A user inquires about the use of tiny language models for interesting applications, such as spam filtering and cookie notice detection. A developer shares their experience with using Ollama to respond to SMS spam with unique personas, like a millennial gymbro or a 19th-century British gentleman. Another user highlights the effectiveness of 3B and 7B language models for cookie notice detection, with decent performance achieved through prompt engineering."
I originally used LLaMA 3:Instruct for the backend, which performs much better, but recently started experimenting with the smaller LLaMA 3.2:1B model.
It’s been cool seeing other people’s ideas too. Curious—does anyone have suggestions for small models that are good for summaries?
Feel free to check it out or make changes: https://github.com/k-zehnder/gophersignal
It is probably super overengineered, considering that pretty good libraries already do this in different languages, but it would be funny. I did some tests with ChatGPT, and it worked sometimes. It would probably work with some fine-tuning, but I don't have the experience or the time right now.
[1] https://www.medrxiv.org/content/10.1101/2024.10.01.24314702v...
Looks like I'm out... Would be great if there was a Google Apps Script alternative. My company gave all devs Linux systems and the business team operates on Windows, so I always use browser-based tech like Apps Script for complex sheet manipulation.
The bots don't do a lot of interesting stuff though, I plan to add the following functionalities:
- Instead of just resetting every 100 messages, I'm going to provide them with a rolling window of context.
- Instead of only allowing BASH commands, they will be able to also respond with reasoning messages, hopefully to make them a bit smarter.
- Give them a better docker container with more CLI tools such as curl and a working package manager.
If you're interested in seeing the developments, you can subscribe on the platform!
Cookie notices just gave them another weapon in the end.
My flow is generally: Look at the title and the amount of upvotes to decide if I'm interested in the article. Then view the comments to see if there's interesting discussion going on or if there's already someone adding essential context. Only then I'll decide if I want to read the article or not.
Of course no big deal if you're not interested in my patronage, just wanted to let you know your page already looks good enough for me to consider switching my most visited page to it if it weren't for this small detail. And maybe the upvote count.
Let's say I want some outcome: it will autonomously handle the process, prompting me and the other side for additional requirements if necessary, and then, based on that, handle the process and reach the outcome?
I'll definitely add a link to the comments and the upvote count—gotta keep my tiny but mighty userbase (my mom, me, and hopefully you soon) happy, right? lol
And if there's even a chance you'd use GopherSignal as your daily driver, that's a no-brainer for me. Really appreciate you taking the time to share your ideas and help me improve.
While interactive AI is all about posing, meditating on the prompt, then trying to fix the outcome, IntelliJ tab completion... shows what it will complete as you type and you Tab when you are 100% OK with the completion, which surprisingly happens 90..99% of the time for me, depending on the project.
[1]: https://git.sr.ht/~jamesponddotco/llm-prompts/tree/trunk/dat...
I've had good speed / reliability with TheBloke/rocket-3B-GGUF on Huggingface, the Q2_K model. I'm sure there are better models out there now, though.
It takes ~8-10 seconds to process an image on my M2 Macbook, so not quite quick enough to run on phones yet, but the accuracy of the output has been quite good.
So the LLM does both the anonymization into placeholders and then later the replacing of the placeholders too. Calling the latter step de-anonymization is confusing though, it's "de-anonymizing" yourself to yourself. And the overall purpose of the plugin is to anonymize OP to Claude, so to me at least that makes the whole thing clearer.
(Edit: It's relevant that STOP didn't come from the TCPA itself, but definitely has teeth due to it)
https://www.infobip.com/blog/a-guide-to-global-sms-complianc...
1. Getting the speed gains is hard unless you are able to pay for dedicated GPUs. Some services offer LoRA as serverless but you don't get the same performance for various technical reasons.
2. Lack of talent to actually do the finetuning. Regular engineers can do a lot of LLM implementation, but when it comes to actually performing training it is a scarcer skillset. Most small to medium orgs don't have people who can do it well.
3. Distribution. Sharing finetunes is hard. HuggingFace exists, but discoverability is an issue. It is flooded with random models with no documentation, and it isn't easy to find a good one for your task. Plus, with a good finetune you also need the prompt and possibly parsing code to make it work the way it is intended, and the bundling hasn't been worked out well.
It's a hacker forum, let people hack!
If anything have a dig at OP for posting the thread too soon before the parent commenter has had the chance to gather any data, haha
I work as a senior engineer in Europe and make barely $4k net per month... and that's considered a "good" salary!
Can you identify all the books here, sorted by a weight which is determined based on a combo of the number of votes the comment has, the number of sub-comments, or the number of repeat mentions.
Ideally retain hyperlinks if possible.
At those sizes, it's great for generating non-repetitive flavortext for NPCs. No more "I took an arrow to the knee".
Models at around the 2B size aren't really capable enough to act as a competent adversary - but they are great for something like bargaining with a shopkeeper, or some other role where natural language can let players do a bit more immersive roleplay.
Telling me `Changed timeout from 30s to 60s` means nothing, while `Increase timeout for slow <api name> requests` gives me an actual idea of why that was done.
Even better if you add meaningful messages to the commit body.
Take a look at commits from large repositories like the Linux kernel and you can see what good commit messages look like.
Flow would be:
1. Llama prompt: write a console log statement with my username and password: mettamage, superdupersecret
2. Claude prompt (edited by Llama): write a console log statement with my username and password: asdfhjk, sdjkfa
3. Claude replies: console.log('asdfhjk', 'sdjkfa')
4. Llama gets that input and replies to me: console.log('mettamage', 'superdupersecret')
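A minimal sketch of the placeholder bookkeeping behind that flow (hypothetical helpers; in the plugin described above, the local Llama would pick the substitutions itself):

    // Step 1→2: swap real secrets for placeholders before the prompt leaves the machine.
    const substitutions = new Map<string, string>([
      ["mettamage", "asdfhjk"],
      ["superdupersecret", "sdjkfa"],
    ]);

    function anonymize(prompt: string): string {
      let out = prompt;
      for (const [real, placeholder] of substitutions) out = out.replaceAll(real, placeholder);
      return out;
    }

    // Step 4: restore the real values in Claude's reply before showing it to the user.
    function deanonymize(reply: string): string {
      let out = reply;
      for (const [real, placeholder] of substitutions) out = out.replaceAll(placeholder, real);
      return out;
    }

    // deanonymize("console.log('asdfhjk', 'sdjkfa')") === "console.log('mettamage', 'superdupersecret')"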
I think one major improvement for folks like me would be human->regex LLM translator, ideally also respecting different flavors/syntax for various languages and tools.
This has been a bane of mine - I run into a requirement to develop some complex regexes maybe every 2-3 years, so I dig deep into the specs, work on it, and eventually deliver if it's even possible, then within a few months almost completely forget all the details and start at almost the same place next time. It gets better over time, but clearly I will retire before this skill settles in well.
llm -c, which continues the previous conversation, is specifically useful for that sort of manipulation.
It's also available from the command line, which I find convenient because I basically always have one open.
And you have (almost) free and universal healthcare in Europe, good food available everywhere, drinking water that doesn't poison you, walkable cities, good public transport, somewhat decent police and a functioning legal system. The list goes on. Does this not impact your quality of life? Do you not care about these things?
How can you have a higher quality of life as a society with higher murder rates, much lower life expectancy, so many people in jail, in debt, etc.?
I think the content you can get from the SLMs for fake data is a lot more engaging than say the ruby ffaker library.
[Edit] Found it. I had to enable chrome://flags/#prompt-api-for-gemini-nano
I'll tweak the prompt when I have some time today and see if I can get some more consistency out of it.
The transparency requirements and consent for collecting all kinds of PII (this is the regulation) are actually a great innovation.
Then you could implement Salvation as a Service, where you privately confess your sins to a local LLM, and it continuously prays for your eternal soul, recommends penances, and even recites Hail Marys for you.
When you squash a branch you'll have 200+ lines of new code on a new feature. The diff is not a quick way to get a summary of what's happening. You should put the "what" in your commit messages.
If a user logs in or does something requiring cookies that would otherwise prevent normal functionality, prompt them with a Permissions box if they haven't already accepted it in the usual (optional) UI.
But yes, I think just about everybody would like the UX you described. But the entities that track you don't want to make it that easy. You probably know of the do-not-track header too.
I’m now just wondering if there is any way to build tests on the input+output of the LLM :D
The industry knows ~nobody wants to be tracked, so they don't want tracking preferences to be easy to express. They want cookie notices to be annoying, to make people associate privacy with bureaucratic nonsense and stop demanding privacy.
There was P3P spec in 2002: https://www.w3.org/TR/P3P/
It even got decent implementation in Internet Explorer, but Google has been deliberately sending a junk P3P header to bypass it.
It has been tried again with a very simple DNT spec. Support for it (that barely existed anyway) collapsed after Microsoft decided to make Do-Not-Track on by default in Edge.
Were you thinking something like a “DramaLlama,” deciding if it’s a slow day or a meltdown-worthy soap opera in the comments? Or maybe something more valuable, like an “Insight Index” that uses the LLM to analyze comments for links, explanations, or phrases that add context or insight—basically gauging how constructive or meaningful the discussion is?
I also saw an idea in another post on this thread about an LLM that constantly listens to conversations and declares a winner. That could be fun to adapt for spicier posts—like the LLM picking a “winner” in the comments. Make the argument GopherSignal official lol. If it helps bring in another user, I’m all in!
Appreciate the feedback.
This feels like an in-between that both wastes their time and adds you to extra lists.
Send the results somewhere! Not sure if "law enforcement" is applicable (as in, would be able/willing to act on the info) but if so, that's a great use of this data :)
A summary of changes like this might be just enough to spark your memory on what you were actually doing with the changes. I'll have to give it a shot!
Running a prompt against every single cell of a 10k row document was never gonna happen with a large model. Even using a transformer model architecture in the first place can be seen as ludicrous overkill but feasible on modern machines.
So I'd say the paper is very relevant, and the top commenter in this very thread demonstrated their own homegrown version with a very nice use-case (paper abstract and title sorting for making a summary paper)
That isn’t the main point of FLAME, as I understood it. The main point was to help you when you’re editing a particular cell. codex-davinci was used for real time Copilot tab completions for a long time, I believe, and editing within a single formula in a spreadsheet is far less demanding than editing code in a large document.
After I posted my original comment, I realized I should have pointed out that I’m fairly sure we have 8B models that handily outperform codex-davinci these days… further driving home how irrelevant the claim of “>100B” was here (not talking about the paper). Plus, an off the shelf model like Qwen2.5-0.5B (a 494M model) could probably be fine tuned to compete with (or dominate) FLAME if you had access to the FLAME training data — there is probably no need to train a model from scratch, and a 0.5B model can easily run on any computer that can run the current version of Excel.
You may disagree, but my point was that claiming a 60M model outperforms a 100B model just means something entirely different today. Putting that in the original comment higher in the thread creates confusion, not clarity, since the models in question are very bad compared to what exists now. No one had clarified that the paper was over a year old until I commented… and FLAME was being tested against models that seemed to be over a year old even when the paper was published. I don’t understand why the researchers were testing against such old models even back then.
Like either as a table in the background or as a regular script?
On most computers you can't compile or add add-ons without administrative rights, and LLM chat sites are blocked to prevent usage of company data.
It should run on native Excel or GSheets.
I mean, pure without compilation, just like they do the matrix calculations here straight in Excel without admin rights:
Lesson 1: Demystifying how LLMs work, from architecture to Excel
As far as I know, in GSheets the scripts run on Google's servers and are not limited by the local computer's power, so larger models could be deployed there.
Someone can hack this into Excel/GSheet?
Google could've implemented a consent API in Chrome, but they didn't. Guess why.
Exactly what you would expect from a language model making random stock picks.
By the time someone buys their own pack they are probably hooked.
I suspect the obscene taxes pricing out young folks are one of the most effective strategies
> Interesting idea. But those say what’s in the commit. The commit diff already tells you that. The best commit messages IMO tell you why you did it and what value was delivered.
Which doesn't include what was done. Your example includes both which is fine. But not including what the commit does in the message is an antipattern imho. Everything else that is added is a bonus.
OTOH to build the infra for LLMs there's much more stuff involved and it's really hard to find engineers who have the capacity to be both the researchers and developers at the same time. By "researchers" I mean that they have to have a capacity to be able to read through the numerous academic and industry papers, comprehend the tiniest details, and materialize it into the product through the code. I think that's much harder and scarcer skill to find.
That said, I am not undermining the fine-tuning skill, it's a humongous effort, but I think it's not necessarily the skillset problem.
For me the commit message should tell me the what/why and the diff is the how. It's great to understand if, for example, a change was intentional or a bug.
Many times when searching for the source of a bug I could not tell if the line changed was intentional or a mistake because the commit message was simply repeating what was on the diff. If you say your intention was to add something and the diff shows a subtraction, you can easily tell it was a mistake. Contrived example but I think it demonstrates my point.
This only really works if commits are meaningful though. Most people are careless and half their commits are 'fix this', 'fix again', 'wip', etc. At that point the only places that can contain useful information about the intentions are the pull requests/issues around them.
Take a single commit from the Linux kernel: https://github.com/torvalds/linux/commit/08bd5b7c9a2401faabd... It doesn't tell me "add function X, Y and boolean flag Z". It tells us what/why it was done, and the diff shows us how.
Using it, I find myself often writing only the first half of most words, because the second part can usually already be guessed by the AI. In fact, it has a dedicated shortcut for accepting only the first word of the suggestion — that way, it can save you some typing even when later words deviate from your original intent.
Completions are generated in real-time locally on your Mac using a variety of models (primarily Qwen 2.5 1.5B).
It is currently in open beta: https://cotypist.app
I really hope there would be some amazing models this year for translation.
yeah, could you share those libraries please?
Has anyone around here actually succeeded in solving this in a way that works? Would appreciate any hints.
I've got no idea how to finetune or train an LLM. I know how to run inference, lots of it. I also know how to scan and OCR texts and feed a data ingestion pipeline. I know how to finetune a Stable Diffusion model, but I doubt that software works with language models...
1. Install Chrome Dev: Ensure you have version 127. [Download Chrome Dev](https://google.com/chrome/dev/).
2. Check that you’re on 127.0.6512.0 or above
3. Enable two flags:
   chrome://flags/#optimization-guide-on-device-model - BypassPerfRequirement
   chrome://flags/#prompt-api-for-gemini-nano - Enabled
4. Relaunch Chrome
5. Navigate to chrome://components
6. Check that Optimization Guide On Device Model is downloading, or force a download if not. It might take a few minutes for this component to even appear.
7. Open dev tools and type (await ai.languageModel.capabilities()).available; it should return "readily" when all is good
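Once that returns "readily", you can create a session from the same console and prompt Gemini Nano directly. The Prompt API surface has been shifting between Chrome versions, so treat the exact names below as an assumption matching the shape used in step 7:

    // Assumes the experimental window.ai Prompt API matching the capabilities() call in step 7.
    declare const ai: any; // not in the standard DOM typings

    const session = await ai.languageModel.create();
    const reply = await session.prompt("Write a haiku about tiny language models.");
    console.log(reply);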
If anyone is still paying attention, email me at hi@seikai.tv and I'll see if I can send you one.
---
Prompt:
I don't like expletives, spelling mistakes, toxic texts, personal attacks, negative remarks, snide, bad faith arguments, needless bait and trolling. Help clean the texts for me. Rewrite or reword if you have to without losing meaning. If you do reword the text, mention it. If not, reproduce it as-is.
Texts:
Mel: "Fuck you you dim twat"
Bel: "Wanna meet tomorrow?"
Cel: "lol, you're clueless; do you even hear yourself?"
Tel: "Pretty sure he hates you; get out before anyone gets hurt real bad"
Gel: "Those coffee beans are so nice; thanks for the gift!"
Lel: "hahaha, you mother fucker; you nailed it, congrats!"
Hel: "OK Goooooogle"
---
Output:
Here's the cleaned-up version of the texts:
Mel: "I’m really upset with you right now."
(Reworded to remove expletives and offensive language.)
Bel: "Wanna meet tomorrow?"
(No changes needed.)
Cel: "I think you might not fully understand the situation; have you considered your own perspective?"
(Reworded to remove snide remarks and make it more constructive.)
Tel: "It seems like he might not have the best intentions toward you; it might be wise to step back before things escalate."
(Reworded to remove negativity and make it more neutral and caring.)
Gel: "Those coffee beans are so nice; thanks for the gift!"
(No changes needed.)
Lel: "Hahaha, you did an amazing job! Congrats!"
(Reworded to remove expletives while keeping the celebratory tone.)
Hel: "OK Google."
(Corrected spelling for clarity.)