I started the project "Joystick Jargon", combining traditional crossword elements with gaming-related vocabulary. Here's the technical process behind it:
1. Data Source: Used a 3.8-million-row Reddit dataset from Hugging Face (https://huggingface.co/datasets/webis/tldr-17).
2. Data Filtering: Narrowed down to gaming-related subreddits (r/gaming, r/dota2, r/leagueoflegends).
3. Keyword Extraction: Employed ML techniques, specifically BERT embeddings and cosine similarity, to extract keywords from the subreddits.
4. Data Preprocessing: Cleaned up data unsuitable for crossword puzzles.
5. Grid Generation: Implemented a heuristic crossword algorithm to create grids and place words efficiently.
6. Clue Generation: Utilized a Large Language Model to generate context-aware clues for the placed words.
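For anyone curious what step 3 can look like, here is a minimal sketch of ranking candidate words by cosine similarity to a document vector. It is not the actual pipeline: the 3-dimensional vectors are made-up stand-ins for real BERT embeddings, and `rank_keywords` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as "not similar to anything"
    return dot / (norm_a * norm_b)


def rank_keywords(doc_vector, candidate_vectors, top_n=3):
    """Return the top_n (word, score) pairs closest to the document vector."""
    scored = [
        (word, cosine_similarity(doc_vector, vec))
        for word, vec in candidate_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors (assumed values); a real run would embed posts and words with BERT.
doc_vector = [0.9, 0.1, 0.3]
candidates = {
    "heroes": [0.8, 0.2, 0.3],
    "matters": [0.1, 0.9, 0.2],
    "support": [0.7, 0.1, 0.5],
}
top = rank_keywords(doc_vector, candidates, top_n=2)
print([word for word, _ in top])
```

Words whose vectors point in roughly the same direction as the document vector score close to 1 and are kept as keywords; generic words that point elsewhere fall to the bottom of the ranking.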
The resulting system creates crossword puzzles that blend traditional elements with gaming terminology, achieving about a 50-50 mix.
This project is admittedly overengineered for its purpose, but it was an interesting exploration into natural language processing, optimization algorithms, and the intersection of traditional word games with modern gaming culture.
A note on content: Since the data source is Reddit, some mature language may appear in the puzzles. Manual filtering was minimal to preserve authenticity.
You can try the puzzles here: <https://capsloq.de/crosswords/joystick-jargon>
I'm curious about the HN community's thoughts on this approach to puzzle generation. What other domains might benefit from similar computational techniques for content creation?
I think it might be worth working on prompting to make sure the answer is a unique solution to the hint (or at least closer to unique). What model are you using here?
And googling 'Emily video game character' didn't bring up any noticeably popular video game characters.
> Unportal
what?
>19: An online forum section where gamers come together to discuss specific topics
> ITT
I don't think that's right...
Most of the questions just seem like normal crossword questions, but with the term "in games" added to it.
I'm not gonna sugarcoat it: this sucks. The crossword grids often have totally isolated words. 1 across and 2 down start with the same letter. The questions are nonsensical. I'd hardly go so far as to call it an interesting proof of concept.
There's an outstanding issue and that is (from what I can tell) at least 75% of the answers correspond to relatively generic nouns or verbs.
Part of the deep satisfaction in solving a crossword puzzle is the specificity of the answer. It's far more gratifying to answer a question with something like "Hawking" than to answer with "scientist", or answering with "mandelbrot" versus "shape".
It might be worth going back and looking up a compendium of games released in the last couple decades, cross referencing them with their manuals, GameFaqs, etc. and peppering this information into the crossword.
Sadly, the clues and the words relating to them feel off, making the whole game rather unenjoyable.
Clickable link: https://capsloq.de/crosswords/joystick-jargon
I always think about doing something similar for a similar project. Are you able to do it completely automatically or do you have to help finesse the words to fit?
1. Begins with an empty grid and starts placing words horizontally from the top-left corner
2. For each word placement, it verifies that valid words can be formed vertically at each intersection point
3. It maintains a list of possible letters for each cell to ensure all constraints are satisfied
4. The generator consults a dictionary to find valid words that fit the current grid state, allowing for diverse solutions
5. If no valid word can be placed, it may decide to insert a black square, carefully checking that this doesn't violate any crossword rules
6. When it reaches a dead end, the system backtracks and tries different options
7. It employs smart heuristics to guide word selection, such as favoring longer words in certain positions
8. Throughout the process it automatically adjusts parameters like word length and black square placement to find a valid solution
There is no manual intervention; however, the quality depends heavily on the input dictionary and tunable parameters.
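To make the steps above more concrete, here is a heavily simplified sketch of the backtracking idea. It only fills a fully open square grid (no black squares, no heuristics, no parameter tuning), and the function names are hypothetical, so treat it as an illustration of steps 1, 2, 4 and 6 rather than the actual generator.

```python
def build_prefixes(words):
    """Every prefix of every word, from the empty string up to the full word."""
    return {word[:i] for word in words for i in range(len(word) + 1)}


def fill_grid(words, size):
    """Fill a size x size grid row by row; return the rows, or None if stuck."""
    candidates = [w for w in words if len(w) == size]
    prefixes = build_prefixes(candidates)
    rows = []

    def columns_ok():
        # Each partial column must still be the start of some dictionary word.
        return all(
            "".join(row[c] for row in rows) in prefixes
            for c in range(size)
        )

    def solve():
        if len(rows) == size:
            return True  # every column is now a complete dictionary word
        for word in candidates:
            rows.append(word)
            if columns_ok() and solve():
                return True
            rows.pop()  # dead end: backtrack and try the next word
        return False

    return rows if solve() else None


# Tiny made-up dictionary; "rage" is ignored because it does not fit a 3x3 grid.
grid = fill_grid(["bit", "ice", "ten", "rage"], 3)
print(grid)
```

Checking column prefixes after every placed row is what lets the search give up early on a bad branch instead of discovering the conflict only when the grid is full. This sketch allows the same word to appear as both a row and a column, which a real generator would likely forbid.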
Generated words and clues:
heroes: Characters with unique abilities in Dota 2, tasked with defeating the enemy's Ancient.
ragers: Players who overly react to in-game frustrations, often ruining the fun for everyone.
rage: A common emotion experienced by players sometimes leading to poor decision-making.
tachyons: Hypothetical particles that travel faster than light, having no place in an Ancient's mechanics.
healing: Essential support function often provided by certain heroes like Treant Protector.
burn: Refers to a mechanism used to deplete an opponent's mana, crucial in trilane strategies.
matters: In Dota 2, every decision, including hero picks, can significantly change the outcome.
fault: What a player will often blame when losing, rather than acknowledging their own mistakes.
support: Role in Dota 2 focused on helping the team, often with abilities to aid and sustain.
team: Group of players working together to win, where synergy and composition are key to victory.
Note that the words themselves were not picked by OpenAI; they are a pre-selection from the BERT embeddings ML algorithm, this time with more than just a single word as context.
This is definitely going in the right direction. It's only a sample size of 1, but I had to share it with you!
The way the NYT does this on their web interface is nice. They have the puzzle in one column, the across clues in a second column, and the down clues in a third column. The clue columns each are scrollable.
It automatically scrolls to keep the clue for whatever word you have selected in view and highlights that clue. It also scrolls the other list so that the clue for whatever word crosses that word at the square you have highlighted stays visible, and marks it in the margin of its clue list.
They do similar in their iPad app, but also below the puzzle show the clue for the selected word and for whatever word crosses it at the highlighted square. With that you can concentrate on the grid and a fixed clue area.
However, this is well done and it inspired a thought: I wonder if it would be possible to procedurally generate word games, such as a mini crossword or word ladder or so on, as part of a language learning regime? Think Duolingo but for word puzzle fans.
As an example, you solve a mini crossword every day where 80% of the clues/answers are in English, and 20% are drawn from a progressive set of vocabulary in the other language.
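A rough sketch of how such a daily mix could be drawn, assuming two plain word lists and a fixed puzzle size. The function name, the list contents and the 10-word puzzle size are all made up for illustration.

```python
import random


def pick_daily_words(english_words, target_words, total=10, target_share=0.2, seed=None):
    """Pick `total` answers, with `target_share` of them from the target language."""
    rng = random.Random(seed)
    n_target = round(total * target_share)
    n_english = total - n_target
    # random.sample raises ValueError if a list is too short for the request.
    chosen = rng.sample(target_words, n_target) + rng.sample(english_words, n_english)
    rng.shuffle(chosen)  # avoid always listing the foreign words first
    return chosen


# Placeholder vocabularies; a real setup would pull the target words
# from whatever vocabulary level the learner has reached.
english = [f"en{i}" for i in range(20)]
spanish = [f"es{i}" for i in range(10)]
daily = pick_daily_words(english, spanish, total=10, target_share=0.2, seed=1)
print(daily)
```

Raising `target_share` over time would be one way to make the vocabulary set "progressive".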
The only name I could think of in 5 letters that fit here was actually "Peach".
So maybe rather have some for League, some for CS, etc? Maybe you can do a mixed indie one with very popular games, or mixed Shooter. But then the questions have to be less difficult :D
I forgot to mention but it might also be worth exploring more classic NLP techniques like named entity recognition to score clues higher and lower in terms of overall specificity.
//EDIT: I've been working on it for approx. 10 hours today (still going) with the goal of putting a much better quality version live. I feel the pressure and want this fixed myself. However, it's quite a computational challenge. I am giving my very best.
After a ~30-hour weekend coding marathon, I've just pushed a new version of the original joystick-jargon (r/gaming) and a new r/leagueoflegends puzzle live.
https://capsloq.de/crosswords/joystick-jargon
https://capsloq.de/crosswords/r/leagueoflegends
What changed?
- 5 new puzzles for r/gaming
- 6 new puzzles for r/leagueoflegends
- Old puzzles deleted
- New extraction algorithm (everything new: tokenizer, transformers, pipelines, model, word and document embeddings, scoring, complete overhaul ...)
- New clue prompting
- Grid can now only contain diagonal black boxes (should guarantee intersections)
- Fixed numbering bug on the grid
- Proofread each puzzle and made some slight adjustments to guarantee puzzle integrity.
Warning: When I proofread the League of Legends Q&A I noticed that I've never played that game, so I couldn't verify everything!
Thank you very much to everyone who provided feedback to improve on v1.
I really hope you feel an increase in quality. I am looking forward to even more feedback and to improving further.
Planning to use more suitable datasets in the future. It's super hard to get a quality crossword word list out of r/gaming.
Have fun puzzling! (please)