
52 points | zomh | 2 comments

As a fan of dense New York Times-style crosswords, I challenged myself to create topic-specific puzzles. It turns out that generating crosswords and efficiently placing words is a non-trivial computational problem.

I started the project "Joystick Jargon", which combines traditional crossword elements with gaming-related vocabulary. Here's the technical process behind it:

1. Data Source: Used a Reddit dataset of 3.8 million rows from Hugging Face (https://huggingface.co/datasets/webis/tldr-17).

2. Data Filtering: Narrowed down to gaming-related subreddits (r/gaming, r/dota2, r/leagueoflegends).

3. Keyword Extraction: Employed ML techniques, specifically BERT embeddings and cosine similarity, to extract keywords from the subreddits.

4. Data Preprocessing: Cleaned up data unsuitable for crossword puzzles.

5. Grid Generation: Implemented a heuristic crossword algorithm to create grids and place words efficiently.

6. Clue Generation: Utilized a Large Language Model to generate context-aware clues for the placed words.
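
As a rough illustration of step 3 (a sketch, not the project's actual pipeline): candidate words can be ranked by the cosine similarity between their embedding and the embedding of the source text. The function names `cosine_similarity` and `rank_keywords` are hypothetical, and the 3-dimensional vectors below are toy stand-ins for real BERT embeddings, which would come from a model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_keywords(doc_vec, candidate_vecs, top_n=3):
    """Return the top_n candidate words most similar to the document vector."""
    scored = [
        (word, cosine_similarity(doc_vec, vec))
        for word, vec in candidate_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [word for word, _ in scored[:top_n]]


# Toy vectors (assumption): real embeddings would have hundreds of dimensions.
doc_vec = [0.9, 0.1, 0.3]
candidates = {
    "mana": [0.8, 0.2, 0.3],
    "respawn": [0.7, 0.0, 0.4],
    "weather": [0.0, 1.0, 0.1],
}
top = rank_keywords(doc_vec, candidates, top_n=2)
print(top)
```

With these toy vectors, the two gaming terms score well above the unrelated word, which is the behaviour a keyword extractor of this kind relies on.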

The resulting system creates crossword puzzles that blend traditional elements with gaming terminology, achieving about a 50-50 mix.
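
For step 5, one possible greedy heuristic (an assumption for illustration, not necessarily the algorithm used here) is to place the longest word first and then attach each remaining word at the crossing that shares the most letters with the grid, skipping words that fit nowhere. All names below (`fit_score`, `place`, `build_grid`) are hypothetical, and the grid is an unbounded sparse dictionary rather than a fixed-size board.

```python
def fit_score(letters, dirs, word, row, col, horizontal):
    """Return the number of crossings if the word fits, otherwise -1."""
    dr, dc = (0, 1) if horizontal else (1, 0)
    end = (row + dr * len(word), col + dc * len(word))
    # The cells directly before and after the word must be empty.
    if (row - dr, col - dc) in letters or end in letters:
        return -1
    crossings = 0
    for i, ch in enumerate(word):
        r, c = row + dr * i, col + dc * i
        if (r, c) in letters:
            # Must match the letter and cross a word of the other direction.
            if letters[(r, c)] != ch or horizontal in dirs[(r, c)]:
                return -1
            crossings += 1
        elif (r + dc, c + dr) in letters or (r - dc, c - dr) in letters:
            return -1  # a new cell may not touch letters on either side
    return crossings


def place(letters, dirs, word, row, col, horizontal):
    """Write the word into the sparse grid and record its direction."""
    dr, dc = (0, 1) if horizontal else (1, 0)
    for i, ch in enumerate(word):
        cell = (row + dr * i, col + dc * i)
        letters[cell] = ch
        dirs.setdefault(cell, set()).add(horizontal)


def build_grid(words):
    """Greedily place words; returns (letters, placed, skipped)."""
    if not words:
        return {}, [], []
    words = sorted(words, key=len, reverse=True)  # longest word first
    letters, dirs, placed, skipped = {}, {}, [], []
    place(letters, dirs, words[0], 0, 0, True)
    placed.append((words[0], 0, 0, True))
    for word in words[1:]:
        best = None
        for (r, c), existing in list(letters.items()):
            for i, ch in enumerate(word):
                if ch != existing:
                    continue
                for horizontal in (True, False):
                    dr, dc = (0, 1) if horizontal else (1, 0)
                    start = (r - dr * i, c - dc * i)
                    score = fit_score(letters, dirs, word, *start, horizontal)
                    if score > 0 and (best is None or score > best[0]):
                        best = (score, start[0], start[1], horizontal)
        if best is None:
            skipped.append(word)  # no legal crossing found
        else:
            _, r, c, horizontal = best
            place(letters, dirs, word, r, c, horizontal)
            placed.append((word, r, c, horizontal))
    return letters, placed, skipped


letters, placed, skipped = build_grid(["respawn", "mana", "loot"])
print(placed, skipped)
```

A sketch like this produces sparse, loosely connected grids. A dense New York Times-style layout would likely need backtracking or a scoring function that rewards compactness, which is where most of the difficulty lies.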

This project is admittedly overengineered for its purpose, but it was an interesting exploration into natural language processing, optimization algorithms, and the intersection of traditional word games with modern gaming culture.

A note on content: Since the data source is Reddit, some mature language may appear in the puzzles. Manual filtering was minimal to preserve authenticity.

You can try the puzzles here: <https://capsloq.de/crosswords/joystick-jargon>

I'm curious about the HN community's thoughts on this approach to puzzle generation. What other domains might benefit from similar computational techniques for content creation?

maxrmk ◴[] No.41880858[source]
Tried it! I really like the idea, but I think the clue generation could use some work. Every clue ended in "in games", and honestly most of them were not really game related to start with. For example the clue "Place in games where characters go to rest and replenish health or mana" had the solution "bar"... which I wouldn't describe as right. Similarly "The name of a popular character who may need rescuing in some games" was "Emily".

I think it might be worth working on prompting to make sure the answer is a unique solution to the hint (or at least closer to unique). What model are you using here?

replies(2): >>41881197 #>>41881364 #
cableshaft ◴[] No.41881197[source]
Haha, for those two you mentioned I assumed it was 'Inn' and 'Zelda'. I don't even know who Emily is.

And googling 'Emily video game character' didn't bring up any noticeably popular video game characters.

replies(2): >>41881916 #>>41883946 #
1. jayGlow ◴[] No.41883946[source]
Emily is the name of characters that are rescued in BioShock Infinite as well as the first Dishonored game. Both games are around a decade old but were popular at the time.
replies(1): >>41885467 #
2. NBJack ◴[] No.41885467[source]
Isn't the character in Bioshock Infinite 'Elizabeth'? I'd also assert that by design, Elizabeth was meant to be a character that arguably didn't really need to be rescued, "she can take care of herself".

The only name I could think of in 5 letters that fit here was actually "Peach".