
52 points by zomh | 2 comments

As a fan of dense New York Times-style crosswords, I challenged myself to create topic-specific puzzles. It turns out that generating crosswords and efficiently placing words is a non-trivial computational problem.

I started a project, "Joystick Jargon", that combines traditional crossword elements with gaming-related vocabulary. Here's the technical process behind it:

1. Data Source: Used a Reddit dataset of 3.8 million rows from Hugging Face (https://huggingface.co/datasets/webis/tldr-17).

2. Data Filtering: Narrowed down to gaming-related subreddits (r/gaming, r/dota2, r/leagueoflegends).

3. Keyword Extraction: Used ML techniques, specifically BERT embeddings and cosine similarity, to extract keywords from the subreddits.

4. Data Preprocessing: Cleaned up data unsuitable for crossword puzzles.

5. Grid Generation: Implemented a heuristic crossword algorithm to create grids and place words efficiently.

6. Clue Generation: Used a large language model (LLM) to generate context-aware clues for the placed words.
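A rough sketch of steps 2 to 4, to make the keyword-extraction idea concrete. This is not the project's code: it assumes each row is a dict with a `subreddit` field, and the subreddit spellings, the `embed` callable (a stand-in for a BERT-style encoder), the tiny stop-word list, and the length limits are all illustrative assumptions.

```python
import math
import re
from typing import Callable, Iterable, Sequence

# Assumed subreddit names; compared lower-case because the dataset's casing may differ.
GAMING_SUBREDDITS = {"gaming", "dota2", "leagueoflegends"}
# Tiny illustrative stop-word list; a real pipeline would use a proper one.
STOP_WORDS = {"and", "the", "for", "with", "that", "this", "you"}


def filter_rows(rows: Iterable[dict], subreddits: set[str] = GAMING_SUBREDDITS) -> list[dict]:
    """Step 2: keep posts whose 'subreddit' field is in the whitelist."""
    return [row for row in rows if str(row.get("subreddit", "")).lower() in subreddits]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_crossword_friendly(word: str, min_len: int = 3, max_len: int = 15) -> bool:
    """Step 4: ASCII letters only, and a length that fits a 15x15 grid."""
    return word.isascii() and word.isalpha() and min_len <= len(word) <= max_len


def extract_keywords(text: str, embed: Callable[[str], Sequence[float]], top_k: int = 5) -> list[str]:
    """Step 3: rank the post's own words by similarity to the whole post's embedding."""
    doc_vec = embed(text)
    candidates = {
        w for w in re.findall(r"[a-z]+", text.lower())
        if w not in STOP_WORDS and is_crossword_friendly(w)
    }
    # Highest similarity first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(candidates, key=lambda w: (-cosine(embed(w), doc_vec), w))
    return ranked[:top_k]
```

In practice `embed` could wrap a sentence encoder such as `SentenceTransformer(...).encode` from the sentence-transformers library; the KeyBERT library packages the same embed-and-rank idea.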

The resulting system creates crossword puzzles that blend traditional crossword vocabulary with gaming terminology, in roughly a 50-50 mix.
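Step 5 doesn't say which heuristic the project uses, so the sketch below swaps in a common greedy one, purely as an illustration: place the longest word first, then put each following word at the intersection that creates the most crossings, rejecting spots where letters clash or where new letters would sit beside existing ones and form accidental words. Coordinates are unbounded `(row, col)` pairs, and trimming to a fixed grid size is left out; the class and function names are hypothetical.

```python
from typing import Optional


class Crossword:
    """Greedy crossword builder: an illustrative heuristic, not the project's algorithm."""

    def __init__(self) -> None:
        self.letters: dict[tuple[int, int], str] = {}       # (row, col) -> letter
        self.dirs: set[tuple[int, int, bool]] = set()       # (row, col, across) already used by a word
        self.placed: list[tuple[str, int, int, bool]] = []  # (word, row, col, across)

    def _score(self, word: str, row: int, col: int, across: bool) -> Optional[int]:
        """Number of crossings if `word` fits with its first letter at (row, col), else None."""
        dr, dc = (0, 1) if across else (1, 0)
        n = len(word)
        # The cells just before and just after the word must stay empty.
        if (row - dr, col - dc) in self.letters or (row + dr * n, col + dc * n) in self.letters:
            return None
        crossings = 0
        for i, ch in enumerate(word):
            r, c = row + dr * i, col + dc * i
            existing = self.letters.get((r, c))
            if existing is not None:
                # Reuse a cell only on a matching letter, and only if its word runs the other way.
                if existing != ch or (r, c, across) in self.dirs:
                    return None
                crossings += 1
            elif (r + dc, c + dr) in self.letters or (r - dc, c - dr) in self.letters:
                # A new letter must not sit beside an existing one (that would form an accidental word).
                return None
        return crossings

    def _put(self, word: str, row: int, col: int, across: bool) -> None:
        dr, dc = (0, 1) if across else (1, 0)
        for i, ch in enumerate(word):
            r, c = row + dr * i, col + dc * i
            self.letters[(r, c)] = ch
            self.dirs.add((r, c, across))
        self.placed.append((word, row, col, across))

    def add(self, word: str) -> bool:
        """Place `word` where it makes the most crossings; return False if it fits nowhere."""
        word = word.upper()
        if not self.placed:
            self._put(word, 0, 0, True)  # first word goes across at the origin
            return True
        best = None  # (crossings, row, col, across)
        for (r, c), letter in list(self.letters.items()):
            for i, ch in enumerate(word):
                if ch != letter:
                    continue
                for across in (True, False):
                    # Shift the start so that word[i] lands on the shared cell (r, c).
                    row, col = (r, c - i) if across else (r - i, c)
                    score = self._score(word, row, col, across)
                    if score and (best is None or score > best[0]):
                        best = (score, row, col, across)
        if best is None:
            return False
        _, row, col, across = best
        self._put(word, row, col, across)
        return True


def build(words: list[str]) -> Crossword:
    """Longest words first; words that do not fit are simply skipped."""
    cw = Crossword()
    for w in sorted(words, key=len, reverse=True):
        cw.add(w)
    return cw
```

With this sketch, `build(["dota", "noob", "tank"])` places all three words: DOTA across, with NOOB and TANK running down through its O and A. A real generator would likely also retry different word orders or backtrack when a word doesn't fit, instead of skipping it.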

This project is admittedly overengineered for its purpose, but it was an interesting exploration into natural language processing, optimization algorithms, and the intersection of traditional word games with modern gaming culture.

A note on content: Since the data source is Reddit, some mature language may appear in the puzzles. Manual filtering was minimal to preserve authenticity.

You can try the puzzles here: <https://capsloq.de/crosswords/joystick-jargon>

I'm curious about the HN community's thoughts on this approach to puzzle generation. What other domains might benefit from similar computational techniques for content creation?

1. tzs No.41884577
I had to zoom out in my browser a couple of times to see both the puzzle and all the clues at the same time. It might be better to put the clues in a separate scrolling region on the page.

The way the NYT does this on their web interface is nice. They have the puzzle in one column, the across clues in a second column, and the down clues in a third column. The clue columns each are scrollable.

It automatically scrolls to keep the clue for the word you have selected in view and highlights that clue. It also scrolls the other clue list so that the clue for the word crossing yours at the highlighted square stays visible, marked in the margin of its list.

They do something similar in their iPad app, but also show, below the puzzle, the clue for the selected word and the clue for the word that crosses it at the highlighted square. That way you can concentrate on the grid and a fixed clue area.
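The behavior described above comes down to one lookup: given the highlighted square and the current direction, which across clue and which down clue contain it; the UI then scrolls each list to its label. A minimal Python sketch of that lookup, assuming a grid stored as a list of strings with `#` for black squares (a hypothetical representation, not how the NYT or capsloq.de store puzzles) and standard crossword numbering:

```python
BLOCK = "#"


def number_grid(grid: list[str]) -> tuple[dict, dict]:
    """Standard numbering: a white cell gets the next number if it starts an across or down entry of 2+ letters."""
    across_nums: dict[tuple[int, int], int] = {}
    down_nums: dict[tuple[int, int], int] = {}
    n = 0
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch == BLOCK:
                continue
            starts_across = (c == 0 or row[c - 1] == BLOCK) and c + 1 < len(row) and row[c + 1] != BLOCK
            starts_down = (r == 0 or grid[r - 1][c] == BLOCK) and r + 1 < len(grid) and grid[r + 1][c] != BLOCK
            if starts_across or starts_down:
                n += 1
                if starts_across:
                    across_nums[(r, c)] = n
                if starts_down:
                    down_nums[(r, c)] = n
    return across_nums, down_nums


def entry_start(grid: list[str], r: int, c: int, dr: int, dc: int) -> tuple[int, int]:
    """Walk back from (r, c) to the first cell of the entry running in direction (dr, dc)."""
    while r - dr >= 0 and c - dc >= 0 and grid[r - dr][c - dc] != BLOCK:
        r, c = r - dr, c - dc
    return r, c


def clues_for_cell(grid, across_nums, down_nums, r, c, across=True):
    """Return (selected clue, crossing clue) labels for the highlighted cell, e.g. ('1A', '2D').

    A label is None when the cell is not part of an entry in that direction."""
    a = across_nums.get(entry_start(grid, r, c, 0, 1))
    d = down_nums.get(entry_start(grid, r, c, 1, 0))
    a_label = f"{a}A" if a is not None else None
    d_label = f"{d}D" if d is not None else None
    return (a_label, d_label) if across else (d_label, a_label)
```

For the 3x3 grid `["CAT", "A#O", "BOW"]`, highlighting the T while typing across gives `("1A", "2D")`: scroll the across list to 1 and mark 2 in the down list.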

2. zomh No.41884706
Thank you for that feedback. I agree the NYT does this a lot better, and I'll have to improve on that. Not gonna lie, I suck at CSS/styling/UX/UI, so I'll need to get some help with this. But I hear you, and it's important for the fun of the game.