52 points zomh | 3 comments

As a fan of dense New York Times-style crosswords, I challenged myself to create topic-specific puzzles. It turns out that generating crosswords and efficiently placing words is a non-trivial computational problem.

I started the project "Joystick Jargon", which combines traditional crossword elements with gaming-related vocabulary. Here's the technical process behind it:

1. Data Source: Used a Reddit dataset with 3.8 million rows from Hugging Face (https://huggingface.co/datasets/webis/tldr-17).

2. Data Filtering: Narrowed down to gaming-related subreddits (r/gaming, r/dota2, r/leagueoflegends).

3. Keyword Extraction: Employed ML techniques, specifically BERT embeddings and cosine similarity, to extract keywords from the subreddits.

4. Data Preprocessing: Cleaned up data unsuitable for crossword puzzles.

5. Grid Generation: Implemented a heuristic crossword algorithm to create grids and place words efficiently.

6. Clue Generation: Utilized a Large Language Model to generate context-aware clues for the placed words.
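
To make step 3 more concrete, here is a minimal sketch of ranking candidate keywords by cosine similarity to a document vector. It is not the project's actual code: the function names (`cosine_similarity`, `rank_keywords`) are made up for illustration, and the tiny 3-dimensional vectors are hand-written stand-ins for real BERT embeddings, which would come from an embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_keywords(doc_vector, candidate_vectors, top_n=3):
    """Return the top_n (word, score) pairs most similar to the document."""
    scored = [
        (word, cosine_similarity(doc_vector, vec))
        for word, vec in candidate_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Assumption: toy vectors standing in for embeddings of a post and of
# candidate words taken from that post.
doc_vector = [0.9, 0.1, 0.3]
candidates = {
    "respawn": [0.8, 0.2, 0.4],
    "weather": [0.1, 0.9, 0.1],
    "cooldown": [0.7, 0.0, 0.5],
}

top_keywords = rank_keywords(doc_vector, candidates, top_n=2)
for word, score in top_keywords:
    print(f"{word}: {score:.3f}")
```

With real embeddings the ranking logic stays the same; only the source of the vectors changes.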

The resulting system creates crossword puzzles that blend traditional elements with gaming terminology, achieving about a 50-50 mix.
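
The post does not describe the heuristic used in step 5, so the following is only a sketch of one common greedy approach, not the project's algorithm: place the longest word first, then attach each remaining word where it shares a letter with the grid, and skip words that cannot be attached. All names (`can_place`, `find_spot`, `build_grid`) are hypothetical, and the check is deliberately simplified: it rejects letter conflicts but does not enforce the usual rule that parallel words must not touch.

```python
def cells_for(word, row, col, horizontal):
    """Yield ((row, col), letter) for each letter of a placed word."""
    for i, letter in enumerate(word):
        yield ((row, col + i) if horizontal else (row + i, col)), letter


def can_place(grid, word, row, col, horizontal):
    """True if the word fits without conflicts and crosses the grid."""
    crossings = 0
    for cell, letter in cells_for(word, row, col, horizontal):
        existing = grid.get(cell)
        if existing is None:
            continue
        if existing != letter:
            return False  # a different letter already occupies this cell
        crossings += 1
    # Require at least one crossing, but not a complete overlap.
    return 0 < crossings < len(word)


def find_spot(grid, word):
    """Return the first (row, col, horizontal) where the word can attach."""
    for (r, c), placed_letter in list(grid.items()):
        for i, letter in enumerate(word):
            if letter != placed_letter:
                continue
            for horizontal in (True, False):
                row, col = (r, c - i) if horizontal else (r - i, c)
                if can_place(grid, word, row, col, horizontal):
                    return row, col, horizontal
    return None


def build_grid(words):
    """Greedy placement: longest word first, then attach the rest."""
    grid, placed, skipped = {}, [], []
    ordered = sorted(words, key=len, reverse=True)
    if not ordered:
        return grid, placed, skipped
    first = ordered[0]
    grid.update(cells_for(first, 0, 0, True))
    placed.append(first)
    for word in ordered[1:]:
        spot = find_spot(grid, word)
        if spot is None:
            skipped.append(word)  # no shared letter: leave it out
            continue
        grid.update(cells_for(word, *spot))
        placed.append(word)
    return grid, placed, skipped


grid, placed, skipped = build_grid(["JOYSTICK", "RESPAWN", "LOOT", "BUFF"])
print("placed:", placed)
print("skipped:", skipped)
```

Skipping words that share no letter with the grid is a design choice in this sketch: it keeps every placed word connected to the rest of the puzzle, at the cost of leaving some vocabulary unused.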

This project is admittedly overengineered for its purpose, but it was an interesting exploration into natural language processing, optimization algorithms, and the intersection of traditional word games with modern gaming culture.

A note on content: Since the data source is Reddit, some mature language may appear in the puzzles. Manual filtering was minimal to preserve authenticity.

You can try the puzzles here: <https://capsloq.de/crosswords/joystick-jargon>

I'm curious about the HN community's thoughts on this approach to puzzle generation. What other domains might benefit from similar computational techniques for content creation?

dmonitor ◴[] No.41881295[source]
> 1: Allows movement between different areas in a game

> Unportal

what?

> 19: An online forum section where gamers come together to discuss specific topics

> ITT

I don't think that's right...

Most of the questions just seem like normal crossword questions, but with the term "in games" added to them.

I'm not gonna sugarcoat it: this sucks. The crossword grids often have totally isolated words. 1 across and 2 down start with the same letter. The questions are nonsensical. I'd hardly go so far as to call it an interesting proof of concept.

replies(1): >>41881384 #
1. zomh ◴[] No.41881384[source]
Thank you for trying! Please see my reply on the other comment; it fits here as well. This is a non-proofread version, and I agree with you: consider this a starting point. Also, feel free to try out the later puzzles (10+). I changed the prompt at some point because I noticed the same thing.
replies(1): >>41881420 #
2. zomh ◴[] No.41881420[source]
Edit: At some point I was unsure myself whether I just don't know certain things about gaming or whether the AI is making things up, haha, so this feedback is extremely valuable to me. Thank you.
replies(1): >>41881534 #
3. ◴[] No.41881534[source]