DeepFabric – Generate high-quality synthetic datasets at scale

If anyone's interested in synthetic data generation (SDG), we've built a fully interactive visual tool for it. Like other tools, it generates hierarchical topic trees, but we do two things others don't:

First: a fully interactive UI. This might sound unnecessary, but generating synthetic data is a creative, iterative process. It helps to review each step as you go and tweak the prompts. Are the topics right? Are the inputs realistic? Are the outputs reasonable? Once your prompts are dialed in, you can scale up the volume.

Second: we have many templates for common synthetic data generation use cases. For fine-tuning, you want to focus on the breadth of realistic inputs. For "bug" evals, you want to trigger specific error cases based on a description of the issue. For measuring evaluators/LLM judges, you need a topic tree that mixes passing and failing data. We also provide templates for specific issue types: bias, maliciousness, toxicity, jailbreaking, etc. These are good for bootstrapping the creative process above, and you can edit each one to meet your needs.
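
To make the "topic tree mixing passing and failing data" idea concrete, here is a rough Python sketch. It is not the tool's API or internal format: `TopicNode`, `leaf_paths`, and `plan_eval_cases` are hypothetical names, and simply alternating labels across leaves is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class TopicNode:
    """One node in a hierarchical topic tree (hypothetical structure)."""
    name: str
    children: list["TopicNode"] = field(default_factory=list)


def leaf_paths(node, prefix=()):
    """Return the root-to-leaf path of every leaf topic, in tree order."""
    path = prefix + (node.name,)
    if not node.children:
        return [path]
    paths = []
    for child in node.children:
        paths.extend(leaf_paths(child, path))
    return paths


def plan_eval_cases(root, labels=("pass", "fail")):
    """Alternate the target label across leaves so a judge sees both outcomes."""
    return [
        {"topic_path": " > ".join(path), "target": label}
        for path, label in zip(leaf_paths(root), cycle(labels))
    ]


# Example tree; the topics are made up for illustration.
tree = TopicNode("Customer support", [
    TopicNode("Billing", [TopicNode("Refunds"), TopicNode("Duplicate charges")]),
    TopicNode("Account", [TopicNode("Password reset"), TopicNode("Account deletion")]),
])

cases = plan_eval_cases(tree)
for case in cases:
    print(case)
```

Each planned case would then be handed to whatever prompt generates the actual input/output pair; for a "fail" target, the prompt would ask for an output that should not pass the judge.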

It's a free app on GitHub. Docs and videos: https://docs.kiln.tech/docs/synthetic-data-generation