Using just an LLM did not produce reliable queries, despite trying many, many prompts. Being an old Prolog hacker, I wondered whether asking the LLM to generate Prolog might impose more 'logic' on it. So we precede the textual description of the constraints with the following prompt:
-------------
Now consider the following Prolog predicates:
biomarker(Name, Status) where Status will be one of the following integers -
Wildtype = 0 Mutated = 1 Methylated = 2 Unmethylated = 3 Amplified = 4 Deleted = 5 Positive = 6 Negative = 7
tumor(Name, Status) where Status will be one of the following integers if known, else left unbound -
Newly diagnosed = 1 Recurrence = 2 Metastasized = 3 Progression = 4
chemo(Name)
surgery(Name) Where Name may be an unbound variable
other_treatment(Name)
radiation(Name) Where Name may be an unbound variable
Assume you are given a predicate atMost(T, N) where T is a compound term and N is an integer. It will return true if the number of 'occurrences' of T is less than or equal to N, else it will fail.
Assume you are given a predicate atLeastOneOf(L) where L is a list of compound terms. It will succeed if at least one of the compound terms, when executed as a predicate returns true.
Assume you are given a predicate age(Min, Max) which will return true if the patient's age is in between Min and Max.
Assume you have a predicate not(T) which returns true if predicate T evaluates false and vice versa. i.e. rather than '\\+ A' use not(A).
Do not implement the above helper functions.
VERY IMPORTANT: Use 'atLeastOneOf()' whenever you would otherwise use ';' to represent 'OR'. i.e. rather than 'A ; B' use atLeastOneOf([A, B]).
EXAMPLE INPUT: Patient must have recurrent GBM, methylated MGMT and wildtype EGFR. Patient must not have mutated KRAS.
EXAMPLE OUTPUT: tumor('gbm', 2), biomarker('MGMT', 2), biomarker('EGFR', 0), not(biomarker('KRAS', 1))
------------------
The Prolog predicates, when evaluated, generate the required underlying query (the Prolog is, of course, itself a form of query).
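To make the helper predicates from the prompt concrete, here is a minimal sketch of how they might be defined in SWI-Prolog. It is an assumption-laden illustration, not the implementation used here: instead of generating an underlying query, it simply tests the generated goal against one patient's facts held in the Prolog database. The patient facts, the patient_age/1 predicate, and the inclusive age bounds are all invented for the example. not/1 is not defined because SWI-Prolog already provides it as a built-in equivalent to \+.

```prolog
% Sketch for SWI-Prolog. Everything below is illustrative and hypothetical.
:- use_module(library(aggregate)).
:- use_module(library(lists)).

% Declared dynamic so that querying a predicate with no facts fails
% quietly instead of raising an existence error.
:- dynamic biomarker/2, tumor/2, chemo/1, surgery/1,
           other_treatment/1, radiation/1, patient_age/1.

% Invented example patient.
tumor('gbm', 2).            % recurrent GBM
biomarker('MGMT', 2).       % methylated MGMT
biomarker('EGFR', 0).       % wildtype EGFR
chemo('temozolomide').
patient_age(54).            % hypothetical helper fact, not in the prompt

% atMost(T, N): succeeds if goal T has at most N solutions.
atMost(T, N) :-
    aggregate_all(count, T, Count),
    Count =< N.

% atLeastOneOf(L): succeeds if at least one goal in list L succeeds.
atLeastOneOf(Goals) :-
    member(Goal, Goals),
    call(Goal),
    !.

% age(Min, Max): succeeds if the patient's age lies in [Min, Max].
% Inclusive bounds are an assumption.
age(Min, Max) :-
    patient_age(Age),
    Age >= Min,
    Age =< Max.
```

With these definitions loaded, the EXAMPLE OUTPUT goal from the prompt can be run directly as a query, and it succeeds for the invented patient above:

```prolog
?- tumor('gbm', 2), biomarker('MGMT', 2), biomarker('EGFR', 0), not(biomarker('KRAS', 1)).
```

Similarly, a constraint such as "at most one prior chemo, and either surgery or chemo" could be written as atMost(chemo(_), 1), atLeastOneOf([surgery(_), chemo(_)]).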
Anyway - the upshot was a vast improvement in the accuracy of the generated query (I've yet to see a bad one). Somewhere in its bowels, being told to generate Prolog 'focused' the LLM. Perhaps LLMs are happier with declarative languages than with imperative ones (I know I am :) ).