Use Prolog to improve LLM's reasoning

An application I am developing for a customer needed to read constraints around clinical trials and essentially build a query from them. Constraints involve prior treatments, biomarkers, type of disease (cancers) etc.

Using just an LLM did not produce reliable queries, despite trying many many prompts, so being an old Prolog hacker I wondered if using it might impose more 'logic' on the LLM. So we precede the textual description of the constraints with the following prompt:

-------------

Now consider the following Prolog predicates:

biomarker(Name, Status) where Status will be one of the following integers -

Wildtype = 0 Mutated = 1 Methylated = 2 Unmethylated = 3 Amplified = 4 Deleted = 5 Positive = 6 Negative = 7

tumor(Name, Status) where Status will be one of the following integers if know else left unbound -

Newly diagnosed = 1 Recurrence = 2 Metastasized = 3 Progression = 4

chemo(Name)

surgery(Name) Where Name may be an unbound variable

other_treatment(Name)

radiation(Name) Where Name may be an unbound variable

Assume you are given predicate atMost(T, N) where T is a compound term and N is an integer. It will return true if the number of 'occurences' of T is less than or equal N else it will fail.

Assume you are given a predicate atLeastOneOf(L) where L is a list of compound terms. It will succeed if at least one of the compound terms, when executed as a predicate returns true.

Assume you are given a predicate age(Min, Max) which will return true if the patient's age is in between Min and Max.

Assume you have a predicate not(T) which returns true if predicate T evaluates false and vice versa. i.e. rather than '\\+ A' use not(A).

Do not implement the above helper functions.

VERY IMPORTANT: Use 'atLeastOneOf()' whenever you would otherwise use ';' to represent 'OR'. i.e. rather than 'A ; B' use atLeastOneOf([A, B]).

EXAMPLE INPUT: Patient must have recurrent GBM, methylated MGMT and wildtype EGFR. Patient must not have mutated KRAS.

EXAMPLE OUTPUT: tumor('gbm', 2), biomarker('MGMT', 2), biomarker('EGFR', 0), not(biomarker('KRAS', 1))

------------------

The Prolog predicates, when evaluated generate the required underlying query (of course the Prolog is itself a form of query).

Anyway - the upshot was a vast improvement in the accuracy of the generated query (I've yet to see a bad one). Somewhere in its bowels, being told to generate Prolog 'focused' the LLM. Perhaps LLMs are happier with declarative languages rather than imperative ones (I know I am :) ).