I like the power of `jq` and the fact that LLMs are proficient at it, but I find it downright impossible to come up with the right `jq` incantations myself. Has anyone here been in a similar situation? Which tool / language did you end up exposing to your users?
(LLMs are already very adept at using `jq`, so I would think it preferable to be able to prompt a system that implements querying inside of source code with "this command uses the same format as `jq`".)
I then switched to JavaScript / TypeScript, which I found much better overall: it's understandable to basically every developer, and LLMs are very good at it. So now in my app I have a button wherever a TypeScript snippet is required that asks the LLM for its implementation, and even "weak" models one-shot it correctly 99% of the time.
It's definitely more difficult to set up, though, as it requires a sandbox where you can run the code without fear. In my app I use QuickJS, which works very well for my use case, but might not be performant enough in other contexts.
it might just be a very limited subset?
Which isn't to say jq is the best or even good, but it's battle-tested, and just about every conceivable query problem has been thrown at it by now.
Things like https://jsonlogic.com/ work better if you wish to expose a REST API with a defined query schema or something like that, instead of accepting a query `string`. This seems better in that you have both a string format and a concrete JSON format, plus APIs to convert between them.
Also, if you are building a filter interface, having a structured representation helps:
https://react-querybuilder.js.org/demo?outputMode=export&exp...
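To make the "structured query instead of a query string" point concrete, here is a minimal sketch of a JsonLogic-style rule being evaluated. This is a toy evaluator written for illustration, not the json-logic-js library: it handles only `var`, `===`, `<`, `>`, `and` and `or`, and `and`/`or` return plain booleans here. The `friends` data and the names `evaluate`, `getPath` and `operators` are made up for the example.

```javascript
// Toy evaluator for a tiny JsonLogic-style subset (assumption: NOT the real library).
// A rule is plain JSON: { operator: [arguments] }; anything else is a literal.
function getPath(data, path) {
  // Resolve a dotted path such as "address.city" against the data object.
  return String(path)
    .split('.')
    .reduce((current, key) => (current == null ? undefined : current[key]), data);
}

const operators = {
  '===': (a, b) => a === b,
  '<': (a, b) => a < b,
  '>': (a, b) => a > b,
};

function evaluate(rule, data) {
  // Literals (strings, numbers, booleans, null, arrays) evaluate to themselves.
  if (rule === null || typeof rule !== 'object' || Array.isArray(rule)) return rule;

  const [op] = Object.keys(rule);
  const rawArgs = rule[op];
  const args = Array.isArray(rawArgs) ? rawArgs : [rawArgs];

  if (op === 'var') return getPath(data, args[0]);
  if (op === 'and') return args.every((arg) => Boolean(evaluate(arg, data)));
  if (op === 'or') return args.some((arg) => Boolean(evaluate(arg, data)));

  const fn = operators[op];
  if (!fn) throw new Error(`Unsupported operator: ${op}`);
  return fn(...args.map((arg) => evaluate(arg, data)));
}

// Hypothetical data and rule: friends in New York younger than 40.
const friends = [
  { name: 'Ann', city: 'New York', age: 31 },
  { name: 'Bob', city: 'Boston', age: 25 },
  { name: 'Cy', city: 'New York', age: 52 },
];
const rule = {
  and: [
    { '===': [{ var: 'city' }, 'New York'] },
    { '<': [{ var: 'age' }, 40] },
  ],
};
const matches = friends.filter((f) => evaluate(rule, f)).map((f) => f.name);
```

Because the rule is ordinary JSON, it can be validated against a schema, stored, or generated by a query-builder UI without parsing a string.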
```
$sum($myArrayExtractor($.context))
```
where `$myArrayExtractor` is your custom code.
---
Re: "how did it go"
We had a situation where we needed to generate EDI from json objects, which routinely required us to make small tweaks to data, combine data, loop over data, etc. JSONata provided a backend framework for data transformations that reduced the scope and complexity of the project drastically.
I think JSONata is an excellent fit for situations where companies need to do data transforms, for example for integrations with 3rd-party sources: all the data is there, it just needs to be mapped. Instead of having potentially buggy code for each integration, you can have a pseudo-declarative JSONata spec that describes the transform for each integration source, and then just keep a single unified "JSONata runner" as the integration handler.
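Roughly, the "one spec per source, one runner" shape could look like the sketch below. Everything here is an assumption made for illustration: the source names, the two specs, and the `runTransform` helper are invented, and the compiler is injected as a parameter so the sketch does not depend on any particular library. With the `jsonata` npm package you would pass its exported function as `compile`.

```javascript
// Hypothetical per-source specs, as they might be loaded from a database column.
// The strings are JSONata expressions; the field names are invented.
const transformSpecs = {
  acme: '{ "orderId": order.number, "total": $sum(order.lines.price) }',
  globex: '{ "orderId": id, "total": amount }',
};

// One generic runner for every integration source.
// `compile` is injected: it takes an expression string and returns an object
// with an evaluate(payload) method (the shape the jsonata package exposes).
function runTransform(source, payload, compile) {
  const spec = transformSpecs[source];
  if (spec === undefined) {
    throw new Error(`No transform configured for source: ${source}`);
  }
  // Assumption: in jsonata 2.x evaluate() returns a Promise, so callers
  // would await the value returned here.
  return compile(spec).evaluate(payload);
}
```

Adding a new integration then means adding one more expression string, not more handler code.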
Helpful when querying JSON API responses that are parsed and persisted for normal, relational uses. Sometimes you want to query data that you weren’t initially parsing or that matches a fix to reprocess.
Admittedly I don't know that much about LLM optimization/configuration, so apologies if I'm asking dumb questions. Isn't needing to copy/paste that prompt in front of your queries a huge drag on net token efficiency? Like, wouldn't you need to do some hundred/thousand query translations just to break even? Maybe I don't understand what you've built.
Cool idea either way!
It's nice because we can just put the JSONata expression into a db field, and so you can have arbitrary data transforms for different customers for different data structures coming or going, and they can be set up just by editing the expression via the site, without having to worry about sandboxing it (other than resource exhaustion for recursive loops). It really sped up the iteration process for configuring transforms.
[1] - https://duckdb.org/docs/stable/data/json/overview [2] - https://www.malloydata.dev/
mapValues(mapKeys(substring(get(), 0, 10)))
This is all too cute. Why not just use JavaScript syntax? You can limit it to the exact amount of functionality you want for whatever reason it is you want to limit it.
I implemented one day of advent of code in jq to learn it: https://github.com/ivanjermakov/adventofcode/blob/master/aoc...
obj.friends.filter(x=>{ return x.city=='New York'})
.sort((a, b) => a.age - b.age)
.map(item => ({ name: item.name, age: item.age }));
does exactly the same without any plugin. Am I missing something?
To your point, abstractions often multiply and then hide the complexity, and create a facade of simplicity.
I've observed that too many users of jq aren't willing to take a few minutes to understand how stream programming works. That investment pays off in spades.
It made my life a lot easier
it's a pipeline operating on a stream of independent json terms. The filter is reapplied to every element from the stream. Streams != lists; the latter are just a data type. `.` always points at the current element of the stream. Functions like `select` operate on separate items of the stream, while `map` operates on individual elements of a list. If you want a `map` over all elements of the stream: that's just what jq is, naturally :)
stream of a single element which is a list:
echo '[1,2,3,4]' | jq .
# [1,2,3,4]
unpack the list into a stream of separate elements:
echo '[1,2,3,4]' | jq '.[]'
# 1
# 2
# 3
# 4
echo '[1,2,3,4]' | jq '.[] | .' # same: piping into `.` is a NOP
only keep elements 2 and 4 from the stream, not from the array (there is no array left after `.[]`):
echo '[1,2,3,4]' | jq '.[] | select(. % 2 == 0)'
# 2
# 4
keep the array:
echo '[1,2,3,4]' | jq 'map(. * 2)'
# [2,4,6,8]
map over individual elements of a stream instead:
echo '[1,2,3,4]' | jq '.[] | . * 2'
# 2
# 4
# 6
# 8
printf '1\n2\n3\n4\n' | jq '. * 2' # same
This is how you can do things like:
printf '{"a":{"b":1}}\n{"a":{"b":2}}\n{"a":{"b":3}}\n' | jq 'select(.a.b % 2 == 0) | .a'
# {"b": 2}
select creates a nested "scope" for the current element in its parens, but restores the outer scope when it exits. Hope this helps someone else!
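For anyone who thinks more easily in JavaScript, the stream model above can be sketched with generators. This is only a mental model, not how jq is implemented: every filter is a function from one input value to a stream of outputs, and the names (`identity`, `iterate`, `select`, `apply`, `map`, `pipe`, `run`) are invented for the sketch. `iterate` only handles arrays here, and `pipe` buffers eagerly for simplicity.

```javascript
// Each "filter" takes ONE input value and yields zero or more outputs.
const identity = function* (value) { yield value; };                 // jq: .
const iterate = function* (list) { yield* list; };                   // jq: .[]  (arrays only)
const select = (pred) => function* (value) {                         // jq: select(f)
  if (pred(value)) yield value;
};
const apply = (fn) => function* (value) { yield fn(value); };        // jq: e.g. . * 2

// jq: map(f) is [.[] | f], so it takes a list and yields ONE list.
const map = (filter) => function* (list) {
  yield list.flatMap((item) => [...filter(item)]);
};

// jq: f | g | ... ; every output of one filter becomes an input of the next.
const pipe = (...filters) => function* (value) {
  let stream = [value];
  for (const filter of filters) {
    stream = stream.flatMap((item) => [...filter(item)]);
  }
  yield* stream;
};

// Run a filter over a stream of independent inputs, as jq does with its input.
const run = (filter, inputs) => inputs.flatMap((input) => [...filter(input)]);

run(identity, [[1, 2, 3, 4]]);                                // [[1,2,3,4]]  one element: a list
run(iterate, [[1, 2, 3, 4]]);                                 // [1,2,3,4]    four elements
run(pipe(iterate, select((x) => x % 2 === 0)), [[1, 2, 3, 4]]); // [2,4]
run(map(apply((x) => x * 2)), [[1, 2, 3, 4]]);                // [[2,4,6,8]]  still one list
run(apply((x) => x * 2), [1, 2, 3, 4]);                       // [2,4,6,8]    stream of numbers
```

The last two calls show the list/stream distinction: `map` keeps a single array as its output, while the bare filter is applied once per stream element.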
Yes there are issues with ideologically motivated moderators, poorly cited articles, etc. But even with its flaws, it's an amazing resource provided to the public for free (as in coffee and maybe as in speech also).