Query Your Python Lists

1. maweki ◴[20 Nov 24 11:11 UTC] No.42192775[source]▶

Embedding functionality into strings prevents any kind of static analysis. It is the same issue as embedding plain SQL, plain regexes, etc.

I am always in favor of declarative approaches where applicable. But whenever they are embedded in this way, you get this static analysis barrier and a possible mismatch between the imperative and declarative code, where you change a return type or field declaratively and it doesn't come up as an error in the surrounding code.

A positive example is VerbalExpressions in Java, which only allows expressing valid regular expressions: every invalid regular expression is inexpressible in valid Java code. jOOQ is another example, which makes incorrect (even incorrectly typed) SQL code inexpressible in Java.

I know Python is a bit different, as there is no extensive static analysis in the compiler, but we do indeed have a lot of static analysis tools for Python that could be valuable. A statically type-safe query is a wonderful thing for safety and maintainability, and we do have good type checkers for Python.
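
As a minimal sketch of what a checkable query could look like (the `Person` class and the `where` helper are made up for illustration and are not part of any library discussed here), the predicate is ordinary Python, so a type checker can see into it:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Person:
    name: str
    age: int


def where(rows: Iterable[T], predicate: Callable[[T], bool]) -> list[T]:
    """Keep the rows for which the predicate returns True."""
    return [row for row in rows if predicate(row)]


people = [Person("mike", 31), Person("anna", 25)]

# The lambda body is real code: a checker such as mypy or pyright would
# typically flag a typo like `p.nmae`, whereas the same typo inside a
# query string would only surface at runtime.
matches = where(people, lambda p: "k" in p.name and p.age > 30)
```

The trade-off is that the query is no longer data that can be stored or sent over the wire, which is part of why string-based queries stay attractive.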

2. gpderetta ◴[20 Nov 24 12:36 UTC] No.42193389[source]▶

>>42192775 (TP) #

If your schema is dynamic, in most languages there isn't much you can do, but at least in Python

   Q(name=contains('k'))

it is not particularly more complex to write and certainly more composable, extensible and checkable.

Alternatively go full eval and do

   Q("'k' in name")
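
A rough sketch of how the first form might be implemented, assuming a hypothetical `Q` class and hypothetical `contains` / `greater_than` helpers (none of these names come from an existing library):

```python
from typing import Any, Callable

FieldCheck = Callable[[Any], bool]


def contains(needle: str) -> FieldCheck:
    """Build a check that passes when `needle` occurs in the field value."""
    return lambda value: needle in value


def greater_than(limit: float) -> FieldCheck:
    """Build a check that passes when the field value exceeds `limit`."""
    return lambda value: value > limit


class Q:
    """Hypothetical query object: one check per field, combinable with `&`."""

    def __init__(self, **field_checks: FieldCheck) -> None:
        self._checks = list(field_checks.items())

    def __and__(self, other: "Q") -> "Q":
        combined = Q()
        combined._checks = self._checks + other._checks
        return combined

    def __call__(self, row: dict) -> bool:
        # A row matches when every named field exists and passes its check.
        return all(field in row and check(row[field]) for field, check in self._checks)


rows = [
    {"name": "mike", "age": 31},
    {"name": "kim", "age": 19},
    {"name": "anna", "age": 40},
]
query = Q(name=contains("k")) & Q(age=greater_than(30))
matching_names = [row["name"] for row in rows if query(row)]
```

Because `contains` is a real function, a misspelled operator is a `NameError` that linters can catch, and new operators are just new functions.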

3. notpushkin ◴[20 Nov 24 14:01 UTC] No.42194000[source]▶

>>42192775 (TP) #

I love how PonyORM does this for SQL: it’s just Python `x for x in ... if ...`.

Of course, if you use the same syntax for Python lists of dicts, you don’t need any library at all.
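
For plain lists of dicts that looks like this (the sample data is made up for illustration):

```python
people = [
    {"name": "mike", "age": 31},
    {"name": "anna", "age": 25},
    {"name": "kim", "age": 19},
]

# The "query" is an ordinary comprehension, so linters and type checkers
# see the whole expression instead of an opaque string.
adults_with_k = [p for p in people if "k" in p["name"] and p["age"] >= 21]
```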

4. eddd-ddde ◴[20 Nov 24 14:19 UTC] No.42194121[source]▶

>>42192775 (TP) #

I disagree. You'll be surprised to hear this, but source code... is just a very big string...

If you can run static analysis on that you can run static analysis on string literals. Much like how C will give you warnings for mismatched printf arguments.

5. maweki ◴[20 Nov 24 14:52 UTC] No.42194391[source]▶

>>42194121 #

You might be surprised to hear that most compilers and static analysis tools in general do not inspect (string and other) literals, while they do indeed inspect all the other parts and structure of the abstract syntax tree.

6. eddd-ddde ◴[20 Nov 24 16:07 UTC] No.42195260{3}[source]▶

>>42194391 #

I know, but that's the point: if you can get a string into an AST, you can just do the same thing with the string literals. It's not magic.
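
A minimal sketch of that idea using the standard `ast` module. It assumes the queries are Python expressions passed as literal strings to a function named `Q`; the function name and the sample source are assumptions for illustration:

```python
import ast


def check_query_literals(source: str, func_name: str = "Q") -> list[tuple[int, str]]:
    """Return (line, message) pairs for string literals that fail to parse."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        is_target_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == func_name
        )
        if not is_target_call:
            continue
        for arg in node.args:
            # Only literal strings can be checked; variables are skipped.
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                try:
                    ast.parse(arg.value, mode="eval")
                except SyntaxError as exc:
                    problems.append((node.lineno, f"{arg.value!r}: {exc.msg}"))
    return problems


example = '''
good = Q("'k' in name")
bad = Q("'k' in in name")
'''
problems = check_query_literals(example)
```

This only covers syntax of literal arguments. Checking that `name` is a real field, or handling strings built at runtime, needs considerably more machinery.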

7. saghm ◴[20 Nov 24 16:15 UTC] No.42195362{4}[source]▶

>>42195260 #

You can't get an arbitrary string into an AST, only ones that can be parsed correctly. Rejecting the invalid strings that wouldn't make sense to do analysis on is pretty much the same thing that the parent comment is saying to do with regexes, SQL, etc., just as part of the existing compilation that's happening via the type system rather than at runtime.

8. skeledrew ◴[20 Nov 24 17:15 UTC] No.42196082{5}[source]▶

>>42195362 #

Everything can be abstracted away using specialized objects, which can allow for better checking. The Python AST itself is just specialized objects, and it can be extended (but of course with much more work, esp in the analysis tools). There's also this very ingenious - IMO - monstrosity: https://pydong.org/posts/PythonsPreprocessor/. Pick your poison.

9. scott_w ◴[20 Nov 24 17:16 UTC] No.42196090{4}[source]▶

>>42195260 #

Not in the standard language functions. If you wanted to achieve this, you would have to write your own parser. That parser is, by definition, not the language parser, which adds a level of difficulty to proving any correctness of your program.

There's a reason the term "stringly-typed" is used as a criticism of a language.

10. WesleyJohnson ◴[20 Nov 24 17:45 UTC] No.42196401[source]▶

>>42192775 (TP) #

Could this be mitigated by using `dict()` instead of the `{}` literal, and then running an analysis to ensure the provided dictionary keys all end with valid operations? E.g., `__contains`, `__lt`, etc.?

I don't have a strong background in static analysis.
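
Something along these lines might work as a starting point. The `VALID_OPS` set is an assumption for illustration, not the library's actual operator list, and the check only sees keyword arguments written literally in a `dict(...)` call:

```python
import ast

# Assumed operator suffixes; the real library's list may differ.
VALID_OPS = {"contains", "lt", "gt", "eq"}


def invalid_lookup_keys(source: str) -> list[str]:
    """Collect dict(...) keyword names that do not end in a known operator."""
    bad = []
    for node in ast.walk(ast.parse(source)):
        is_dict_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "dict"
        )
        if not is_dict_call:
            continue
        for keyword in node.keywords:
            if keyword.arg is None:
                # `**mapping` unpacking cannot be checked statically here.
                continue
            _, separator, op = keyword.arg.rpartition("__")
            if not separator or op not in VALID_OPS:
                bad.append(keyword.arg)
    return bad


sample = 'filters = dict(name__contains="k", age__lt=30, age__lte=40, city="x")'
bad_keys = invalid_lookup_keys(sample)
```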

11. jerf ◴[20 Nov 24 17:56 UTC] No.42196502{4}[source]▶

>>42195260 #

This is one of those ideas that I've seen kicking around for at least a decade now, but manifesting it in real code is easier said than done. And that isn't even the real challenge; the real challenge is keeping it working over time.

I've seen some stuff based on Tree-sitter that seems to be prompting a revival of the idea, but it still has fundamental issues, e.g., if I'm embedding in Python:

    sql = "SELECT * FROM table "
    if arbitrarilyComplicatedCondition:
        sql += "INNER JOIN a AS joined ON table.thing = a.id "
    else:
        sql += "INNER JOIN b AS joined ON table.thing = b.id "
    sql += "WHERE joined.

and if you imagine trying to write something to autocomplete at the point I leave off, you're fundamentally stuck on not knowing which table to autocomplete with. It doesn't matter what tech you swing at the problem, since trying to analyze "arbitrarilyComplicatedCondition" is basically Turing Complete (which I will prove by vigorous handwave here because turning that into a really solid statement would be much larger than this entire post, but, it can be done). And that's just a simple and quick example, it's not just "autocomplete", it's any analysis you may want to do on the embedded content.

This is just a simple example; they get arbitrarily complicated, quickly. This is one of those things that when you think of the simple case it seems so easy but when you try to bring it into the real world it immediately explodes with all the complexity your mind's eye was ignoring.
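
To make that concrete, here is a small sketch of what a purely syntactic tool gets to see. The source string mirrors the snippet above, except that the final `WHERE` clause is completed with an invented condition so that it parses:

```python
import ast

source = '''
sql = "SELECT * FROM table "
if arbitrarily_complicated_condition():
    sql += "INNER JOIN a AS joined ON table.thing = a.id "
else:
    sql += "INNER JOIN b AS joined ON table.thing = b.id "
sql += "WHERE joined.id = 1"  # hypothetical completion, for illustration only
'''

# Collect every string literal in the snippet, in whatever order ast.walk
# happens to visit them.
fragments = [
    node.value
    for node in ast.walk(ast.parse(source))
    if isinstance(node, ast.Constant) and isinstance(node.value, str)
]

# Two different fragments define the `joined` alias; which one applies
# depends on a runtime condition the analyzer cannot evaluate.
join_candidates = [fragment for fragment in fragments if "AS joined" in fragment]
```

The analyzer ends up with four fragments, none of which is a complete statement, and two competing definitions of `joined`. Enumerating both branches works here, but the number of combinations grows quickly with every additional conditional.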