E.g. sorting 2^23 random 64-bit integers: qsort: 850ms, custom radix sort: 250ms, ksort.h: 582ms, np.sort: 107ms (including PyArray_SimpleNewFromData and PyArray_Sort). NumPy uses Intel's x86-simd-sort there, I believe.
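For reference, a rough Python-level reproduction of the np.sort measurement (a sketch only; the C-API calls above are from the original benchmark, and timings obviously vary by machine):

    import time
    import numpy as np

    # 2^23 random 64-bit integers, as in the benchmark above
    rng = np.random.default_rng(0)
    a = rng.integers(0, 2**63, size=1 << 23, dtype=np.int64)

    start = time.perf_counter()
    np.sort(a)  # reportedly dispatches to x86-simd-sort on supported CPUs
    print(f"np.sort: {(time.perf_counter() - start) * 1000:.0f} ms")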
E.g. inserting 8M entries into a hash table (random 64-bit keys and values): MSI-style hash table: ~100ns avg insert/lookup, cc_map: ~95ns avg insert/lookup, Python.h: 91ns insert, 60ns lookup
I'm curious if OP's tool might fit in similarly. I've found lmdb to be quite slow even in tmpfs with no sync, etc.
It does look like Python's comprehensions would be a better choice if you're writing them by hand anyway.
https://github.com/bytecodealliance/wasmtime/tree/main/examp...
I am always in favor of declarative approaches where applicable. But whenever they are embedded in this way, you get a static-analysis barrier and a possible mismatch between the imperative and the declarative code: change a return type or a field on the declarative side, and no error comes up in the surrounding code.
A positive example is VerbalExpressions in Java, which only allows expressing valid regular expressions; every invalid regular expression is inexpressible in valid Java code. jOOQ is another example: it makes incorrect (even incorrectly typed) SQL inexpressible in Java.
I know Python is a bit different, as there is no extensive static analysis in the compiler, but we do have a lot of static analysis tools for Python that could be valuable. A statically type-safe query is a wonderful thing for safety and maintainability, and we do have good type checkers for Python.
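To make that concrete, here is a minimal sketch of the valid-by-construction idea in Python. The Column/Condition names are hypothetical, not taken from VerbalExpressions or jOOQ, but a type checker like mypy would flag misuse the same way:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Condition:
        sql: str

        def and_(self, other: "Condition") -> "Condition":
            return Condition(f"({self.sql}) AND ({other.sql})")

    @dataclass(frozen=True)
    class Column:
        name: str

        def lt(self, value: int) -> Condition:
            return Condition(f"{self.name} < {value}")

        def contains(self, value: str) -> Condition:
            return Condition(f"{self.name} LIKE '%{value}%'")

    NAME = Column("name")
    AGE = Column("age")

    # A type checker rejects AGE.lt("20") or NAME.contains(5), and there is
    # no way to construct a syntactically broken condition at all.
    query = NAME.contains("k").and_(AGE.lt(20))
    print(query.sql)  # (name LIKE '%k%') AND (age < 20)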
{'name__contains':"k", "age__lt":20}
Kind of tangential to this package, but I've always loved this filter query syntax. Does it have a name? I first encountered it in Django ORM, and then in DRF, which has them as URL query params. I have recently built a parser for this in JavaScript to use it on the frontend. Does anyone know any JS libraries that make working with this easy? I'm thinking parsing and offering some kind of database-agnostic marshaling API. (If not, I might have to open-source my own code!)
Q(name=contains('k'))
It is not particularly more complex to write, and it is certainly more composable, extensible, and checkable. Alternatively, go full eval and do
Q("'k' in name")
I recently started using this pattern for pytest equality assertions, as pytest helpfully produces a detailed diff on mismatch. It's not perfect, as pytest doesn't always produce a correct diff with this pattern, but it's better than some alternatives.
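A minimal sketch of that pattern, assuming a matcher object whose __eq__ does the loose comparison (my reconstruction, not a pytest API):

    class Contains:
        # Compares equal to any string containing the given substring
        def __init__(self, substring):
            self.substring = substring

        def __eq__(self, other):
            return isinstance(other, str) and self.substring in other

        def __repr__(self):
            return f"Contains({self.substring!r})"

    def test_user():
        user = {"name": "Mike", "age": 19}
        # On failure, pytest prints a structured diff of the two dicts
        assert user == {"name": Contains("k"), "age": 19}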
I have never been a huge fan of Python (Lisp person) but I really appreciate how concise Python can be, and the dynamic nature of Python allows the nice query syntax.
A significant advantage is that you can just pass an inline lambda.
Of course, if you use the same syntax for Python lists of dicts, you don’t need any library at all.
If you can run static analysis on that you can run static analysis on string literals. Much like how C will give you warnings for mismatched printf arguments.
- Peter Norvig
It has been a while since I studied his Lisp code, but I watch for new Python studies he releases.
Here's the usage example from the README:
from leopards import Q
l = [{"name":"John","age":"16"}, {"name":"Mike","age":"19"},{"name":"Sarah","age":"21"}]
filtered= Q(l,{'name__contains':"k", "age__lt":20})
print(list(filtered))
Versus: [x for x in l if ('k' in x['name'] and int(x['age']) < 20)]
Outputs: [{'name': 'Mike', 'age': '19'}]
Also from the README: > Even though age was a str in the dict, as the value in the query dict was an int, Leopards converted the value in the dict automatically to match the query data type. This behaviour can be stopped by passing False to the convert_types parameter.
I don't like this default behavior.
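Based on the convert_types parameter the README mentions, opting out presumably looks something like this (I haven't verified the exact signature):

    from leopards import Q

    l = [{"name": "Mike", "age": "19"}]
    # With conversion disabled, "19" stays a str and the int bound is not coerced
    filtered = Q(l, {"age__lt": 20}, convert_types=False)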
There's a reason the term "stringly-typed" is used as a criticism of a language.
If you're interested in a simple solution to query a list with SQL including vector similarity, check this out: https://gist.github.com/davidmezzetti/f0a0b92f5281924597c9d1...
filtered = Q(l, {"name__contains": "k", "age__lt": 20})
Versus: filtered = [x for x in l if ('k' in x['name'] and int(x['age']) < 20)]
I don't have a strong background in static analysis.
I've seen some stuff based on tree-sitter that seems to be prompting a revival of the idea, but it still has fundamental issues. For example, if I'm embedding SQL in Python:
sql = "SELECT * FROM table "
if arbitrarilyComplicatedCondition:
    sql += "INNER JOIN a AS joined ON table.thing = a.id "
else:
    sql += "INNER JOIN b AS joined ON table.thing = b.id "
sql += "WHERE joined.
and if you imagine trying to write something to autocomplete at the point I leave off, you're fundamentally stuck: you can't know which table to autocomplete with. It doesn't matter what tech you swing at the problem, since analyzing "arbitrarilyComplicatedCondition" is basically Turing-complete (which I will prove by vigorous handwave here, because turning that into a really solid statement would be much larger than this entire post, but it can be done). And it's not just autocomplete; it's any analysis you may want to do on the embedded content. This is a quick, simple example, too; real cases get arbitrarily complicated, quickly. It's one of those things that seems easy when you picture the simple case, then immediately explodes with all the complexity your mind's eye was ignoring once you try to bring it into the real world.
https://arrow.apache.org/overview/
> Apache Arrow solves most discussed problems, such as improving speed, interoperability, and data types, especially for strings. For example, the new string[pyarrow] column type is around 3.5 times more efficient. [...] The significant achievement here is zero-copy data access, mapping complex tables to memory to make accessing one terabyte of data on disk as fast and easy as one megabyte.
https://airbyte.com/blog/pandas-2-0-ecosystem-arrow-polars-d...
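To see the string[pyarrow] dtype from the quote in action (requires pandas 2.x with pyarrow installed; the exact savings vary with the data):

    import pandas as pd

    words = ["John", "Mike", "Sarah"] * 100_000

    object_backed = pd.Series(words)                          # classic object dtype
    arrow_backed = pd.Series(words, dtype="string[pyarrow]")  # Arrow-backed strings

    print(object_backed.memory_usage(deep=True))
    print(arrow_backed.memory_usage(deep=True))  # noticeably smaller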
That said, it's trivial to apply multiple filter lambdas in one pass -- the most natural way is a comprehension.
Still, you might be surprised by how fast filter(cond_1, filter(cond_2, data)) actually is. The OP didn't present that performance comparison, nor do I see any reason given to avoid comprehensions.
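Concretely, the two shapes being compared; filter is lazy, so the chained version also makes a single pass over the data, just with extra per-item call overhead:

    def cond_1(x):
        return "k" in x["name"]

    def cond_2(x):
        return int(x["age"]) < 20

    data = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}]

    one_pass = [x for x in data if cond_1(x) and cond_2(x)]
    chained = list(filter(cond_1, filter(cond_2, data)))

    assert one_pass == chained  # both: [{'name': 'Mike', 'age': '19'}]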
{"name" (includes? "k"), "age" (< 20)}
into {"name" #(includes? % "k"), "age" #(< % 20)}
which is the same as {"name" (fn [name] (includes? name "k")), "age" (fn [age] (< age 20))}
Then have another macro that converts that into the pattern matching code, or maybe there's already something in the standard library. You could serialize the patterns using EDN as a substitute for JSON.
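The same predicate-map idea translates directly to Python without macros; a small sketch (the names are mine, not from any library):

    def matches(pattern, record):
        # pattern maps field names to one-argument predicates
        return all(pred(record[field]) for field, pred in pattern.items())

    pattern = {"name": lambda v: "k" in v, "age": lambda v: int(v) < 20}
    people = [{"name": "John", "age": "16"}, {"name": "Mike", "age": "19"}]
    print([p for p in people if matches(pattern, p)])  # [{'name': 'Mike', 'age': '19'}]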
Fun stuff.
I wrote something [similar][1] in JavaScript. With that, it would be:
const is_k_kid = tisch.compileFunction((etc) => ({
'name': name => name.includes('k'),
'age': age => age < 20,
...etc
}));
const result = input.filter(is_k_kid);
Yes, "...etc" is part of the DSL.