Hacker News: data_ders

A coding agent is six functions in a trenchcoat

data_ders — Sat, 20 Jun 2026 01:40:50 +0000

Article URL: https://tidydesign.substack.com/p/a-coding-agent-is-six-functions-in

Comments URL: https://news.ycombinator.com/item?id=48605445

Points: 3

# Comments: 0

New comment by data_ders in "databow: a Rust CLI to query any database with an ADBC driver"

data_ders — Fri, 05 Jun 2026 13:05:35 +0000

I think the advantage is simplicity. Why connect to duckdb first and attach the db when you can query it directly with ADBC, which is guaranteed to be fast?

New comment by data_ders in "databow: a Rust CLI to query any database with an ADBC driver"

data_ders — Fri, 05 Jun 2026 13:03:02 +0000

Yeah for me standardization is the big win. Not just output formatting but also cli commands, plus a guarantee that they’re as fast as possible given that all the connectors use ADBC

Graphene

data_ders — Tue, 12 May 2026 16:47:07 +0000

Article URL: https://graphenedata.com/blog/introducing-graphene/

Comments URL: https://news.ycombinator.com/item?id=48110797

Points: 1

# Comments: 0

New comment by data_ders in "Where the goblins came from"

data_ders — Thu, 30 Apr 2026 11:13:34 +0000

Reminds me of the common observance of “machine elves” when taking DMT

New comment by data_ders in "Show HN: Rocky – Rust SQL engine with branches, replay, column lineage"

data_ders — Wed, 29 Apr 2026 15:41:47 +0000

thanks for the context!

> Auto-generating these dbt models and get the manifest aligned between Dagster code location

I just added you on LinkedIn. if you accept my connection there I can DM you a private preview document that you might find very interesting related to dbt project metadata (that is way less painful than `manifest.json`)

New comment by data_ders in "Show HN: Rocky – Rust SQL engine with branches, replay, column lineage"

data_ders — Wed, 29 Apr 2026 13:31:49 +0000

hiya, anders from dbt here. cool project -- I especially love the branching and budgeting options you've built in. both are things that I'd love for the dbt standard to include one day. was it dbt's lack of those features that inspired you to start this project? It also seems you have an aversion to Jinja, which, believe me, I get!

FYI dbt-fusion [1] is going GA next week (though GA for Databricks will come later). Most of it is source-available and ELv2-licensed, but there's a number of crates that are Apache 2.0, namely: dbt-xdbc, dbt-adapter, dbt-auth, dbt-jinja, dbt-agate. We also have plans to OSS more as time goes on (stay tuned).

I just wanted to call out the OSS crates in case you'd rather focus on "making your beer taste better" than have to re-build foundations. I'd love to hear if any of those crates come in handy for you (even more so if they don't work for you).

Feel free to reach out on LinkedIn or dbt community Slack if you ever want to chat more!

[1]: https://github.com/dbt-labs/dbt-fusion

New comment by data_ders in "ggsql: A Grammar of Graphics for SQL"

data_ders — Mon, 20 Apr 2026 15:21:14 +0000

plus 1 for ADBC!

New comment by data_ders in "ggsql: A Grammar of Graphics for SQL"

data_ders — Mon, 20 Apr 2026 14:21:25 +0000

ok, this is definitely up my alley. color me nerd-sniped and forgive the onslaught of questions.

my questions are less about the syntax, which i'm largely familiar with knowing both SQL and ggplot.

i'm more interested in the backend architecture. Looking at the Cargo.toml [1], I was surprised to not see a visualization dependency like D3 or Vega. Is this intentional?

I'm certainly going to take this for a spin and I think this could be incredible for agentic analytics. I'm mostly curious right now what "deployment" looks like both currently and in a utopian future.

utopia is easier -- what if databases supported it directly?!? but even then I think I'd rather have databases spit out an intermediate representation (IR) that could be handed to a viz engine, similar to how vega works. or perhaps the SQL is the IR?!

another question that arises from the question of composability: how distinct would a ggplot IR be from a metrics layer spec? could i use ggsql to create an IR that I then use R's ggplot to render (or vice versa maybe?)

as for the deployment story today, I'll likely learn most by doing (with agents). My experiment will be to kick off an agent to do something like: extract this dataset to S3 using dlt [2], model it using dbt [3], then use ggsql to visualize.

p.s. @thomasp85, I was a big fan of tidygraph back in the day [4]. love how small our data world is.

[1]: https://github.com/posit-dev/ggsql/blob/main/Cargo.toml

[2]: https://github.com/dlt-hub/dlt

[3]: https://github.com/dbt-labs/dbt-fusion

[4]: https://stackoverflow.com/questions/46466351/how-to-hide-unc...

New comment by data_ders in "Pipelined Relational Query Language, Pronounced "Prequel""

data_ders — Mon, 23 Feb 2026 16:11:59 +0000

right? like it's a graph and a relational model query and a pipeline and a language and an abstract syntax tree and declarative logical plan

New comment by data_ders in "Pipelined Relational Query Language, Pronounced "Prequel""

data_ders — Mon, 23 Feb 2026 16:06:45 +0000

what do you think is the "most bad" thing about SQL?

New comment by data_ders in "Pipelined Relational Query Language, Pronounced "Prequel""

data_ders — Mon, 23 Feb 2026 16:06:07 +0000

TIL about Verse, looks cool. I'll have to check it out.

> SQL is not a pipeline, it is a graph.

Maybe it's both? and maybe there will always be hard-to-express queries in SQL, and that's ok?

the RDBMS's relational model is certainly a graph and joins accordingly introduce complexity.

For me, just as creators of the internet regret that subdomains come before domains, I really wish we could go back in time and have `FROM` be the first clause and not `SELECT`. This is much more intuitive and lends itself to the idea of a pipeline: a table scan (FROM) that is piped to a projection (SELECT).

New comment by data_ders in "Pipelined Relational Query Language, Pronounced "Prequel""

data_ders — Mon, 23 Feb 2026 15:53:54 +0000

I'm as big a SQL stan as the next person and I'm also very skeptical anytime anyone says that SQL needs to be replaced.

At the same time, it's challenging that SQL cannot be iteratively improved and experimented upon.

IMHO, PRQL is a reasonable approach to extending SQL without replacing SQL.

But what I'd love to see is projects like Google's zeta-sql [1] and Substrait [2] get more traction. It would provide a more stable, standardized foundation upon which SQL could be improved, which would make the case for "SQL forever" even more strong.

I've blogged about this before [3].

[1]: https://github.com/google/googlesql

[2]: https://substrait.io/

[3]: https://roundup.getdbt.com/p/problem-exists-between-database...

New comment by data_ders in "Pipelined Relational Query Language, Pronounced "Prequel""

data_ders — Mon, 23 Feb 2026 15:44:19 +0000

I agree that CTEs help solve the problem of being able to read a SQL query from top to bottom, but I wouldn't say they're a panacea!

Personally, it's weird to me that `FROM` (scan) comes after `SELECT` (projection). IMHO the datasource should come first!

CTEs don't solve this problem; they just let you chain multiple SELECTs together.

A real use case is that it would allow intellisense to kick in a lot earlier!

Instead you have to write `SELECT * FROM my_table` and only after can you edit the `*` and get auto-complete suggestions of the columns from `my_table`
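A minimal sketch of why clause order matters for tooling, assuming a hypothetical in-memory catalog (`CATALOG`) and a made-up helper (`suggest_columns`); no real editor or SQL parser is claimed to work this way. With `FROM` first, the table is already known when the projection is being typed, so column suggestions are available immediately. With `SELECT` first, there is nothing to suggest from yet.

```python
import re

# Hypothetical catalog: table name -> column names (illustrative only).
CATALOG = {"my_table": ["id", "name", "created_at"]}


def suggest_columns(partial_query: str) -> list[str]:
    """Return column suggestions for a partially typed query.

    Suggestions are only possible once a known table appears after FROM.
    """
    match = re.search(r"\bFROM\s+(\w+)", partial_query, flags=re.IGNORECASE)
    if match is None:
        return []  # no table known yet, so nothing to suggest
    return CATALOG.get(match.group(1), [])


# SELECT-first: the cursor sits after SELECT, but the table is still unknown.
select_first = suggest_columns("SELECT ")

# FROM-first: the table is already named when the projection is typed.
from_first = suggest_columns("FROM my_table SELECT ")
```

Here `select_first` is empty while `from_first` holds the columns of `my_table`, which is the gap the `SELECT * FROM my_table` workaround papers over.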

New comment by data_ders in "Apache Arrow is 10 years old"

data_ders — Thu, 12 Feb 2026 15:37:50 +0000

if I could tell 2015 me, who had just found the feather library and was using it to power my unhinged topic-modeling-for-PowerPoint-slides work, what feather would become (arrow) and the impact it would have on the data ecosystem, I would have looked at 2026 me like he was a crazy person.

Yet today I feel it was 2016 dataders who is the crazy one lol

New comment by data_ders in "Apache Arrow is 10 years old"

data_ders — Thu, 12 Feb 2026 15:35:29 +0000

yeah not necessarily compute (though it has a kernel)!

it's actually many things: an IPC protocol, a wire protocol, a database connectivity spec, etc.

in reality it's about an in-memory tabular (columnar) representation that enables zero copy operations b/w languages and engines.

and, imho, it all really comes down to standard data types for columns!
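A rough stdlib-only illustration of the zero-copy idea, not Arrow itself: a typed column lives in one contiguous buffer, and a second consumer gets a view over the same memory instead of a copy. The names `column`, `view`, and `tail` are made up for the example; Arrow's actual format layers validity bitmaps, offsets, and a cross-language spec on top of this.

```python
from array import array

# One column of signed 64-bit integers stored contiguously,
# loosely like a single columnar buffer.
column = array("q", [10, 20, 30, 40])

# memoryview exposes the same memory without copying it.
view = memoryview(column)

# A slice of a memoryview is still a view, not a copy.
tail = view[2:]

# Writing through the original buffer is visible through the views,
# which shows that all three names refer to the same underlying memory.
column[3] = 99
shared = tail[1] == 99

# The agreed-upon type code ("q" = signed 64-bit) is what lets both
# sides interpret the raw bytes the same way.
type_code = view.format
```

The last line is the part that matches the point about standard column types: sharing bytes only works when producer and consumer agree on what those bytes mean.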

New comment by data_ders in "Lance table format explained with simple animations"

data_ders — Thu, 12 Feb 2026 12:21:43 +0000

love the animations! I’ve been dreaming of doing the same to get people from csvs to something like Lance but with stops at page files, parquet, and Iceberg along the way

New comment by data_ders in "Show HN: I trained a 9M speech model to fix my Mandarin tones"

data_ders — Sat, 31 Jan 2026 04:12:36 +0000

same! but if you get it inevitably wrong the first time it gives you the pinyin. but i struggled to get it to transcribe the consonants I was making let alone the tones. i'm pretty sure i'm not as bad as that!

New comment by data_ders in "Show HN: ShapedQL – A SQL engine for multi-stage ranking and RAG"

data_ders — Thu, 29 Jan 2026 15:14:35 +0000

I'm a big SQL stan here and I love the concept and if you ever wanna chat about how it might integrate with dbt let me know :)

conceptual questions:

1) why did you pick SQL? to increase the Total Addressable Userbase with the thinking that a SQL API means more people can use it than those who know Python or Typescript?

2) What isn't or will never be supported by this relational model? what are the constraints? Clickhouse comes to mind w/ its intentionally imposed limitations on JOINs

3) databases are historically the stickiest products, but even today SQL dialects are sticky because of how closely tied they are to the query engine. why do you think users will adopt not only a new dialect but a new engine? Especially given that the major DWH vendors have been relentlessly competing to add AI search vector functionality into their products?

4) mindsdb comes to mind as something similar that's been in the market for a while but I don't hear it come up often. what makes you different?

playground feedback:

1) why are there no examples that:

a) use `JOIN` (that `,` is unhinged syntax imho for an implicit join)

b) don't use `*` (it's cool that there's actual numbers!)

2) i kinda get why the search results default to a UI, but as a SQL person I first wanted to know what columns exist. I was happy to see "raw table" was available but it took me a while to find it. might be better to have raw table and UI output visible at the same time with clear instructions on what columns the query requires to populate the UI

New comment by data_ders in "Auto-compact not triggering on Claude.ai despite being marked as fixed"

data_ders — Fri, 23 Jan 2026 20:22:27 +0000

omg are you me? I had this exact same problem last week