Hacker News: jeadie

New comment by jeadie in "Distributed DuckDB Instance"

jeadie — Tue, 14 Apr 2026 10:20:44 +0000

This is exactly what we found. Ingest rates were tough. We partitioned and ran over multiple duckdb instances too (and wrangled the complexity).

We ended up building a SQLite + Vortex file alternative for our use case: https://spice.ai/blog/introducing-spice-cayenne-data-acceler...

New comment by jeadie in "Distributed DuckDB Instance"

jeadie — Tue, 14 Apr 2026 10:14:43 +0000

You might find https://github.com/apache/datafusion and https://github.com/datafusion-contrib/datafusion-federation of interest

New comment by jeadie in "Vector database that can index 1B vectors in 48M"

jeadie — Fri, 12 Sep 2025 23:50:56 +0000

We’re building vector indexes into DataFusion for search (starting with S3 vectors).

Open source at https://github.com/spiceai/spiceai

New comment by jeadie in "Airport for DuckDB"

jeadie — Fri, 23 May 2025 02:15:01 +0000

This is one of the ideas behind using DuckDB in github.com/spiceai/spiceai

New comment by jeadie in "Show HN: TextQuery – Query CSV, JSON, XLSX Files with SQL"

jeadie — Tue, 06 May 2025 01:00:45 +0000

There’s also https://github.com/spiceai/spiceai

New comment by jeadie in "Pinecone integrates AI inferencing with vector database"

jeadie — Wed, 04 Dec 2024 09:56:11 +0000

This is a common feature now. If anything, for being so early to vector databases, Pinecone was rather late to integrating embeddings.

Timescale most recently added it, but yes, a bunch of others have too: Weaviate, Spice AI, Marqo, etc.

New comment by jeadie in "Pg_parquet: An extension to connect Postgres and parquet"

jeadie — Fri, 18 Oct 2024 00:52:27 +0000

Why not just federate Postgres and Parquet files? That way the query planner can push down as much of the query as possible and reduce how much data has to move about.
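A rough sketch of the pushdown idea in plain Python, to show why it cuts data movement. `Source`, `scan`, and the `rows_transferred` counter are hypothetical stand-ins invented for illustration; they are not the API of Postgres, Parquet, or any federation engine.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical stand-in for a remote table (e.g. Postgres or a Parquet file)."""
    name: str
    rows: list
    rows_transferred: int = 0  # counts rows that leave the source

    def scan(self, predicate=None):
        # If a predicate is pushed down, the source filters before sending rows.
        out = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_transferred += len(out)
        return out


def query_without_pushdown(source, predicate):
    # Pull every row across, then filter in the federating engine.
    return [r for r in source.scan() if predicate(r)]


def query_with_pushdown(source, predicate):
    # Hand the filter to the source so only matching rows move.
    return source.scan(predicate)


rows = [{"id": i, "amount": i * 10} for i in range(1000)]


def is_large(row):
    return row["amount"] >= 9900


naive = Source("orders", rows)
pushed = Source("orders", rows)

# Both strategies return the same answer; only the data moved differs.
assert query_without_pushdown(naive, is_large) == query_with_pushdown(pushed, is_large)
print(naive.rows_transferred, pushed.rows_transferred)  # prints: 1000 10
```

A real planner would also push down projections, limits, and sometimes joins, but the accounting is the same: work done at the source is data that never crosses the wire.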

New comment by jeadie in "Pg_lakehouse: Query Any Data Lake from Postgres"

jeadie — Mon, 13 May 2024 21:45:45 +0000

This looks functionally similar to using http://github.com/spiceai/spiceai with a PostgreSQL data accelerator.

New comment by jeadie in "Ask HN: Who is hiring? (April 2024)"

jeadie — Mon, 01 Apr 2024 22:03:27 +0000

Spice AI | Senior Software Engineer | GMT+10 (e.g. Australia) through GMT-7 (e.g. Seattle/SF/LA) | Remote | Full Time

Spice AI provides building blocks for data and AI-driven applications by composing real-time and historical time-series data, high-performance SQL query, machine learning training and inferencing, in a single, interconnected AI backend-as-a-service.

We just launched github.com/spiceai/spiceai, a unified SQL query interface and portable runtime to locally materialize, accelerate, and query data tables sourced from any database, data warehouse, or data lake.

We're hiring experienced software engineers, ideally with Rust and/or Golang production experience. We're focused on large data and distributed systems, so experience in these is important too. More details: https://spice.ai/careers#section-open-positions

New comment by jeadie in "Show HN: Spice.ai – materialize, accelerate, and query SQL data from any source"

jeadie — Thu, 28 Mar 2024 21:11:48 +0000

And yes, Iceberg is very high up on our list

New comment by jeadie in "Show HN: Spice.ai – materialize, accelerate, and query SQL data from any source"

jeadie — Thu, 28 Mar 2024 21:11:28 +0000

Yes! It can connect to FlightSQL compatible servers (see https://docs.spiceai.org/data-connectors/flightsql ) and it's also a FlightSQL compatible server

New comment by jeadie in "Show HN: Yes, another vector embeddings API"

jeadie — Fri, 09 Jun 2023 07:23:01 +0000

Have you seen github.com/marqo-ai/marqo? It does all this wrapping, and you don't even need to pay for OpenAI or Pinecone

New comment by jeadie in "GGML – AI at the Edge"

jeadie — Thu, 08 Jun 2023 04:36:47 +0000

I'm very glad that this has some added funding. I am building a serverless API on the Cloudflare edge network using GGML as the backbone --> tryinfima.com

New comment by jeadie in "Weaviate – Open-Source AI Native Vector Database"

jeadie — Thu, 01 Jun 2023 01:38:55 +0000

"AI Native" catching on

New comment by jeadie in "PrivateGPT"

jeadie — Mon, 22 May 2023 22:06:37 +0000

I've tried both Chroma and Qdrant. I don't think Chroma lacks that much. It's definitely newer, but it is also a great product. I think cloud support is coming in Q3 2023

New comment by jeadie in "Ask HN: Seeking a Vector Database for ClickHouse Users – Suggestions Appreciated"

jeadie — Thu, 20 Apr 2023 09:52:00 +0000

(Not affiliated with hyperDB)

New comment by jeadie in "Ask HN: Seeking a Vector Database for ClickHouse Users – Suggestions Appreciated"

jeadie — Thu, 20 Apr 2023 09:51:44 +0000

I've been using https://github.com/jdagdelen/hyperDB and it's been really easy to use. I think ClickHouse support is on the short-term roadmap.

New comment by jeadie in "After All Is Said and Indexed – Unlocking Information in Recorded Speech"

jeadie — Sat, 15 Apr 2023 23:46:01 +0000

Most people, like me, who end up needing to use vector DBs want to use LLMs on a specific, often private dataset/use case. Typically one starts with something like unstructured JSON data, then needs to pick and manage LLMs to create embeddings, then store these and the original JSON data in a vectorDB. Then the application is some variety of CRUD operations + searching over both the original data and the embeddings.

Chroma, Pinecone, and I guess FAISS/HNSWlib/etc. only handle vector operations. Really what I'd want, which Marqo does, is something that handles everything end to end.
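To make the "end to end" shape concrete, here is a minimal sketch of that workflow in plain Python: take JSON-like documents, embed a text field, keep the original document next to its embedding, and support upsert/delete plus search with a filter on the original fields. Everything here is an assumption made for illustration: `TinyIndex`, `toy_embed`, and the `body`/`year` fields are hypothetical, the "embedding" is just a normalized term-frequency vector standing in for a real model, and none of it is Marqo's (or any other product's) API.

```python
import math
from collections import Counter


def toy_embed(text):
    """Placeholder for a real embedding model: normalized term frequencies."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    # Both vectors are already L2-normalized, so the dot product is the cosine.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


class TinyIndex:
    """Hypothetical in-memory store: original documents plus their embeddings."""

    def __init__(self, text_field):
        self.text_field = text_field
        self.docs = {}
        self.vectors = {}

    def upsert(self, doc_id, doc):
        # Create or update: store the raw document and (re)compute its embedding.
        self.docs[doc_id] = doc
        self.vectors[doc_id] = toy_embed(doc[self.text_field])

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)
        self.vectors.pop(doc_id, None)

    def search(self, query, limit=3, where=None):
        # Rank by similarity, optionally filtering on the original fields.
        q = toy_embed(query)
        scored = []
        for doc_id, vec in self.vectors.items():
            doc = self.docs[doc_id]
            if where is not None and not where(doc):
                continue
            scored.append((cosine(q, vec), doc_id, doc))
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(doc_id, doc) for _, doc_id, doc in scored[:limit]]


index = TinyIndex(text_field="body")
index.upsert("a", {"body": "duckdb ingest rates", "year": 2026})
index.upsert("b", {"body": "vector database embeddings search", "year": 2023})
index.upsert("c", {"body": "postgres parquet federation", "year": 2024})

top = index.search("vector embeddings", limit=1)
recent = index.search("parquet", limit=1, where=lambda d: d["year"] >= 2024)
print(top[0][0], recent[0][0])  # prints: b c
```

The point is less the scoring than the bookkeeping: the embedding step, the storage of originals, and the filtering all live behind one interface, which is the part a vector-only library leaves to the application.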

New comment by jeadie in "After All Is Said and Indexed – Unlocking Information in Recorded Speech"

jeadie — Sat, 15 Apr 2023 23:40:28 +0000

Not a dumb question at all! Essentially what Marqo can do, and what this blog shows, is that there is a lot of logic and work involved in doing what you said (i.e. pass raw data into an LLM, get embeddings, store them in a vector DB, then query both the embeddings and the original data).

New comment by jeadie in "After All Is Said and Indexed – Unlocking Information in Recorded Speech"

jeadie — Sat, 15 Apr 2023 23:32:58 +0000

It's a great tool. Unlike vectorDBs alone, Marqo helps with the full process that a lot of people end up wanting to use vectorDBs for (e.g. have structured data, use LLMs to create embeddings, and perform search/CRUD on embeddings + original data).