<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: svcrunch</title><link>https://news.ycombinator.com/user?id=svcrunch</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 06 May 2026 16:02:36 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=svcrunch" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by svcrunch in "GLM-5.1: Towards Long-Horizon Tasks"]]></title><description><![CDATA[
<p>The grandparent is definitely wrong on (3). Yes, coding is a killer product, I agree with you.<p>On (2), I agree with you for local models. <i>BUT</i>, there are also the open source Chinese models accessible via OpenRouter. Your argument ("don't hold a candle to SOTA models") does not hold if the comparison is between those.<p>On (1), I agree more with the grandparent than with your assessment. Yes, OpenAI and Anthropic are killing it for now, but the time horizon is very short. I use Codex and Claude daily, but it's also clear to me that open source is catching up quickly, both w.r.t. the models and the agentic harnesses.</p>
]]></description><pubDate>Wed, 08 Apr 2026 03:06:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47684536</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=47684536</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47684536</guid></item><item><title><![CDATA[New comment by svcrunch in "Ask HN: Do AI startups even bother with patents anymore?"]]></title><description><![CDATA[
<p>I generally don't waste time with patents. I think most patents in deep learning can be invalidated by prior art.<p>My current approach to IP is trade secrets. If we publish, we are careful to avoid details that would make the techniques easy to productionize.</p>
]]></description><pubDate>Fri, 06 Mar 2026 06:09:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47271490</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=47271490</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47271490</guid></item><item><title><![CDATA[GPT-5.4 Scores 0.62 F1 on Understanding Handwritten Edits in Dickens]]></title><description><![CDATA[
<p>Article URL: <a href="https://dorrit.pairsys.ai/">https://dorrit.pairsys.ai/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47271374">https://news.ycombinator.com/item?id=47271374</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 06 Mar 2026 05:52:44 +0000</pubDate><link>https://dorrit.pairsys.ai/</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=47271374</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47271374</guid></item><item><title><![CDATA[New comment by svcrunch in "Tuning Semantic Search on JFMM.net – Joint Fleet Maintenance Manual"]]></title><description><![CDATA[
<p>Thanks for your interest. The rerankers are external; GoodMem is a unified API layer that calls out to various providers. There's no model running inside the database or the GoodMem server.<p>We support both commercial APIs and self-hosted options:<p><pre><code>  - Cohere (rerank-english-v3.0, etc.)
  - Voyage AI (rerank-2.5)
  - Jina AI (jina-reranker-v3)
</code></pre>
Self-hosted (no API key needed):<p><pre><code>  - TEI - https://github.com/huggingface/text-embeddings-inference
  - vLLM - https://docs.vllm.ai/en/v0.8.1/serving/openai_compatible_server.html#rerank-api
</code></pre>
You register a reranker once with the CLI:<p><pre><code>  # Cohere
  goodmem reranker create \
    --display-name "Cohere" \
    --provider-type COHERE \
    --endpoint-url "https://api.cohere.com" \
    --model-identifier "rerank-english-v3.0" \
    --cred-api-key "YOUR_API_KEY"

  # Self-hosted TEI (e.g., BAAI/bge-reranker-v2-m3)
  goodmem reranker create \
    --display-name "TEI Local" \
    --provider-type TEI \
    --endpoint-url "http://localhost:8081" \
    --model-identifier "BAAI/bge-reranker-v2-m3"
</code></pre>
Then you can experiment interactively through the TUI.<p><pre><code>  goodmem memory retrieve \
    --space-id <your-space> \
    --post-processor-interactive \
    "your query"
</code></pre>
For your setup, I think TEI is probably the path of least resistance; it has first-class reranker support and runs well on CPU.</p>
]]></description><pubDate>Sun, 01 Feb 2026 22:47:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=46850187</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=46850187</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46850187</guid></item><item><title><![CDATA[New comment by svcrunch in "Tuning Semantic Search on JFMM.net – Joint Fleet Maintenance Manual"]]></title><description><![CDATA[
<p>Hi there, thanks for writing and sharing your experiences. I'm one of the builders of GoodMem (<a href="https://goodmem.ai/" rel="nofollow">https://goodmem.ai/</a>), which is infra to simplify end-to-end RAG/agentic memory systems like the one you built.<p>It's built on Postgres, which I know you said you left behind, but one of the cool features it supports is hybrid search over multiple vector representations of a passage, so you can combine a dense (e.g. Nomic) and a sparse (e.g. SPLADE) search. Reranking is also built in, although it lacks automatic caching (since, in general, the corpus changes over time).<p>It also deploys to fly.io/railway and costs a few bucks a month to run if you're willing to use cloud-hosted embedding models (otherwise, you can run TEI/vLLM on CPU or GPU for the setup you described).<p>I hope it's helpful to someone.</p>
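To make the hybrid-search idea concrete, here is a minimal sketch of one common way to fuse a dense ranking and a sparse ranking: reciprocal rank fusion (RRF). The doc ids are toy data, and GoodMem's actual fusion logic may differ:

```python
def rrf_fuse(rankings, k=60):
    """Merge several ranked lists of doc ids via Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Each list contributes 1/(k + rank + 1) for the docs it returns.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Highest combined score first.
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["d3", "d1", "d7"]    # e.g. results from a dense (Nomic-style) index
sparse_hits = ["d1", "d9", "d3"]   # e.g. results from a sparse (SPLADE-style) index
fused = rrf_fuse([dense_hits, sparse_hits])
```

With the customary k=60, documents that rank well in both lists float to the top, without any need to calibrate the two retrievers' raw scores against each other.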
]]></description><pubDate>Sun, 01 Feb 2026 16:01:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46847063</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=46847063</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46847063</guid></item><item><title><![CDATA[New comment by svcrunch in "Ask HN: Share your AI prompt that stumps every model"]]></title><description><![CDATA[
<p>Here's a problem that no frontier model does well on (F1 < 0.2), but which I think is relatively easy for most humans:<p><a href="https://dorrit.pairsys.ai/" rel="nofollow">https://dorrit.pairsys.ai/</a><p>> This benchmark evaluates the ability of multimodal language models to interpret handwritten editorial corrections in printed text. Using annotated scans from Charles Dickens' "Little Dorrit," we challenge models to accurately capture human editing intentions.</p>
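For readers unfamiliar with the metric: F1 here is the harmonic mean of precision and recall over the edits a model extracts versus the gold edits. A minimal set-based sketch (the edit labels below are toy data, not the benchmark's actual scoring scheme):

```python
def f1_score(predicted, gold):
    """Set-based F1 between predicted and gold items (e.g. extracted edits)."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

# Toy run: the model recovers 1 of 4 gold edits and adds 1 spurious edit,
# so precision = 0.5, recall = 0.25, F1 = 1/3.
score = f1_score({"del:word", "ins:comma"},
                 {"del:word", "ins:period", "swap:ab", "cap:the"})
```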
]]></description><pubDate>Thu, 24 Apr 2025 23:11:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=43788540</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=43788540</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43788540</guid></item><item><title><![CDATA[New comment by svcrunch in "I made starter kit with billing, admin and AI"]]></title><description><![CDATA[
<p>This is really cool.</p>
]]></description><pubDate>Wed, 23 Apr 2025 17:58:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=43774837</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=43774837</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43774837</guid></item><item><title><![CDATA[New comment by svcrunch in "Can GPT-4o Accurately Read Handwritten Proofreading Marks?"]]></title><description><![CDATA[
<p>Various frontier LLMs were evaluated on their ability to interpret handwritten proofreading marks in printed literary text, using a small benchmark based on Charles Dickens's "Little Dorrit". Results are modest at best, and surprisingly variable across repeated runs, even on the same pages, underscoring the challenge of building reliable, structured-document systems with current multimodal LLMs.<p>Curious to hear thoughts from others working on similar problems.</p>
]]></description><pubDate>Thu, 10 Apr 2025 06:05:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=43641116</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=43641116</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43641116</guid></item><item><title><![CDATA[Can GPT-4o Accurately Read Handwritten Proofreading Marks?]]></title><description><![CDATA[
<p>Article URL: <a href="https://dorrit.pairsys.ai/">https://dorrit.pairsys.ai/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43641115">https://news.ycombinator.com/item?id=43641115</a></p>
<p>Points: 1</p>
<p># Comments: 2</p>
]]></description><pubDate>Thu, 10 Apr 2025 06:05:46 +0000</pubDate><link>https://dorrit.pairsys.ai/</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=43641115</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43641115</guid></item><item><title><![CDATA[New comment by svcrunch in ""Attention", "Transformers", in Neural Network "Large Language Models""]]></title><description><![CDATA[
<p>No.<p>But to your point, note that in 2020 neuroscientists introduced the Tolman-Eichenbaum Machine (TEM) [1], a mathematical model of the hippocampus that bears a striking resemblance to the transformer architecture.<p>Artem Kirsanov has a very nice piece on TEM, "Can we Build an Artificial Hippocampus?" [2] The link goes directly to the spot where he makes the connection to transformers, although you should watch the whole video for context.<p>Because I wasn't clear on the chronology, I went back and asked one of the "Attention" authors whether mathematical models of the hippocampus had inspired the paper. His answer was "no". If TEM was developed without prior knowledge of transformers, then it's a very deep result IMHO.<p>[1] <a href="https://www.sciencedirect.com/science/article/pii/S009286742031388X" rel="nofollow noreferrer">https://www.sciencedirect.com/science/article/pii/S009286742...</a><p>[2] <a href="https://www.youtube.com/watch?v=cufOEzoVMVA&t=1254s" rel="nofollow noreferrer">https://www.youtube.com/watch?v=cufOEzoVMVA&t=1254s</a></p>
]]></description><pubDate>Mon, 25 Dec 2023 00:54:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=38758572</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=38758572</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38758572</guid></item><item><title><![CDATA[New comment by svcrunch in ""Attention", "Transformers", in Neural Network "Large Language Models""]]></title><description><![CDATA[
<p>While at Google Research, I worked with two of the authors of the "Attention is All you Need" paper, including the gentleman who chose that title.<p>As others have pointed out, self-attention was already a known concept in the research community. They don't claim to have invented that. Rather, the authors began by looking at how to improve the power of feed-forward neural networks using a combination of techniques, obtained some exciting results, and then, in the course of ablation studies, discovered that attention was really all you needed!<p>The title is a play on the Beatles song, "All You Need Is Love".<p>In terms of expository style, the paper that was most helpful for me was [Formal Algorithms for Transformers](<a href="https://arxiv.org/abs/2207.09238" rel="nofollow noreferrer">https://arxiv.org/abs/2207.09238</a>) by Phuong and Hutter. The paper is written for clarity and with an emphasis on precision, and its motivation section (Section 2) does a great job of explaining deficiencies in the original paper and subsequent ones.</p>
]]></description><pubDate>Mon, 25 Dec 2023 00:26:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=38758413</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=38758413</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38758413</guid></item><item><title><![CDATA[New comment by svcrunch in "What Is Retrieval-Augmented Generation a.k.a. RAG?"]]></title><description><![CDATA[
<p>Take a look at the BEIR benchmark, which has served as one of the main drivers for the development of neural IR systems since its introduction in 2020.<p>BM25 presents a challenging cross-domain baseline, and it wasn't until ~2022 that neural methods overtook it. If memory serves, the sparse neural methods like SPLADE got there first, although recent dense models can also beat it.<p>The caveat is that BEIR is suffering from overfitting at this point.</p>
]]></description><pubDate>Fri, 01 Dec 2023 22:52:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=38493678</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=38493678</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38493678</guid></item><item><title><![CDATA[New comment by svcrunch in "What Is Retrieval-Augmented Generation a.k.a. RAG?"]]></title><description><![CDATA[
<p>> but HNSW is the best 99% of the time for both performance and latency, and is implemented in almost every modern major vector store.<p>In my experience, HNSW indexes are very expensive to build, relative to indexes like IVF. They also have a larger memory footprint. IVF, on the other hand, is pretty trivial to parallelize across multiple machines, and while I'm aware there are techniques for doing that with HNSW, I don't know the details well enough.<p>Also, if you review papers like "SOAR: Improved Quantization for Approximate Nearest Neighbor Search", they hint at some of the throughput barriers faced by graph-based methods like HNSW.</p>
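For anyone who hasn't seen IVF: the index clusters the corpus once up front, and at query time scans only the few clusters nearest the query, which is why it is cheap to build and easy to shard. A toy numpy sketch, illustrative only; real IVF implementations such as FAISS add quantization and careful tuning:

```python
import numpy as np

class TinyIVF:
    """Toy IVF index: a coarse k-means quantizer plus inverted lists."""

    def __init__(self, vectors, n_lists=4, n_iter=10, seed=0):
        rng = np.random.default_rng(seed)
        self.vectors = vectors
        # Seed centroids from random points, then run a few Lloyd steps.
        self.centroids = vectors[rng.choice(len(vectors), n_lists, replace=False)]
        for _ in range(n_iter):
            assign = self._nearest_centroid(vectors)
            for c in range(n_lists):
                members = vectors[assign == c]
                if len(members):
                    self.centroids[c] = members.mean(axis=0)
        # Final assignment of each vector to a list = the inverted lists.
        self.assign = self._nearest_centroid(vectors)

    def _nearest_centroid(self, x):
        dists = np.linalg.norm(x[:, None, :] - self.centroids[None, :, :], axis=-1)
        return dists.argmin(axis=1)

    def search(self, query, k=1, n_probe=2):
        # Probe only the n_probe nearest lists instead of the whole corpus.
        order = np.argsort(np.linalg.norm(self.centroids - query, axis=1))
        cand = np.where(np.isin(self.assign, order[:n_probe]))[0]
        d = np.linalg.norm(self.vectors[cand] - query, axis=1)
        return cand[np.argsort(d)[:k]]
```

Probing every list degenerates to exact brute-force search; shrinking n_probe trades recall for speed, and because each inverted list is independent, the lists shard naturally across machines.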
]]></description><pubDate>Fri, 01 Dec 2023 22:49:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=38493647</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=38493647</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38493647</guid></item><item><title><![CDATA[New comment by svcrunch in "What Is Retrieval-Augmented Generation a.k.a. RAG?"]]></title><description><![CDATA[
<p>I think your comment is accurate, but regarding your last point:<p>"However, fine-tuning on relevant, high quality, knowledge-rich question/answer pairs seems dominant, when such examples are available or can be generated."<p>How does one solve the problem of access-controlled data, if not through RAG? Do you imagine a separate version of the LLM for every user, reflecting their unique permissions on the data?<p>Also, in scenarios where the data is being updated regularly, RAG provides much lower latency to the new information. Deletes also present a challenge for a pure-LLM approach.</p>
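To make the access-control point concrete, the usual RAG approach is to enforce permissions at retrieval time: filter candidates against the user's ACL before anything reaches the LLM. A toy sketch; the data and helper names are illustrative, not any particular product's API:

```python
# Each document carries an ACL; the retriever drops anything the
# requesting user can't read before the LLM ever sees it.
DOCS = [
    {"id": "d1", "text": "Q3 revenue was up 12%.",  "allowed": {"alice", "bob"}},
    {"id": "d2", "text": "Layoffs planned for Q4.", "allowed": {"alice"}},
    {"id": "d3", "text": "Office closed Friday.",   "allowed": {"alice", "bob", "carol"}},
]

def retrieve(query_terms, user):
    """Rank by naive term overlap, but only over docs the user may read."""
    visible = [d for d in DOCS if user in d["allowed"]]
    scored = [(sum(t in d["text"].lower() for t in query_terms), d) for d in visible]
    return [d for s, d in sorted(scored, key=lambda x: -x[0]) if s > 0]

bob_hits = retrieve(["q4", "layoffs"], "bob")      # d2 is invisible to bob
alice_hits = retrieve(["q4", "layoffs"], "alice")  # alice is allowed to see d2
```

The key property is that disallowed text never enters the prompt, so a single shared model serves every user and no per-user fine-tuning is needed.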
]]></description><pubDate>Fri, 01 Dec 2023 22:43:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=38493576</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=38493576</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38493576</guid></item><item><title><![CDATA[New comment by svcrunch in "[dead]"]]></title><description><![CDATA[
<p>While transformer-based AI is very powerful, and its potential uses in the business world nearly limitless, the issue of hallucination is holding back adoption. Here, Simon Hughes of Vectara introduces an open-source model, HEM, that can automatically detect hallucinations with fairly low latency.<p>Disclosure: I am one of the founders of Vectara and I head research there.</p>
]]></description><pubDate>Mon, 06 Nov 2023 17:12:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=38165360</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=38165360</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38165360</guid></item><item><title><![CDATA[New comment by svcrunch in "Show HN: Boomerang, a new embedding model for RAG and semantic search"]]></title><description><![CDATA[
<p>The metrics presented in the blog post are those of our production model. When designing Boomerang, we tried to trade off latency and search relevance in a way that works for most use cases.<p>On the other hand, GTR-XXL is an example of a research model that biases in favor of search relevance, at the expense of latency. It's not really practical to deploy in production environments as a result.</p>
]]></description><pubDate>Tue, 26 Sep 2023 17:09:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=37662462</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=37662462</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37662462</guid></item><item><title><![CDATA[New comment by svcrunch in "Bob and Juliet? Rag vs. Finetuning an LLM"]]></title><description><![CDATA[
<p>I believe that retrieval-augmented generation is the right path to generative AI within organizations, at least for the next few years. Trying to directly fine-tune an LLM on your data also runs into issues with enforcing access permissions.<p>However, instead of simply being a post-processing step at the end of an IR pipeline, LLMs will eventually sandwich the IR system, along the lines of the [Demonstrate, Search, Predict framework](<a href="https://arxiv.org/abs/2212.14024" rel="nofollow noreferrer">https://arxiv.org/abs/2212.14024</a>) by Khattab et al.</p>
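The "sandwich" shape can be sketched as a three-stage pipeline: an LLM pass rewrites the question into search queries, the IR system retrieves, and a second LLM pass predicts the answer from the retrieved context. The functions below are illustrative stand-ins with the LLM calls stubbed out, not the actual DSP API:

```python
def rewrite(question):
    # First LLM pass: turn the question into one or more search queries.
    # (A real system would call a model; here we just normalize the text.)
    return [question.lower().rstrip("?")]

def search(queries, corpus):
    # IR step in the middle: naive term-overlap retrieval over a toy corpus.
    hits = []
    for q in queries:
        terms = q.split()
        for doc in corpus:
            if sum(t in doc.lower() for t in terms) >= 3 and doc not in hits:
                hits.append(doc)
    return hits

def answer(question, passages):
    # Second LLM pass: generate from retrieved context (stubbed here).
    return passages[0] if passages else "I don't know."

corpus = ["The Eiffel Tower is in Paris.", "The Colosseum is in Rome."]
result = answer("Where is the Eiffel Tower?",
                search(rewrite("Where is the Eiffel Tower?"), corpus))
```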
]]></description><pubDate>Fri, 08 Sep 2023 15:45:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=37435147</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=37435147</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37435147</guid></item><item><title><![CDATA[New comment by svcrunch in "USearch: Smaller and faster single-file vector search engine"]]></title><description><![CDATA[
<p>I'm curious, is HNSW the only option? Do you support IVF-style indexes? Also, FAISS is nice because it supports a pluggable storage layer. Is this something that's easily supported in USearch?<p>Great work, and thank you for your contributions.</p>
]]></description><pubDate>Wed, 02 Aug 2023 01:34:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=36965842</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=36965842</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36965842</guid></item><item><title><![CDATA[New comment by svcrunch in "The Charles Dickens Illustrated Gallery"]]></title><description><![CDATA[
<p>Thank you for this!<p>I know that Cruikshank was the original illustrator of many of Dickens's novels, but I prefer the artwork of James Mahoney. As a point of comparison, the same scene by both artists:<p>1. "Oliver Rather Astonishes Noah", <a href="https://imgur.com/a/DWeblXT" rel="nofollow noreferrer">https://imgur.com/a/DWeblXT</a>, as illustrated by James Mahoney.<p>2. "Oliver Plucks up a Spirit", <a href="https://www.charlesdickensillustration.org/oliver-twist?pgid=lbqq0w4s-b318ea3d-f1b2-441f-80bf-4091923374d6" rel="nofollow noreferrer">https://www.charlesdickensillustration.org/oliver-twist?pgid...</a>, as illustrated by George Cruikshank.<p>I scanned and vectorized all the artwork for "Oliver Twist" and typeset it at <a href="http://ahmadsoft.org/downloads/Oliver%20Twist,%20or,%20The%20Parish%20Boy's%20Progress.pdf" rel="nofollow noreferrer">http://ahmadsoft.org/downloads/Oliver%20Twist,%20or,%20The%2...</a> (warning, it's a large PDF due to the high resolution vector imagery). But then I started a company in 2020 and haven't had time to finish "Little Dorrit", which I was able to scan at a much higher resolution and better quality than "Oliver Twist".</p>
]]></description><pubDate>Thu, 22 Jun 2023 06:06:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=36428516</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=36428516</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36428516</guid></item><item><title><![CDATA[New comment by svcrunch in "[dead]"]]></title><description><![CDATA[
<p>This guide reviews 13 prominent instruction-following LLMs under constraints such as commercial versus non-commercial usage and self-hosted versus API access.</p>
]]></description><pubDate>Thu, 06 Apr 2023 21:29:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=35474767</link><dc:creator>svcrunch</dc:creator><comments>https://news.ycombinator.com/item?id=35474767</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35474767</guid></item></channel></rss>