<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jasim</title><link>https://news.ycombinator.com/user?id=jasim</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 17 Apr 2026 05:07:12 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jasim" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by jasim in "Lotus 1-2-3 on the PC with DOS"]]></title><description><![CDATA[
<p>Every time someone mentions Clipper (or dBase or FoxPro, or even FoxBase, but Clipper most of all), I feel a sense of productive nostalgia, and a constructive anger at the state of technology today. xBase was a beautiful thing - I haven't had as much fun building software as I had from the first PLink86 to CA-Clipper 5.3's Blinker and ExoSpace. Even prolific use of Opus 4.6 doesn't bring the sense of quality and satisfaction that those systems produced.<p>I'm building a new database tool for the web, a Frankenstein of Lotus 1-2-3, dBase, MS-Access, and Claude Code. It is where that anger goes these days.</p>
]]></description><pubDate>Tue, 10 Mar 2026 09:30:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47320915</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=47320915</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47320915</guid></item><item><title><![CDATA[A.I. Did What Super-Specialist Doctors Could Not – Claude Opus 4.6]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.youtube.com/watch?v=2im_MbqxYcU">https://www.youtube.com/watch?v=2im_MbqxYcU</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47089879">https://news.ycombinator.com/item?id=47089879</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 20 Feb 2026 16:11:26 +0000</pubDate><link>https://www.youtube.com/watch?v=2im_MbqxYcU</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=47089879</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47089879</guid></item><item><title><![CDATA[Cache-Friendly B+Tree Nodes with Dynamic Fanout]]></title><description><![CDATA[
<p>Article URL: <a href="https://jacobsherin.com/posts/2025-08-18-bplustree-struct-hack/">https://jacobsherin.com/posts/2025-08-18-bplustree-struct-hack/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45505398">https://news.ycombinator.com/item?id=45505398</a></p>
<p>Points: 99</p>
<p># Comments: 30</p>
]]></description><pubDate>Tue, 07 Oct 2025 16:39:13 +0000</pubDate><link>https://jacobsherin.com/posts/2025-08-18-bplustree-struct-hack/</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=45505398</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45505398</guid></item><item><title><![CDATA[A B+ tree node underflows: merge or borrow?]]></title><description><![CDATA[
<p>Article URL: <a href="https://jacobsherin.com/posts/2025-08-16-bplustree-compare-borrow-merge/">https://jacobsherin.com/posts/2025-08-16-bplustree-compare-borrow-merge/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45427774">https://news.ycombinator.com/item?id=45427774</a></p>
<p>Points: 43</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 30 Sep 2025 16:41:25 +0000</pubDate><link>https://jacobsherin.com/posts/2025-08-16-bplustree-compare-borrow-merge/</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=45427774</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45427774</guid></item><item><title><![CDATA[New comment by jasim in "Embedding user-defined indexes in Apache Parquet"]]></title><description><![CDATA[
<p>I'm not sure if this is what you're looking for, but there is a proposal in DataFusion to allow user defined types. <a href="https://github.com/apache/datafusion/issues/12644">https://github.com/apache/datafusion/issues/12644</a></p>
]]></description><pubDate>Tue, 15 Jul 2025 06:07:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=44568295</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=44568295</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44568295</guid></item><item><title><![CDATA[New comment by jasim in "Embedding user-defined indexes in Apache Parquet"]]></title><description><![CDATA[
<p>Parquet files include a field called key_value_metadata in the FileMetadata structure; it sits in the footer of the file. See: <a href="https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L1267">https://github.com/apache/parquet-format/blob/master/src/mai...</a><p>The technique described in the article seems to use this key-value pair to store pointers to the additional metadata (in this case a distinct-value index) embedded in the file. Note that we can embed arbitrary binary data in the Parquet file between the data pages. This is perfectly valid since all Parquet readers rely on the exact offsets to the data pages specified in the footer.<p>This means that DataFusion does not need to specify how the metadata is interpreted. It is already well specified as part of the Parquet file format itself. DataFusion is an independent project -- it is a query execution engine for OLAP / columnar data, which can take in SQL statements, build query plans, optimize them, and execute them. It is an embeddable runtime with numerous ways for the host program to extend it. Parquet is a file format supported by DataFusion because it is one of the most popular ways of storing columnar data in object stores like S3.<p>Note that readers of Parquet need to be aware of any metadata in order to exploit it. But if not, nothing changes - as long as we're embedding only supplementary information like indices or bloom filters, a reader can continue working with the columnar data in Parquet as it always has; it just won't be able to take advantage of the additional metadata.</p>
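The offset trick is easy to demonstrate outside Parquet. Here is a toy, stdlib-only sketch - an invented file layout, not Parquet's actual one - where a footer records page offsets, so any bytes smuggled in between the pages are invisible to a reader that seeks by those offsets:

```python
import io
import json
import struct

def write_file(pages, extra_blob):
    # Layout: [pages...][arbitrary blob][JSON footer][footer length, 4 bytes].
    buf = io.BytesIO()
    offsets = []
    for page in pages:
        offsets.append((buf.tell(), len(page)))
        buf.write(page)
    buf.write(extra_blob)  # arbitrary bytes; readers never land here
    footer = json.dumps({"pages": offsets}).encode()
    buf.write(footer)
    buf.write(struct.pack("<I", len(footer)))
    return buf.getvalue()

def read_pages(data):
    (footer_len,) = struct.unpack("<I", data[-4:])
    footer = json.loads(data[-4 - footer_len:-4])
    # Seek to each recorded offset; the embedded blob is simply skipped.
    return [data[off:off + n] for off, n in footer["pages"]]

data = write_file([b"page-1", b"page-2"], extra_blob=b"secret-index-bytes")
assert read_pages(data) == [b"page-1", b"page-2"]
```

A reader that only follows the footer's offsets round-trips the pages unchanged, no matter what was embedded between them - which is exactly why old Parquet readers stay compatible.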
]]></description><pubDate>Mon, 14 Jul 2025 19:42:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=44564439</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=44564439</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44564439</guid></item><item><title><![CDATA[New comment by jasim in "Embedding user-defined indexes in Apache Parquet"]]></title><description><![CDATA[
<p>I think this post is a response to some new file-format initiatives, based on the criticism that the Parquet file format is showing its age.<p>One of the arguments is that there is no standardized way to extend Parquet with new kinds of metadata (statistical summaries, HyperLogLog sketches, etc.)<p>This post was written by the DataFusion folks, who have shown a clever way to do this without breaking backward compatibility with existing readers.<p>They have inserted arbitrary data between the data pages and the footer, which other readers will ignore. But query engines like DataFusion can exploit it. They embed a new index into the .parquet file, and use that to improve query performance.<p>In this specific instance, they add an index with all the distinct values of a column. Then they extend the DataFusion query engine to exploit it, so that queries like `WHERE nation = 'Singapore'` can use the index to figure out whether the value exists in that .parquet file without having to scan the data pages (which is already optimized, because there is a min-max filter to avoid scanning the entire dataset).<p>Also, in general this is a really good deep dive into columnar data storage.</p>
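The pruning idea itself fits in a few lines. A sketch - the file names and the in-memory index structure here are made up for illustration; DataFusion's real index lives inside the .parquet file itself:

```python
# Hypothetical distinct-value "index": per file, the set of values
# that actually occur in the column of interest.
distinct_index = {
    "sales_2024.parquet": {"India", "Singapore", "Brazil"},
    "sales_2025.parquet": {"India", "Germany"},
}

def files_to_scan(column_value):
    # Prune every file whose distinct set proves the value is absent,
    # without touching its data pages at all.
    return [f for f, values in distinct_index.items() if column_value in values]

assert files_to_scan("Singapore") == ["sales_2024.parquet"]
assert files_to_scan("Atlantis") == []
```

For `WHERE nation = 'Singapore'`, only one file survives pruning; min-max filters do the same kind of skipping but with ranges instead of exact membership.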
]]></description><pubDate>Mon, 14 Jul 2025 17:33:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=44562911</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=44562911</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44562911</guid></item><item><title><![CDATA[New comment by jasim in "The upcoming GPT-3 moment for RL"]]></title><description><![CDATA[
<p>Well put. It should always be "Created by <person>" rather than "Processed by LLM". We can already see it with Claude Code - its commit messages contain a "Generated by Claude Code" line, and it guarantees a pandemic of diffused responsibility in software engineering. But I think there is no point in railing against it - market forces, corporate incentives, and tragedy of the commons all together make it an inevitability.</p>
]]></description><pubDate>Mon, 14 Jul 2025 16:37:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=44562143</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=44562143</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44562143</guid></item><item><title><![CDATA[Embedding user-defined indexes in Apache Parquet]]></title><description><![CDATA[
<p>Article URL: <a href="https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/">https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44562036">https://news.ycombinator.com/item?id=44562036</a></p>
<p>Points: 144</p>
<p># Comments: 24</p>
]]></description><pubDate>Mon, 14 Jul 2025 16:29:02 +0000</pubDate><link>https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=44562036</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44562036</guid></item><item><title><![CDATA[New comment by jasim in "The upcoming GPT-3 moment for RL"]]></title><description><![CDATA[
<p>I'm sorry I don't follow. The fact that you use an LLM to classify a transaction does not mean there is no audit trail for the fact. There should also be a manual verifier who's ultimately responsible for the entries, so that we do not abdicate responsibility to black boxes.</p>
]]></description><pubDate>Mon, 14 Jul 2025 11:35:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=44558923</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=44558923</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44558923</guid></item><item><title><![CDATA[New comment by jasim in "The upcoming GPT-3 moment for RL"]]></title><description><![CDATA[
<p>Accounting, specifically book-keeping, really plays to the strengths of LLMs - pattern matching within a bounded context.<p>The primary task in book-keeping is to classify transactions (from expense vouchers, bank transactions, sales and purchase invoices, and so on) and slot them into the Chart of Accounts of the business.<p>LLMs can already do this well without any domain- or business-specific context. For example, a fuel entry is so obvious that they can match it to a similar-sounding account in the CoA.<p>And for the others, where human discretion is required, we can add a line of instruction to the prompt, and that classification is permanently encoded. A large chunk of these kinds of entries are repetitive in nature, and so each such custom instruction is a long-term automation.<p>You might not have been speaking about simple book-keeping. If so, I'm curious to learn.</p>
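To make the shape of the task concrete - no LLM here, just a sketch where each hard-won human decision becomes a lookup rule (the patterns and account names are hypothetical):

```python
# Hypothetical rules: description pattern -> account in the Chart of Accounts.
# Each rule plays the role of one "line of instruction in the prompt".
RULES = {
    "fuel": "Vehicle Running Expenses",
    "aws": "Cloud Hosting",
    "salary": "Salaries & Wages",
}

def classify(description):
    text = description.lower()
    for pattern, account in RULES.items():
        if pattern in text:
            return account
    # Needs human discretion; once decided, it becomes a new rule above.
    return None

assert classify("HP Fuel Station 1123") == "Vehicle Running Expenses"
assert classify("Unknown vendor 998") is None
```

The LLM's advantage over this literal lookup is fuzziness - it matches "Shell petrol pump" to the fuel account without an explicit rule - but the accumulation of discretionary decisions into permanent instructions works the same way.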
]]></description><pubDate>Mon, 14 Jul 2025 08:45:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=44557768</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=44557768</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44557768</guid></item><item><title><![CDATA[New comment by jasim in "Ask HN: Are we bike-shedding with prompt engineering?"]]></title><description><![CDATA[
<p>I can think of two instances where the LLM embracing best practices for human thought leads to better results.<p>Claude Code breaks down large implementations into simpler TODOs, and produces far better code than single-shot prompts. There is something about problem decomposition that works well no matter whether it is applied to mathematics, LLMs, or software engineers.<p>The decomposition also shows a split between planning and execution. Doing them separately somehow gives the LLM more cognitive space to think.<p>Another example is CHASE-SQL, one of the top approaches on the BIRD Text-to-SQL benchmark. They take a human textual data requirement, and instead of directly asking the LLM to generate a SQL query, they run it through multiple passes: generating portions of the requirement as pseudo-SQL fragments using independent LLM calls, combining them, then using a separate ranking agent to find the best one. Additional agents, like a fixer for invalid SQL, are also used.<p>What could have been done with a single direct LLM query is instead broken down into multiple stages. What was implicit (find the best query) is made explicit. And from how well it performs, it is clear that articulating fuzzy thoughts and requirements into smaller, clearer, explicit steps works as well for LLMs as it does for humans.</p>
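One of those stages - validate candidates and keep the best surviving one - can be sketched with the stdlib alone. This is not CHASE-SQL's actual code; the candidate strings below stand in for independent LLM generations, and sqlite3 plays the role of the validity check:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, nation TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'Singapore'), (2, 'Brazil')")

# Stand-ins for independently generated candidates (the first is invalid).
candidates = [
    "SELECT count(*) FROM users WHERE country = 'Singapore'",  # wrong column
    "SELECT count(*) FROM users WHERE nation = 'Singapore'",
]

def rank_and_pick(candidates):
    # A minimal "fixer/ranker" stage: discard candidates that fail to
    # execute, keep the first one that runs and its result.
    for sql in candidates:
        try:
            return sql, conn.execute(sql).fetchone()[0]
        except sqlite3.OperationalError:
            continue
    return None, None

sql, result = rank_and_pick(candidates)
assert result == 1
```

The real system ranks semantically (which valid query best matches the requirement), but the structural point is the same: generation, validation, and selection as explicit separate passes.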
]]></description><pubDate>Wed, 11 Jun 2025 08:20:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=44245357</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=44245357</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44245357</guid></item><item><title><![CDATA[Flat, Stale, and Profitable: Thoughts on Light Roasted Coffee]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.home-barista.com/coffees/flat-stale-and-very-profitable-thoughts-on-light-roasted-coffee-t89299.html">https://www.home-barista.com/coffees/flat-stale-and-very-profitable-thoughts-on-light-roasted-coffee-t89299.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44070739">https://news.ycombinator.com/item?id=44070739</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 23 May 2025 07:35:06 +0000</pubDate><link>https://www.home-barista.com/coffees/flat-stale-and-very-profitable-thoughts-on-light-roasted-coffee-t89299.html</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=44070739</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44070739</guid></item><item><title><![CDATA[Teaching Methods: Chaining, Shaping, Chunking [pdf]]]></title><description><![CDATA[
<p>Article URL: <a href="https://softball.ca/_uploads/5c379f7b57334.pdf">https://softball.ca/_uploads/5c379f7b57334.pdf</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43810150">https://news.ycombinator.com/item?id=43810150</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Sun, 27 Apr 2025 07:35:28 +0000</pubDate><link>https://softball.ca/_uploads/5c379f7b57334.pdf</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=43810150</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43810150</guid></item><item><title><![CDATA[How to Build and Scale Onboarding]]></title><description><![CDATA[
<p>Article URL: <a href="https://review.firstround.com/superhuman-onboarding-playbook/">https://review.firstround.com/superhuman-onboarding-playbook/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43776150">https://news.ycombinator.com/item?id=43776150</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 23 Apr 2025 20:08:51 +0000</pubDate><link>https://review.firstround.com/superhuman-onboarding-playbook/</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=43776150</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43776150</guid></item><item><title><![CDATA[Dear Apple and Google: still no app rollbacks?]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.tramline.app/blog/dear-apple-and-google-still-no-app-rollbacks">https://www.tramline.app/blog/dear-apple-and-google-still-no-app-rollbacks</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43415135">https://news.ycombinator.com/item?id=43415135</a></p>
<p>Points: 7</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 19 Mar 2025 17:46:03 +0000</pubDate><link>https://www.tramline.app/blog/dear-apple-and-google-still-no-app-rollbacks</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=43415135</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43415135</guid></item><item><title><![CDATA[Draftflow: A Collaborative CRDT-Aware Editor AI]]></title><description><![CDATA[
<p>Article URL: <a href="https://vishnugopal.com/2025/02/04/draftflow-a-collaborative-crdt-aware-editor-ai/">https://vishnugopal.com/2025/02/04/draftflow-a-collaborative-crdt-aware-editor-ai/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42936181">https://news.ycombinator.com/item?id=42936181</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 04 Feb 2025 18:03:03 +0000</pubDate><link>https://vishnugopal.com/2025/02/04/draftflow-a-collaborative-crdt-aware-editor-ai/</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=42936181</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42936181</guid></item><item><title><![CDATA[The Rise and Fall of Ashton-Tate (2023)]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.abortretry.fail/p/the-rise-and-fall-of-ashton-tate">https://www.abortretry.fail/p/the-rise-and-fall-of-ashton-tate</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42401896">https://news.ycombinator.com/item?id=42401896</a></p>
<p>Points: 105</p>
<p># Comments: 76</p>
]]></description><pubDate>Thu, 12 Dec 2024 18:32:36 +0000</pubDate><link>https://www.abortretry.fail/p/the-rise-and-fall-of-ashton-tate</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=42401896</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42401896</guid></item><item><title><![CDATA[New comment by jasim in "Engineers do not get to make startup mistakes when they build ledgers"]]></title><description><![CDATA[
<p>With negative/positive, the invariant would be sum(amount) = 0; with your approach, it would be sum(debit - credit) = 0. Both are valid; they are just two ways of expressing the same thing.<p>I think it is useful to think about double-entry book-keeping in two layers. One is the base primitive of the journal - where each transaction has a set of debits and credits to different accounts, which all total to 0.<p>Then above that there is the chart of accounts, and how real-world transactions are modelled. For an engineer, to build the base primitive, we only need a simple schema for accounts and transactions. You can use either a signed amount, or separate debit/credit fields, for each line item.<p>Then, if you're building the application layer which creates entries, like your top-up example, you also need to know how to _structure_ those entries. If you have a transfer between two customer accounts, then you debit the one who's receiving the money (because assets are marked on the debit side) and credit the other (because liabilities are on the credit side). If you receive a payment, then cash is debited (because it is an asset), and the income account is credited (because income balances are on the credit side).<p>However, all of this has nothing to do with how we structure the fundamental primitive of the journalling system. It is just a list of accounts, and then a list of transactions, where each transaction has a set of accounts that get either debited or credited, with the sum of the entire transaction coming to 0. That's it -- that constraint is all there is to double-entry book-keeping from a schema point of view.</p>
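A minimal sketch of that base primitive - hypothetical schema, signed amounts in minor units, SQLite for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE transactions (id INTEGER PRIMARY KEY, memo TEXT);
CREATE TABLE entries (
    txn_id INTEGER REFERENCES transactions(id),
    account_id INTEGER REFERENCES accounts(id),
    amount INTEGER  -- signed: debit positive, credit negative
);
""")

def post(memo, lines):
    # The one double-entry invariant: every transaction sums to zero.
    if sum(amount for _, amount in lines) != 0:
        raise ValueError("unbalanced transaction")
    cur = conn.execute("INSERT INTO transactions (memo) VALUES (?)", (memo,))
    conn.executemany(
        "INSERT INTO entries (txn_id, account_id, amount) VALUES (?, ?, ?)",
        [(cur.lastrowid, acct, amt) for acct, amt in lines],
    )

post("cash sale", [(1, 500), (2, -500)])  # debit Cash, credit Income
total = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM entries").fetchone()[0]
assert total == 0
```

Everything above this - normal balances, which side an account "belongs" on, statement layout - is the application layer; the primitive neither knows nor cares.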
]]></description><pubDate>Fri, 29 Nov 2024 10:48:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=42272914</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=42272914</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42272914</guid></item><item><title><![CDATA[New comment by jasim in "Engineers do not get to make startup mistakes when they build ledgers"]]></title><description><![CDATA[
<p>> Instead of using negative numbers, Accounts have normal balance: normal credit balance literally means that they are normal when its associated entries with type credit have a total amount that outweighs its associated entries with type debit. The reverse is true for normal debit balance.<p>But that is an interpretation made by the viewer. A customer account is typically an asset account, whose balance sits in the debit column. But if we somehow owe them money - say because they paid us an advance - then their balance should be in the credit column. The accounting system need not bother with what the "right" place for each account is.<p>It is quite practical to have only a simple amount column, rather than separate debit/credit columns, in a database of journal entries. As long as we follow a consistent pattern in mapping user input (debit = positive, credit = negative) into the underlying tables, and do the same when rendering accounting statements back out, it remains consistent and correct.</p>
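The rendering half of that mapping is a one-liner (assuming the convention above: debit = positive, credit = negative):

```python
def render(amount):
    # Map a signed stored amount back to the column a statement shows it in.
    return ("debit", amount) if amount >= 0 else ("credit", -amount)

assert render(500) == ("debit", 500)
assert render(-250) == ("credit", 250)
```

The viewer's interpretation (which column a balance lands in) lives entirely in this function; the stored data never has to commit to a side.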
]]></description><pubDate>Fri, 29 Nov 2024 08:24:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=42272053</link><dc:creator>jasim</dc:creator><comments>https://news.ycombinator.com/item?id=42272053</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42272053</guid></item></channel></rss>