<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: michaelmarkell</title><link>https://news.ycombinator.com/user?id=michaelmarkell</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 07 Apr 2026 08:37:27 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=michaelmarkell" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by michaelmarkell in "The wealth of the top 1% reaches a record $52T (2025)"]]></title><description><![CDATA[
<p>You're 100% right, thanks! Updated my comment.</p>
]]></description><pubDate>Fri, 16 Jan 2026 19:42:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46651149</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=46651149</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46651149</guid></item><item><title><![CDATA[New comment by michaelmarkell in "The wealth of the top 1% reaches a record $52T (2025)"]]></title><description><![CDATA[
<p>That's roughly $15.3M per person in the top 1%, assuming 340 million Americans</p>
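<p>A minimal sketch of the arithmetic behind that figure (all inputs are from the comment above):

```typescript
// Back-of-envelope: $52T of wealth split across the top 1% of ~340M Americans.
const totalWealthUSD = 52e12;              // $52 trillion
const usPopulation = 340e6;                // ~340 million people
const topOnePercent = usPopulation * 0.01; // 3.4 million people
const perPersonUSD = totalWealthUSD / topOnePercent;
console.log((perPersonUSD / 1e6).toFixed(1)); // ≈ 15.3 (million USD per person)
```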
]]></description><pubDate>Fri, 16 Jan 2026 19:36:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=46651069</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=46651069</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46651069</guid></item><item><title><![CDATA[New comment by michaelmarkell in "Claude for Excel"]]></title><description><![CDATA[
<p>Not really. Take for example:<p>item, date, price<p>abc, 01/01/2023, $30<p>cde, 02/01/2023, $40<p>... 100k rows ...<p>subtotal.        $1000<p>def, 03/01/2023, $20<p>"Hey Claude, what's the total from this file?"
> grep for headers
> "Ah, I see column 3 is the price value"
> SUM(C2:C) -> $2020
> "Great! I found your total!"<p>If you can find me an example of tech that solves this at scale on large, diverse Excel formats, then I'll concede, but I haven't found anything actually trustworthy for important data sets</p>
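<p>A toy illustration of the failure mode above (the row data is hypothetical; the point is that a naive column sum double-counts the embedded subtotal):

```typescript
// A subtotal row embedded mid-table breaks a naive SUM over the price column.
type Row = { item: string; price: number; isSubtotal?: boolean };

const rows: Row[] = [
  { item: "abc", price: 30 },
  { item: "cde", price: 40 },
  // ... imagine many more line items here ...
  { item: "subtotal", price: 1000, isSubtotal: true }, // subtotal of rows above
  { item: "def", price: 20 },
];

// What SUM(C2:C) effectively does: it includes the subtotal row.
const naiveTotal = rows.reduce((s, r) => s + r.price, 0); // double-counts

// A careful pipeline must detect and exclude subtotal rows first.
const lineItemTotal = rows
  .filter((r) => !r.isSubtotal)
  .reduce((s, r) => s + r.price, 0);
```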
]]></description><pubDate>Mon, 27 Oct 2025 20:33:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=45725932</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=45725932</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45725932</guid></item><item><title><![CDATA[New comment by michaelmarkell in "Claude for Excel"]]></title><description><![CDATA[
<p>IMO, a real solution here has to be hybrid, not pure LLM, because these sheets can be massive and have very complicated structures. You want the LLM to identify and map column headers, while non-LLM tool calls run Excel operations like SUMIFs or VLOOKUPs. One of the most important traits in these systems is consistency under slight variations in file layout, since so much Excel work involves consolidating and reconciling reports produced quarterly or by a variety of sources, each with different reporting structures.<p>Disclosure: My company builds ingestion pipelines for large multi-tab Excel files, PDFs, and CSVs.</p>
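<p>A minimal sketch of that hybrid split (the function names and the header heuristic are illustrative stand-ins, not a real product API):

```typescript
// The LLM's only job: map messy source headers onto a canonical schema.
type HeaderMapping = Record<string, "item" | "date" | "price">;

// Hypothetical stand-in for an LLM call; a toy keyword heuristic here.
function llmMapHeaders(rawHeaders: string[]): HeaderMapping {
  const mapping: HeaderMapping = {};
  for (const h of rawHeaders) {
    const k = h.toLowerCase();
    if (k.includes("price") || k.includes("amount")) mapping[h] = "price";
    else if (k.includes("date")) mapping[h] = "date";
    else mapping[h] = "item";
  }
  return mapping;
}

// Deterministic aggregation over the mapped column — no LLM in this step.
function sumPriceColumn(
  dataRows: Record<string, string>[],
  mapping: HeaderMapping
): number {
  const priceHeader = Object.keys(mapping).find((h) => mapping[h] === "price");
  if (!priceHeader) throw new Error("no price column mapped");
  return dataRows.reduce((s, r) => s + Number(r[priceHeader] ?? 0), 0);
}
```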
]]></description><pubDate>Mon, 27 Oct 2025 18:00:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=45724275</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=45724275</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45724275</guid></item><item><title><![CDATA[New comment by michaelmarkell in "Chess grandmaster Daniel Naroditsky has died"]]></title><description><![CDATA[
<p>Danya was like the Mr. Rogers of chess. He had a way of making you feel accepted into the chess community even if you were a beginner, and was such a clear thinker. I strive to be more like him, and am devastated by this loss.</p>
]]></description><pubDate>Mon, 20 Oct 2025 21:12:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=45649451</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=45649451</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45649451</guid></item><item><title><![CDATA[Fast UPDATEs for the ClickHouse column store]]></title><description><![CDATA[
<p>Article URL: <a href="https://clickhouse.com/blog/updates-in-clickhouse-2-sql-style-updates">https://clickhouse.com/blog/updates-in-clickhouse-2-sql-style-updates</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45018275">https://news.ycombinator.com/item?id=45018275</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 25 Aug 2025 19:58:30 +0000</pubDate><link>https://clickhouse.com/blog/updates-in-clickhouse-2-sql-style-updates</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=45018275</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45018275</guid></item><item><title><![CDATA[New comment by michaelmarkell in "Does OLAP Need an ORM"]]></title><description><![CDATA[
<p>The way my company uses ClickHouse is that we have one giant flat table and have written our own abstraction layer on top of it based around "entities", which are functions of the data in the underlying table, potentially adding in some window functions or joins. Pretty much every query we write against ClickHouse tacks a big "GROUP BY ALL" onto the end, because we are always trying to squash down the number of rows and aggregate as aggressively as possible.<p>I imagine we're not alone in building this type of abstraction layer, and some type-safety would be very welcome there. I tried to build our system on top of Kysely (<a href="https://kysely.dev/" rel="nofollow">https://kysely.dev/</a>), but the ClickHouse extension was not far enough along to make sense for our use-case. As such, we had to build our own parser that compiles down to SQL, but there are many type-error edge cases, especially when we're joining against data from S3 that could be CSV, Parquet, etc.<p>Side note: one of the things I love most about ClickHouse is how easy it is to combine data from multiple sources other than just the source database at query time. I imagine this makes the problem of building an ORM much harder as well, since you could need to build type-checking against SQL queries to external databases, rather than to the source table itself</p>
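<p>For illustration, a tiny sketch of what such an entity layer might compile to (the <code>Entity</code> shape and names are hypothetical, not our actual system):

```typescript
// An "entity" is a function of the flat table: grouped dimensions plus
// aggregate measures, compiled to ClickHouse SQL with a trailing GROUP BY ALL.
type Entity = {
  table: string;
  dimensions: string[];             // columns to group on
  measures: Record<string, string>; // alias -> aggregate expression
};

function compileEntity(e: Entity): string {
  const measureCols = Object.entries(e.measures).map(
    ([alias, expr]) => `${expr} AS ${alias}`
  );
  const cols = [...e.dimensions, ...measureCols].join(", ");
  return `SELECT ${cols} FROM ${e.table} GROUP BY ALL`;
}

const revenueByQuarter: Entity = {
  table: "events",
  dimensions: ["artist_id", "quarter"],
  measures: { revenue: "sum(amount)", streams: "count()" },
};
```

(<code>GROUP BY ALL</code> is real ClickHouse syntax: it groups by every non-aggregated column in the SELECT list.)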
]]></description><pubDate>Sun, 17 Aug 2025 17:35:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=44933318</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=44933318</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44933318</guid></item><item><title><![CDATA[A New Postgres Block Storage Layout for Full Text Search]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.paradedb.com/blog/block_storage_part_one">https://www.paradedb.com/blog/block_storage_part_one</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44484066">https://news.ycombinator.com/item?id=44484066</a></p>
<p>Points: 21</p>
<p># Comments: 2</p>
]]></description><pubDate>Sun, 06 Jul 2025 21:08:54 +0000</pubDate><link>https://www.paradedb.com/blog/block_storage_part_one</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=44484066</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44484066</guid></item><item><title><![CDATA[New comment by michaelmarkell in "Anthropic's Circuit Tracer"]]></title><description><![CDATA[
<p>From the Readme:<p>Given a model with pre-trained transcoders, it finds the circuit / attribution graph; i.e., it computes the direct effect that each non-zero transcoder feature, transcoder error node, and input token has on each other non-zero transcoder feature and output logit.
Given an attribution graph, it visualizes this graph and allows you to annotate these features.
Enables interventions on a model's transcoder features using the insights gained from the attribution graph; i.e. you can set features to arbitrary values, and observe how model output changes.<p>The blog post:
<a href="https://www.anthropic.com/research/open-source-circuit-tracing" rel="nofollow">https://www.anthropic.com/research/open-source-circuit-traci...</a></p>
]]></description><pubDate>Sat, 31 May 2025 15:09:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=44144780</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=44144780</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44144780</guid></item><item><title><![CDATA[Anthropic's Circuit Tracer]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/safety-research/circuit-tracer">https://github.com/safety-research/circuit-tracer</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44144779">https://news.ycombinator.com/item?id=44144779</a></p>
<p>Points: 2</p>
<p># Comments: 1</p>
]]></description><pubDate>Sat, 31 May 2025 15:09:39 +0000</pubDate><link>https://github.com/safety-research/circuit-tracer</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=44144779</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44144779</guid></item><item><title><![CDATA[New comment by michaelmarkell in "Ask HN: Who is hiring? (May 2025)"]]></title><description><![CDATA[
<p>Syncopate | NYC (Hybrid ~3d/week) | Full-time | Senior Full Stack Engineers / Focus on AI + Finance<p>Syncopate builds tools to help automate financial diligence and management of long-tail financial assets.<p>We've found product-market fit with ETL/analysis tools for niche financial data, starting with music rights, and we're looking to build out our capabilities across more Excel- and PDF-based workflows.<p>What we're looking for: a full-stack engineer with experience building data-heavy applications. Experience with analytics databases like ClickHouse and with data pipelining is a plus. Proficiency in TypeScript is required.<p>Big bonus points for:
1) High agency (previously a founder or built side-projects to completion)
2) Some knowledge of finance
3) Skill in Rust<p>You can reach out to me here <a href="https://www.linkedin.com/in/michael-markell-377b4221a/" rel="nofollow">https://www.linkedin.com/in/michael-markell-377b4221a/</a> or via email (michael at syncopate dot ai)<p>More about Syncopate (geared towards our music rights segment): <a href="https://syncopate.notion.site/" rel="nofollow">https://syncopate.notion.site/</a></p>
]]></description><pubDate>Thu, 01 May 2025 23:33:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=43864564</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=43864564</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43864564</guid></item><item><title><![CDATA[New comment by michaelmarkell in "Show HN: Chonky – a neural approach for text semantic chunking"]]></title><description><![CDATA[
<p>In our use-case we have many gigabytes of PDFs that contain some qualitative data but also many pages of inline PDF tables. In an ideal world we’d be “compressing” those embedded tables into some text that says “there’s a table here with these columns; if you want to analyze it you can use this <tool>, but basically the table is talking about X; here are the relevant stats like mean, sum, cardinality.”<p>In the naive chunking approach, we would grab random sections of line items from these tables because they happen to reference text similar to the search query, but there’s no guarantee the data pulled into context is complete.</p>
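<p>A rough sketch of that “compress the table” idea (the placeholder format is invented for illustration):

```typescript
// Replace an embedded table with a short prose placeholder plus summary
// stats, so the chunker never emits raw line items into context.
function summarizeTable(columns: string[], values: number[]): string {
  const sum = values.reduce((s, v) => s + v, 0);
  const mean = sum / values.length;
  return (
    `[TABLE: columns (${columns.join(", ")}); ${values.length} rows; ` +
    `sum=${sum}; mean=${mean.toFixed(2)}. Query the table tool for row-level detail.]`
  );
}
```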
]]></description><pubDate>Sun, 13 Apr 2025 14:43:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=43673160</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=43673160</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43673160</guid></item><item><title><![CDATA[New comment by michaelmarkell in "Show HN: Chonky – a neural approach for text semantic chunking"]]></title><description><![CDATA[
<p>It seems to me like chunking (or some higher order version of it like chunking into knowledge graphs) is the highest leverage thing someone can work on right now if trying to improve intelligence of AI systems like code completion, PDF understanding etc. I’m surprised more people aren’t working on this.</p>
]]></description><pubDate>Sun, 13 Apr 2025 13:37:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=43672718</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=43672718</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43672718</guid></item><item><title><![CDATA[New comment by michaelmarkell in "A Man Out to Prove How Dumb AI Still Is"]]></title><description><![CDATA[
<p>If I were to guess, most (adult) humans could not add two 3-digit numbers together with 100% accuracy. Maybe 99%? Computers can already do 100%, so we should probably be trying to figure out how to use language to extract the numbers and send them off to computers to do the calculations. Especially because in the real world, most numbers that matter involve more than simple 3-digit addition</p>
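<p>That division of labor, as a toy sketch (the extraction regex is illustrative; a real system would use the language model for extraction from messier text):

```typescript
// Let language handle extraction, and let the computer do the arithmetic:
// pull numerics out of free text, then sum them deterministically.
function extractNumbers(text: string): number[] {
  return (text.match(/-?\d+(?:\.\d+)?/g) ?? []).map(Number);
}

function extractAndSum(text: string): number {
  return extractNumbers(text).reduce((s, n) => s + n, 0);
}
```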
]]></description><pubDate>Fri, 04 Apr 2025 22:53:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=43588605</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=43588605</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43588605</guid></item><item><title><![CDATA[New comment by michaelmarkell in "OpenAI says it has evidence DeepSeek used its model to train competitor"]]></title><description><![CDATA[
<p>Can someone with more expertise help me understand what I'm looking at here? <a href="https://crt.sh/?id=10106356492" rel="nofollow">https://crt.sh/?id=10106356492</a><p>It looks like Deepseek had a subdomain called "openai-us1.deepseek.com". What is a legitimate use-case for hosting an openai proxy(?) on your subdomain like this?<p>Not implying anything's off here, but it's interesting to me that this OpenAI entity is one of the few subdomains they have on their site</p>
]]></description><pubDate>Thu, 30 Jan 2025 00:16:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=42873183</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=42873183</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42873183</guid></item><item><title><![CDATA[New comment by michaelmarkell in "Dinner and Deception (2015)"]]></title><description><![CDATA[
<p>Archive link: <a href="https://web.archive.org/web/20240330143422/https://www.nytimes.com/2015/08/23/opinion/sunday/dinner-and-deception.html" rel="nofollow">https://web.archive.org/web/20240330143422/https://www.nytim...</a></p>
]]></description><pubDate>Thu, 12 Dec 2024 12:24:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=42398575</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=42398575</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42398575</guid></item><item><title><![CDATA[Dinner and Deception (2015)]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.nytimes.com/2015/08/23/opinion/sunday/dinner-and-deception.html">https://www.nytimes.com/2015/08/23/opinion/sunday/dinner-and-deception.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42398574">https://news.ycombinator.com/item?id=42398574</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 12 Dec 2024 12:24:53 +0000</pubDate><link>https://www.nytimes.com/2015/08/23/opinion/sunday/dinner-and-deception.html</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=42398574</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42398574</guid></item><item><title><![CDATA[Parsebox | Parser Combinators in the TypeScript Type System]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/sinclairzx81/parsebox">https://github.com/sinclairzx81/parsebox</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42077328">https://news.ycombinator.com/item?id=42077328</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 07 Nov 2024 15:12:57 +0000</pubDate><link>https://github.com/sinclairzx81/parsebox</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=42077328</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42077328</guid></item><item><title><![CDATA[New comment by michaelmarkell in "What happens when you make a move in lichess.org?"]]></title><description><![CDATA[
<p>Timing of moves</p>
]]></description><pubDate>Wed, 23 Oct 2024 20:56:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=41929210</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=41929210</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41929210</guid></item><item><title><![CDATA[New comment by michaelmarkell in "Man accused of using bots and AI to earn streaming revenue"]]></title><description><![CDATA[
<p>Artists do not get paid per stream on Spotify and many other DSPs. The platform sums up all of the ad revenue and divides it pro rata among all of the streamed artists, so the fraudulent streams dilute the pie for legitimate streams.</p>
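<p>The pro-rata model in miniature (the numbers are made up, just to show the dilution effect):

```typescript
// A fixed revenue pool is split by share of total streams, so bot streams
// shrink every legitimate artist's payout.
function proRataPayout(poolUSD: number, artistStreams: number, totalStreams: number): number {
  return poolUSD * (artistStreams / totalStreams);
}

const pool = 1_000_000;
const honest = proRataPayout(pool, 100_000, 1_000_000);  // $100,000
// The same artist after 1M fraudulent streams inflate the denominator:
const diluted = proRataPayout(pool, 100_000, 2_000_000); // $50,000
```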
]]></description><pubDate>Sun, 08 Sep 2024 12:32:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=41480022</link><dc:creator>michaelmarkell</dc:creator><comments>https://news.ycombinator.com/item?id=41480022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41480022</guid></item></channel></rss>