<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: abhgh</title><link>https://news.ycombinator.com/user?id=abhgh</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 18:28:27 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=abhgh" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by abhgh in "Qwen3.5 Fine-Tuning Guide"]]></title><description><![CDATA[
<p>They are great for specialized use-cases where: (a) the problem is not hard enough that you need reasoning, (b) it is not diverse enough that you need a world model, (c) you want cheap inference (and you can make it happen hardware-wise), and (d) you either have enough data or a workflow that accumulates data (with enough data, fine-tuning can sometimes beat a premier model while ensuring low latency - ofc, assuming (a) and (b) apply).<p>I make it sound like a rare perfect storm needs to exist to justify fine-tuning, but these circumstances are not uncommon - to an extent, (a), (c) and (d) were already prerequisites for deploying traditional ML systems.</p>
]]></description><pubDate>Wed, 04 Mar 2026 15:52:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47249298</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=47249298</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47249298</guid></item><item><title><![CDATA[New comment by abhgh in "Ask HN: What are you working on? (February 2026)"]]></title><description><![CDATA[
<p>I notice you mentioned dspy - do you also support prompt optimization?</p>
]]></description><pubDate>Mon, 09 Feb 2026 12:22:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46944469</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=46944469</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46944469</guid></item><item><title><![CDATA[New comment by abhgh in "I miss thinking hard"]]></title><description><![CDATA[
<p>This is an amazing quote - thank you. This is also my argument for why I can't use LLMs for writing (proofreading is OK) - what I write is not produced as a side-effect of thinking through a problem, writing <i>is</i> how I think through a problem.</p>
]]></description><pubDate>Wed, 04 Feb 2026 07:13:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=46882521</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=46882521</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46882521</guid></item><item><title><![CDATA[The Gumbel-Max Trick]]></title><description><![CDATA[
<p>Article URL: <a href="https://blog.quipu-strands.com/gumbel">https://blog.quipu-strands.com/gumbel</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46874077">https://news.ycombinator.com/item?id=46874077</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 03 Feb 2026 17:32:03 +0000</pubDate><link>https://blog.quipu-strands.com/gumbel</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=46874077</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46874077</guid></item><item><title><![CDATA[New comment by abhgh in "Ask HN: When has a "dumb" solution beaten a sophisticated one for you?"]]></title><description><![CDATA[
<p>I once modeled user journeys on a website using fancy ML models that honored sequence information, i.e., the order of page visits, only to be beaten by a bag-of-words decision-tree model (i.e., each page URL becomes a vector dimension, but order is lost), which was supposed to be my <i>baseline</i>.<p>What I had overlooked was that journeys on that particular website were fairly constrained by design, i.e., if you landed on the home page, did a bunch of stuff, and put product X in the cart, there was pretty much one sequence of pages (or in the worst case, a small handful) that you'd traverse for the journey. This meant the bag-of-words (BoW) representation was more or less as expressive as the sequence model; certain pages showing up in the BoW vector corresponded (mostly) to a single sequence. But the DT could learn faster with less data.</p>
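<p>To make the representation concrete, here is a minimal sketch (the page URLs are made up for illustration) of the order-blind encoding a decision tree would consume:</p>

```python
from collections import Counter

def bow_vector(journey, vocab):
    """Encode a journey (an ordered list of page URLs) as a count
    vector over a fixed page vocabulary; visit order is discarded."""
    counts = Counter(journey)
    return [counts[page] for page in vocab]

# Hypothetical page URLs, purely for illustration.
vocab = ["/home", "/search", "/product/X", "/cart", "/checkout"]

# Two journeys visiting the same pages in different orders...
a = bow_vector(["/home", "/search", "/product/X", "/cart"], vocab)
b = bow_vector(["/home", "/product/X", "/search", "/cart"], vocab)

# ...produce the identical vector: the representation is order-blind,
# which costs little when one set of pages implies (mostly) one sequence.
assert a == b == [1, 1, 1, 1, 0]
```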
]]></description><pubDate>Sun, 18 Jan 2026 07:37:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=46665630</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=46665630</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46665630</guid></item><item><title><![CDATA[New comment by abhgh in "Vibe Coding Killed Cursor"]]></title><description><![CDATA[
<p>I use Claude Code within Pycharm and I see the git diff format for changes there.<p>EDIT: It shows the side-by-side view by default, but it is easy to toggle to a unified view. There's probably a way to permanently set this somewhere.</p>
]]></description><pubDate>Fri, 02 Jan 2026 17:12:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=46466942</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=46466942</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46466942</guid></item><item><title><![CDATA[New comment by abhgh in "A linear-time alternative for Dimensionality Reduction and fast visualisation"]]></title><description><![CDATA[
<p>Thank you. Your comment about using LLMs to semantically parse diverse data as a first step makes sense. In fact, come to think of it, in the area of prompt optimization too - such as MIPROv2 [1] - the LLM is used to create initial prompt guesses based on its understanding of the data. And I agree that UMAP still works well out of the box and has done so pretty much since its introduction.<p>[1] Section C.1 in the Appendix here <a href="https://arxiv.org/pdf/2406.11695" rel="nofollow">https://arxiv.org/pdf/2406.11695</a></p>
]]></description><pubDate>Tue, 16 Dec 2025 13:12:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=46288081</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=46288081</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46288081</guid></item><item><title><![CDATA[New comment by abhgh in "A linear-time alternative for Dimensionality Reduction and fast visualisation"]]></title><description><![CDATA[
<p>I was not aware this existed and it looks cool! I am definitely going to set aside some time to explore it further.<p>I have a couple of questions for now:
(1) I am confused by your last sentence. It seems you're saying embeddings are a substitute for clustering. My understanding is that you usually apply a clustering algorithm over the embeddings - good embeddings just ensure that the grouping produced by the clustering algo "makes sense".<p>(2) Have you tried PaCMAP [1]? I found it to produce high-quality results quickly when I tried it. I haven't tried it in a while though - and I vaguely remember that it wouldn't install properly on my machine (a Mac) the last time I reached for it. Their group has some new stuff coming out too (on the linked page).<p>[1] <a href="https://github.com/YingfanWang/PaCMAP" rel="nofollow">https://github.com/YingfanWang/PaCMAP</a></p>
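<p>Re (1), a minimal sketch of what I mean - a clustering algorithm running on top of embeddings - with a toy k-means and made-up 2-D "embeddings" (a real pipeline would use a proper library and much higher-dimensional vectors):</p>

```python
def kmeans(points, k, iters=10):
    """Toy k-means over embedding vectors (tuples of floats)."""
    centroids = points[:k]  # deterministic init, fine for a sketch
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each embedding to its nearest centroid.
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Move each centroid to the mean of its cluster.
        centroids = [tuple(sum(d) / len(cl) for d in zip(*cl)) if cl
                     else centroids[j] for j, cl in enumerate(clusters)]
    return clusters

# Toy 2-D "embeddings": two well-separated blobs. Good embeddings put
# similar items close together; the clustering step on top is what
# actually produces the groups.
pts = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
groups = kmeans(pts, k=2)
assert sorted(len(g) for g in groups) == [3, 3]
```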
]]></description><pubDate>Tue, 16 Dec 2025 10:47:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=46287030</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=46287030</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46287030</guid></item><item><title><![CDATA[New comment by abhgh in "Algorithms for Optimization [pdf]"]]></title><description><![CDATA[
<p>Thanks for the example. Yes, true, this is for expensive functions - to be precise, functions that depend on data that is hard to gather, so you interleave computing the value of the function with strategically gathering just as much data as is needed to compute it. The video on their page [1] is quite illustrative: calculate the shortest path on a graph where the edge weights are expensive to obtain. Note how the edge weights they end up obtaining form a narrow band around the shortest path they find.<p>[1] <a href="https://willieneis.github.io/bax-website/" rel="nofollow">https://willieneis.github.io/bax-website/</a></p>
]]></description><pubDate>Mon, 01 Dec 2025 07:11:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=46104423</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=46104423</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46104423</guid></item><item><title><![CDATA[New comment by abhgh in "Algorithms for Optimization [pdf]"]]></title><description><![CDATA[
<p>Timefold looks very interesting. This might be irrelevant but have you looked at stuff like InfoBax [1]?<p>[1] <a href="https://willieneis.github.io/bax-website/" rel="nofollow">https://willieneis.github.io/bax-website/</a></p>
]]></description><pubDate>Mon, 01 Dec 2025 05:40:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46103884</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=46103884</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46103884</guid></item><item><title><![CDATA[New comment by abhgh in "Terence Tao: At the Erdos problem website, AI assistance now becoming routine"]]></title><description><![CDATA[
<p>You don't - the way I use LLMs for explanations is that I keep going back and forth between the LLM explanation and Google search/Wikipedia. And of course, asking the LLM to cite sources helps.<p>This might sound cumbersome, but without the LLM I wouldn't have (1) known what to search for, in a way (2) that lets me incrementally build a mental model. So it's a net win for me. The only gap I see is coverage/recall: when asked for different techniques to accomplish something, an LLM might miss some techniques - and what is missed depends on the specific LLM. My solution here is asking multiple LLMs and going back to Google search.</p>
]]></description><pubDate>Tue, 25 Nov 2025 17:35:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=46048268</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=46048268</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46048268</guid></item><item><title><![CDATA[New comment by abhgh in "Awk Technical Notes (2023)"]]></title><description><![CDATA[
<p>Love awk. In the early days of my career, I used to write ETL pipelines, and awk helped me condense a lot of stuff into a small number of LOC. I particularly prided myself on writing terse one-liners (some probably undecipherable, ha!), but did occasionally write scripts. Now I mostly reach for Python.</p>
]]></description><pubDate>Fri, 14 Nov 2025 20:45:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=45931970</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=45931970</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45931970</guid></item><item><title><![CDATA[UMAP Projections of Animals to 2D]]></title><description><![CDATA[
<p>Article URL: <a href="https://duhaime.s3.amazonaws.com/apps/umap-zoo/index.html">https://duhaime.s3.amazonaws.com/apps/umap-zoo/index.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45922649">https://news.ycombinator.com/item?id=45922649</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 14 Nov 2025 00:57:33 +0000</pubDate><link>https://duhaime.s3.amazonaws.com/apps/umap-zoo/index.html</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=45922649</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45922649</guid></item><item><title><![CDATA[New comment by abhgh in "Claude Haiku 4.5"]]></title><description><![CDATA[
<p>I'm curious to know if Anthropic mentions anywhere that they use speculative decoding. OpenAI does seem to use it, based on this tweet [1].<p>[1] <a href="https://x.com/stevendcoffey/status/1853582548225683814" rel="nofollow">https://x.com/stevendcoffey/status/1853582548225683814</a></p>
]]></description><pubDate>Thu, 16 Oct 2025 04:20:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=45601429</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=45601429</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45601429</guid></item><item><title><![CDATA[New comment by abhgh in "Let's Take Esoteric Programming Languages Seriously"]]></title><description><![CDATA[
<p>Wouldn't this be an optimization problem - that is to say, something z3 should be able to handle [1], [2]?<p>I was about to suggest probabilistic programming as well, e.g., PyMC [3], but it looks like you want the optimization to occur autonomously after you've specified the problem - which is different from the program drawing insights from organically accumulated data.<p>[1] <a href="https://github.com/Z3Prover/z3?tab=readme-ov-file" rel="nofollow">https://github.com/Z3Prover/z3?tab=readme-ov-file</a><p>[2] <a href="https://microsoft.github.io/z3guide/programming/Z3%20Python%20-%20Readonly/Introduction" rel="nofollow">https://microsoft.github.io/z3guide/programming/Z3%20Python%...</a><p>[3] <a href="https://www.pymc.io/welcome.html" rel="nofollow">https://www.pymc.io/welcome.html</a></p>
]]></description><pubDate>Sun, 12 Oct 2025 08:06:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=45556290</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=45556290</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45556290</guid></item><item><title><![CDATA[New comment by abhgh in "Show HN: Traceroute Visualizer"]]></title><description><![CDATA[
<p>Hadn't seen this before, very nice read, thank you!</p>
]]></description><pubDate>Fri, 03 Oct 2025 20:44:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=45467577</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=45467577</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45467577</guid></item><item><title><![CDATA[New comment by abhgh in "Gaussian Processes for Machine Learning (2006) [pdf]"]]></title><description><![CDATA[
<p>Thank you for your kind comment!</p>
]]></description><pubDate>Thu, 21 Aug 2025 02:01:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=44968357</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=44968357</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44968357</guid></item><item><title><![CDATA[New comment by abhgh in "Gaussian Processes for Machine Learning (2006) [pdf]"]]></title><description><![CDATA[
<p>Aside from secondmind [1] I don't know of any companies (only because I haven't looked)... But if I had to look for places with a strong research culture around GPs, I would find relevant papers on arXiv and Google Scholar, and see if any of them come from industry labs. If I had to take a guess at industries using Bayesian tools at work, maybe advertising and healthcare would be the ones to look at. I would also look out for places that hire econometricians.<p>Also, thank you for the book recommendation!<p>[1] <a href="https://www.secondmind.ai/" rel="nofollow">https://www.secondmind.ai/</a></p>
]]></description><pubDate>Mon, 18 Aug 2025 23:41:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=44946579</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=44946579</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44946579</guid></item><item><title><![CDATA[New comment by abhgh in "Gaussian Processes for Machine Learning (2006) [pdf]"]]></title><description><![CDATA[
<p>This is the definitive reference on the topic! If you want something concise that doesn't ignore the math, I have some notes as well [1].<p>[1] <a href="https://blog.quipu-strands.com/bayesopt_1_key_ideas_GPs#gaussian_processes" rel="nofollow">https://blog.quipu-strands.com/bayesopt_1_key_ideas_GPs#gaus...</a></p>
]]></description><pubDate>Mon, 18 Aug 2025 20:23:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=44944879</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=44944879</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44944879</guid></item><item><title><![CDATA[New comment by abhgh in "Achieving 10,000x training data reduction with high-fidelity labels"]]></title><description><![CDATA[
<p>Active Learning is a very tricky area to get right ... over the years I have had mixed luck with it for text classification, to the point that my colleague and I decided to perform a thorough empirical study [1] that normalized the various experiment settings individual papers had reported. We observed that, post normalization, randomly picking instances to label is better!<p>[1] <a href="https://aclanthology.org/2024.emnlp-main.1240/" rel="nofollow">https://aclanthology.org/2024.emnlp-main.1240/</a></p>
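<p>For readers unfamiliar with the setup: pool-based active learning repeatedly chooses which unlabeled instance to send for labeling. A toy 1-D sketch of the two strategies being compared (uncertainty sampling vs. random picking) - nothing like the paper's actual models, just the shape of the loop:</p>

```python
import random

def fit_threshold(xs, ys):
    """Toy 1-D 'classifier': threshold halfway between class means."""
    m0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    m1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (m0 + m1) / 2

def active_learn(pool, oracle, strategy, rounds=10, seed=0):
    """Pool-based loop: each round, pick one instance, query its label
    from the oracle, refit. Strategy is 'uncertainty' (closest to the
    current decision boundary) or 'random'."""
    rng = random.Random(seed)
    labeled = [0, len(pool) - 1]          # seed set: one point per class
    while True:
        xs = [pool[i] for i in labeled]
        ys = [oracle(pool[i]) for i in labeled]
        thr = fit_threshold(xs, ys)
        if len(labeled) >= 2 + rounds:
            return thr
        rest = [i for i in range(len(pool)) if i not in labeled]
        if strategy == "uncertainty":
            labeled.append(min(rest, key=lambda i: abs(pool[i] - thr)))
        else:
            labeled.append(rng.choice(rest))

pool = [i / 20 for i in range(21)]        # instances in [0, 1]
oracle = lambda x: 0 if x < 0.5 else 1    # true boundary at 0.5
for strategy in ("uncertainty", "random"):
    thr = active_learn(pool, oracle, strategy)
    assert 0.2 < thr < 0.8                # both recover a sane boundary
```

<p>(On this easy toy problem both strategies land near the true boundary; the paper's point is that, once experiment settings are normalized, the random baseline is hard to beat on real text-classification tasks too.)</p>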
]]></description><pubDate>Fri, 08 Aug 2025 06:08:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=44833978</link><dc:creator>abhgh</dc:creator><comments>https://news.ycombinator.com/item?id=44833978</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44833978</guid></item></channel></rss>