Hacker News: dinp

New comment by dinp in "EsoLang-Bench: Evaluating Genuine Reasoning in LLMs via Esoteric Languages"

dinp — Fri, 20 Mar 2026 05:33:35 +0000

> If this benchmark becomes popular, then presumably to avoid such embarrassments synthetic data is eventually added to training sets to make sure even esolangs are somewhat more in-distro

https://x.com/lossfunk/status/2034637505916792886

"After the paper was finalized, we ran agentic systems that mimic how humans would learn to solve problems in esoteric languages. We supplied our agents with a custom harness + tools on the same benchmark. They absolutely crushed the benchmark. Stay tuned"

A little harness engineering was enough!

New comment by dinp in "An AI Agent Published a Hit Piece on Me – The Operator Came Forward"

dinp — Fri, 20 Feb 2026 03:40:35 +0000

Zooming out a little, all the AI companies invested a lot of resources in safety research and guardrails, but none of that prevented a "straightforward" misalignment. I'm not sure how to reconcile this. Maybe we shouldn't be so confident in our predictions about the future? I see a lot of discourse along these lines:

- have bold, strong beliefs about how ai is going to evolve

- implicitly assume it's practically guaranteed

- discussions start with this baseline now

This applies to slow takeoff, fast takeoff, AGI, job loss, curing cancer... There are a lot of different ways it could go. Maybe it will be as eventful as the online discourse claims, maybe more boring. I don't know, but we shouldn't be so confident in our ability to predict it.

New comment by dinp in "Claude Code is being dumbed down?"

dinp — Thu, 12 Feb 2026 00:24:21 +0000

I thought I was the only person being driven crazy by the new default behavior of not showing the file names! Please don't expect users to understand your product details and config options in such detail. It was working well before, so let it remain. Or at least show some message like "to view file names, do xyz" in the UI for a few days after such a change.

While we're here, another thing that's annoying: the token counter. While Claude is working, it reads some files and makes an edit, and let's say the token counter is at 2k tokens. I accept the edit, and now it starts counting very fast from 0 to 2k and then changes at normal inference speed to 2.1k, 2.3k etc. So I wanted to confirm: is that just a UI decision and not actually using 2k tokens again? If so, it would be nice to have it off and just continue counting from where it left off.

Another thing: is it possible to turn off the words like finagling and similar (I can't remember the spelling of any of them)?

New comment by dinp in "O3 mini vs. Gemini flash 2.0 in chess"

dinp — Mon, 17 Feb 2025 08:06:48 +0000

Source code: https://github.com/don-dp/simulateagents/

Click on 'Play moves' to watch a replay.

I initially planned to run a chess tournament for LLMs, but they are not good: besides obvious mistakes, they output incorrect moves and get stuck in loops by repeating the same moves, and the smaller models frequently fail to output valid JSON. I thought reasoning models like o3 mini might be good, but they are only an incremental improvement in chess.
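To make those failure modes concrete, here is a minimal sketch of the kind of checks a harness could run on each model reply. It is not taken from the linked repository: the `{"move": ...}` reply shape, the function names `parse_move` and `is_looping`, and the loop thresholds are all assumptions for illustration, and the set of legal moves is expected to come from whatever chess library the harness uses.

```python
import json
from collections import Counter
from typing import List, Optional, Set


def parse_move(reply: str, legal_moves: Set[str]) -> Optional[str]:
    """Return the move from a reply like {"move": "e2e4"}, or None if unusable.

    The reply shape is an assumption; legal_moves is supplied by the caller.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None  # the model did not produce valid JSON
    if not isinstance(data, dict):
        return None
    move = data.get("move")
    if isinstance(move, str) and move in legal_moves:
        return move
    return None  # missing field or a move that is not legal here


def is_looping(history: List[str], window: int = 8, max_repeats: int = 3) -> bool:
    """True if any move appears at least max_repeats times in the last `window` moves.

    history is one side's moves in order; the thresholds are arbitrary defaults.
    """
    recent = history[-window:]
    return any(count >= max_repeats for count in Counter(recent).values())
```

A harness could retry the request when `parse_move` returns None and stop the game when `is_looping` returns True.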

Feedback and suggestions for other games to explore welcome.

O3 mini vs. Gemini flash 2.0 in chess

dinp — Mon, 17 Feb 2025 08:06:48 +0000

Article URL: https://simulateagents.com/chess/3/

Comments URL: https://news.ycombinator.com/item?id=43076424

Points: 2

# Comments: 1

New comment by dinp in "Ugandan runner Jacob Kiplimo completes first ever sub-57 minute half marathon"

dinp — Mon, 17 Feb 2025 05:32:27 +0000

The article mentions he is going to run the marathon, and I'm looking forward to what he can do at that distance. I feel it's only a matter of time until someone breaks the 2 hour barrier in an official race. A lot of people thought it would be Kelvin Kiptum, but unfortunately he passed away in an accident.

New comment by dinp in "Apple Resumes Advertising on X"

dinp — Fri, 14 Feb 2025 02:16:59 +0000

I think political news is not encouraged here, with exceptions for when interesting discussions are possible.

Judging by the quality of comments here and in the linked submissions, it's a good thing.

New comment by dinp in "I have made the decision to disband Hindenburg Research"

dinp — Thu, 16 Jan 2025 01:40:11 +0000

The impact this organization had was incredible. I doubt they would have been able to do this work if they were based out of any other country, which makes me wonder how the US legal system, regulators and law enforcement in general are not extremely corrupt. What reasons or incentives make the system work in the US? Of course there are many instances of corruption and injustice, but in comparison to almost any other country, it seems to work surprisingly well.

New comment by dinp in "Open source inference time compute example from HuggingFace"

dinp — Fri, 20 Dec 2024 14:08:18 +0000

Great work! When I use models like o1, they work better than sonnet and 4o for tasks that require some thinking but the output is often very verbose. Is it possible to get the best of both worlds? The thinking takes place resulting in better performance but the output is straightforward to work with like with sonnet and 4o. Did you observe similar behaviour with the 1B and 3B models? How does the model behaviour change when used for normal tasks that don't require thinking?

Also how well do these models work to extract structured output? Eg- perform ocr on some hand written text with math, convert to html and format formulas correctly etc. Single shot prompting doesn't work well with such problems but splitting the steps into consecutive api calls works well.
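As a rough sketch of what "splitting the steps into consecutive API calls" can look like, the snippet below chains one call per step and feeds each output into the next. `call_model` is a hypothetical stand-in for whichever client is being used, and `run_pipeline` and the `STEPS` wording are assumptions, not any provider's API.

```python
from typing import Callable, List

# One instruction per call, following the OCR -> HTML -> formula example above.
STEPS: List[str] = [
    "Transcribe the handwritten text exactly, including the math.",
    "Convert the transcription to HTML.",
    "Rewrite every formula in the HTML as properly formatted math markup.",
]


def run_pipeline(call_model: Callable[[str], str], steps: List[str], source: str) -> str:
    """Run each step as its own model call, passing the previous output along."""
    result = source
    for instruction in steps:
        # call_model is hypothetical: it takes a prompt and returns the reply text.
        result = call_model(f"{instruction}\n\n{result}")
    return result
```

Keeping each step in its own call also makes it easy to inspect or retry one intermediate result without redoing the whole chain.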

New comment by dinp in "Claude is now available in Europe"

dinp — Tue, 14 May 2024 11:01:05 +0000

I don't understand their API not being intended for individual use [0]. Are developers supposed to use this subscription only? The haiku model is pretty good for the price + available large context, and the opus/sonnet models are in the league of gpt-4, so I would have liked to pay for the API. I've moved on to llama 3 70B as a daily driver and it works really well! The only issue is its small context size, and it doesn't work great if you give it a lot of files to work with. Currently I'm forced to break down problems a lot more than I had to with gpt-4 or the claude models.

I know I can access the claude api through 3rd party sites + official partners, but there's no incentive to go through the trouble when llama 3 and gpt-4 apis work great for my use cases.

[0] https://support.anthropic.com/en/articles/8987200-can-i-use-...

New comment by dinp in "Daniel Kahneman has died"

dinp — Wed, 27 Mar 2024 15:50:47 +0000

The idea of system 1 and system 2 had a profound impact on me. While specific conclusions in the book were reported to be based on low quality data, it doesn't take away from the fact that it gave me a new mental lens to look at things and understand people's behaviour.

New comment by dinp in "Why are there suddenly so many car washes?"

dinp — Mon, 18 Mar 2024 03:58:37 +0000

Somehow the idea of perpetually paying property taxes and land value taxes doesn't sound appealing to me, especially since businesses already pay taxes. I don't understand the argument for designing a system to hurt a specific business type such as low value businesses. If there's a loophole such as lack of sales tax for car washes, fix that, but let the playing field remain even. If desirable high value businesses aren't able to compete with car washes, isn't that the market doing its thing? Introducing additional property and land value taxes might discourage low value businesses, but what are the 2nd and 3rd order effects of such a change?

New comment by dinp in "Statement regarding the ongoing Sourcehut outage"

dinp — Fri, 12 Jan 2024 13:04:26 +0000

https://www.cloudflare.com/en-gb/plans/

Looks like level 3 DDoS protection is only available on the enterprise plan; it's not included in the unmetered DDoS protection.

New comment by dinp in "ChatGPT cut off date now April 2023"

dinp — Thu, 26 Oct 2023 06:36:49 +0000

With gpt-4 default, it's January 2022. With gpt-4 + bing it's September 2021. Strange..

New comment by dinp in "ChatGPT cut off date now April 2023"

dinp — Thu, 26 Oct 2023 06:28:27 +0000

Cut off is still January 2022 for me, maybe it's being rolled out. Finally developers' AI generated code won't be stuck using January 2022 versions.

I wonder if they use gpt-4 itself to generate the data to keep it up to date.

New comment by dinp in "Ask HN: Was any Starfighter postmortem ever published?"

dinp — Mon, 23 Oct 2023 15:00:58 +0000

In case the founders read this post: what would you do differently if you could start over and would this idea work today?

Slightly OT: everyone feels hiring is broken, can you list some things that are annoying from the employee and employer perspective? Here are some points:

- the process often stretches out over weeks and often months

- job posts often get 100s of applications, a lot are low effort applications, it just muddies the water for both sides

- ATS systems/job boards are annoying with the need to create an account on many sites, some forms have more than 20 questions, often asking what's already there in the resume.

A question to everyone: What would a good application process look like? For me, it should just solve the above mentioned problems. I send an email with my resume and a few sentences about why I might be a good fit for the role/what interests me about the company. The jobs@.com email address could be linked to some SaaS product which makes it easy for the employer to go through the applications, and further communication about video calls or take home assignments or whatever all happens in this email thread. The employer can set the stage of the application such as 2/5 or whatever, and they can mark it as rejected or accepted after all rounds to trigger automated emails etc. Is there any SaaS like this? (I can build this in a week if it doesn't exist, but I have no clue how to market it/get users. Any pointers in case I or someone else builds this?)

New comment by dinp in "Ask HN: Tell us about your project that's not done yet but you want feedback on"

dinp — Thu, 17 Aug 2023 04:54:52 +0000

Before the function calling update[0] it was possible for the gpt models to use tools using specific prompts but it was unreliable. Have you tried using function calling to let the model build the calls?

[0] https://openai.com/blog/function-calling-and-other-api-updat...

New comment by dinp in "Ask HN: Tell us about your project that's not done yet but you want feedback on"

dinp — Thu, 17 Aug 2023 03:55:57 +0000

https://apiforllm.com/

I've been working on this on and off for a few weeks now. The idea is to let LLMs interact with the outside world using function calling. Eg- I built a simple version of chatgpt web browsing using function calling. There's a django site which is responsible for the auth, managing chats etc and then there's a flask site that's responsible for running functions in a gvisor sandbox. Users will be able to upload their functions as docker images and I'll run them in a sandbox.
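For readers unfamiliar with the pattern, here is a minimal sketch of the dispatch side of function calling. It is not the code behind the site: it assumes the model's reply has already been reduced to a function name plus a JSON string of arguments, and `TOOLS`, `tool`, `add`, and `dispatch` are hypothetical names. It also runs the function in-process, with no sandboxing of the kind described above.

```python
import json
from typing import Any, Callable, Dict

# Registry of functions the model is allowed to call (hypothetical).
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: float, b: float) -> float:
    """Toy example function."""
    return a + b


def dispatch(name: str, arguments_json: str) -> Dict[str, Any]:
    """Look up the requested function, decode its arguments, and call it."""
    fn = TOOLS.get(name)
    if fn is None:
        return {"error": f"unknown function: {name}"}
    try:
        args = json.loads(arguments_json)
        if not isinstance(args, dict):
            return {"error": "arguments must be a JSON object"}
        return {"result": fn(**args)}
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or wrong/missing arguments are reported, not raised.
        return {"error": str(exc)}
```

The returned dictionary would then be sent back to the model as the function's output, so a bad call becomes something the model can see and correct.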

There is a lot of self doubt at this point: who is this for? Will people pay for usage? Should I open source the entire thing or part of it? To add to this: I have no online audience, how do I promote this?

I do have some answers to these questions. Eg- should I open source the flask site and get llama 2 to use it? It provides value to the community + hopefully I can get some attention for the project.

I would appreciate any feedback on this project. Please send me an email if you want some openai credits added to your account.

Edit: use testaccount/demouser to login, I've added some credits for testing.

New comment by dinp in "Ask HN: Help with suspected malware extension with 10M users"

dinp — Sat, 15 Jul 2023 05:36:06 +0000

You can add reviews under the chrome and firefox extensions to warn other users and then report both extensions (assuming you are confident about your findings).

More of a meta comment: this is pretty much why I don't install any extensions in my browser except an ad blocker.

You can use this as an opportunity to teach your friend about security so it doesn't happen again.

New comment by dinp in "Don't Take VC Funding – It Will Destroy Your Company"

dinp — Sun, 09 Jul 2023 15:49:51 +0000

> On the other hand, my first self-funded startup got destroyed by a VC funded venture. They had a worse product but far better marketing and they used every dirty trick in book to tarnish my company’s reputation.

Would you be willing to give a few more details about what happened? I'm not interested in the identities of the companies or people, just interested in a high level overview of what happened. We don't hear these stories often.