Hacker News: happypumpkin

New comment by happypumpkin in "Differential Transformer"

happypumpkin — Tue, 08 Oct 2024 13:54:42 +0000

> it would probably become obsolete soon

Suppose there are many times more posts about something one generation of LLMs can't do (arithmetic, tic-tac-toe, whatever), than posts about how the next generation of models can do that task successfully. I think this is probably the case.

While I doubt it will happen, it would be somewhat funny if training on that text caused a future model to claim it can't do something that it "should" be able to because it internalized that it was an LLM and "LLMs can't do X."

New comment by happypumpkin in "Notes on OpenAI's new o1 chain-of-thought models"

happypumpkin — Fri, 13 Sep 2024 19:11:36 +0000

The new model does play very well but when it draws the board it frequently places the moves in incorrect locations (but seemingly still keeps track of the correct ones). But I can't fault it too much, I don't think what is essentially ASCII art is intended to be a strength of the model.

Edit: Actually, the third game with it led to it making an illegal move and claiming a draw (which would've been inevitable given optimal play for the rest of the game, but there were several valid moves left to make).

New comment by happypumpkin in "The AI Scientist: Towards Automated Open-Ended Scientific Discovery"

happypumpkin — Tue, 13 Aug 2024 17:11:42 +0000

Also a concern about the paper generation process itself:

> In a similar vein to idea generation, The AI Scientist is allowed 20 rounds to poll the Semantic Scholar API looking for the most relevant sources to compare and contrast the near-completed paper against for the related work section. This process also allows The AI Scientist to select any papers it would like to discuss and additionally fill in any citations that are missing from other sections of the paper.

So... they don't look for related work until the paper is "near-completed." Seems a bit backwards to me.

New comment by happypumpkin in "The AI Scientist: Towards Automated Open-Ended Scientific Discovery"

happypumpkin — Tue, 13 Aug 2024 17:05:54 +0000

Potential concerns with their self-eval:

They evaluate their automated reviewer by comparing against human evaluations on human-written research papers, and then seem to extrapolate that their automated reviewer would align with human reviewers on AI-written research papers. It seems like there are a few major pitfalls with this.

First, if their systems aren't multimodal, and their figures are lower-quality than human-created figures (which they explicitly list as a limitation), the automated reviewer would be biased in favor of AI-generated papers (only having access to the text). This is an obvious one but I think there could easily be other aspects of papers where the AI and human reviewers align on human-written papers, but not on AI papers.

Additionally, they note:

> Furthermore, the False Negative Rate (FNR) is much lower than the human baseline (0.39 vs. 0.52). Hence, the LLM-based review agent rejects fewer high-quality papers. The False Positive Rate (FNR [sic]), on the other hand, is higher (0.31 vs. 0.17)

It seems like false positive rate is the more important metric here. If a paper is truly high-quality, it is likely to have success w/ a rebuttal, or in getting acceptance at another conference. On the other hand, if this system leads to more low-quality submissions or acceptances via a high FPR, we're going to have more AI slop and increased load on human reviewers.

I admit I didn't thoroughly read all 185 pages, maybe these concerns are misplaced.
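For reference, a minimal sketch of how the two quoted rates are defined, treating "positive" as "accept". The counts below are hypothetical, chosen only so the rates reproduce the quoted 0.31 and 0.39; they are not taken from the paper.

```python
def error_rates(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (false positive rate, false negative rate) from confusion-matrix counts.

    "Positive" means the reviewer accepts the paper.
    """
    fpr = fp / (fp + tn)  # share of truly low-quality papers that were accepted
    fnr = fn / (fn + tp)  # share of truly high-quality papers that were rejected
    return fpr, fnr


# Hypothetical counts (assumption): 100 low-quality and 100 high-quality papers.
fpr, fnr = error_rates(tp=61, fp=31, tn=69, fn=39)
print(f"FPR={fpr:.2f} FNR={fnr:.2f}")
```

A higher FPR means more low-quality papers get through, which is the failure mode the concern above is about.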

New comment by happypumpkin in "Testing Generative AI for Circuit Board Design"

happypumpkin — Sat, 22 Jun 2024 17:46:30 +0000

> Artists and "creative" people have long held a monopoly on this ability and are now finally paying the price

I've seen a lot of schadenfreude towards artists recently, as if they're somehow gatekeeping art and stopping the rest of us from practicing it.

I really struggle to understand it; the barrier to entry for art is basically just buying paper and a pencil and making time to practice. For most people the practice time could be spent on many things which would have better economic outcomes.

> monopoly

Doesn't this term imply an absence of competition? There seems to be a lot of competition. Anyone can be an artist, and anyone can attempt to make a living doing art. There is no certification, no educational requirements. I'm sure proximity to wealth is helpful but this is true of approximately every career or hobby.

Tangentially, there seem to be positive social benefits to everyone having different skills and depending on other people to get things done. It makes me feel good when people call me up asking for help with something I'm good at. I'm sure it feels the same for the neighborhood handyman when they fix someone's sink, the artist when they make profile pics for their friends, etc. I could be wrong but I don't think it'll be entirely good for people when they can just have an AI or a robot do everything for them.

New comment by happypumpkin in "Claude 3.5 Sonnet"

happypumpkin — Thu, 20 Jun 2024 21:35:24 +0000

This is also one of the first things I test with new models. I did notice that while it still plays very poorly, it is actually far more consistent with the board state, making only legal moves, and noticing when I win than is GPT4o.

New comment by happypumpkin in "AI in software engineering at Google: Progress and the path ahead"

happypumpkin — Fri, 07 Jun 2024 04:53:04 +0000

On a related note, Microsoft published a press release last year [1] where they seemed to suggest that a 30% acceptance rate for Copilot suggestions amounts to a 30% productivity boost for devs.

> users accept nearly 30% of code suggestions from GitHub Copilot

> Using 30% productivity enhancement, with a projected number of 45 million professional developers in 2030, generative AI developer tools could add productivity gains of an additional 15 million “effective developers” to worldwide capacity by 2030. This could boost global GDP by over $1.5 trillion

They were probably just being disingenuous to drum up hype but if not they'd have to believe that:

1) All lines of code take the same amount of time to produce

2) 100% of a developer's job is writing code

[1]: https://github.blog/2023-06-27-the-economic-impact-of-the-ai...
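A toy model of why the two assumptions above matter. The function name and the 0.5 coding share are hypothetical, picked only for illustration; this is not how the press release computed anything.

```python
def fraction_of_work_time_saved(
    acceptance_rate: float,
    coding_share: float,
    saving_per_accepted: float = 1.0,
) -> float:
    """Rough share of total work time saved by accepted suggestions.

    acceptance_rate:     fraction of suggestions accepted (0.30 in the quote)
    coding_share:        fraction of the job spent writing code (assumption)
    saving_per_accepted: fraction of effort an accepted suggestion really saves,
                         1.0 only if every line costs the same to produce (assumption)
    """
    return acceptance_rate * saving_per_accepted * coding_share


# Both assumptions hold: 30% acceptance reads as a 30% saving.
all_coding = fraction_of_work_time_saved(0.30, coding_share=1.0)

# Hypothetical dev who spends half their time writing code: the saving halves.
half_coding = fraction_of_work_time_saved(0.30, coding_share=0.5)

print(all_coding, half_coding)
```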

New comment by happypumpkin in "The right not to be subjected to AI profiling based on publicly available data"

happypumpkin — Thu, 06 Jun 2024 22:28:53 +0000

I suspect the two major drivers of this are vehicles getting larger over time and smartphones.

New comment by happypumpkin in "Simple tasks showing reasoning breakdown in state-of-the-art LLMs"

happypumpkin — Wed, 05 Jun 2024 15:27:22 +0000

From the paper:

> To account for the response variations due to various prompt forms, we created 3 distinct prompt types asking for the solution to the AIW problem: STANDARD, THINKING, and RESTRICTED. The STANDARD prompt type asks to solve the posed problem and output the final answer in the format as described above. This does not put any specific requirements on model behavior. The THINKING prompt type extends STANDARD with the request to think carefully and double check the solution for any mistakes

New comment by happypumpkin in "Simple tasks showing reasoning breakdown in state-of-the-art LLMs"

happypumpkin — Wed, 05 Jun 2024 15:22:39 +0000

First result:

Me > Alice has 40 brothers and she also has 50 sisters. How many sisters does Alice's brother have?

GPT4o (web version) > Alice has 50 sisters. Since all of Alice's brothers and sisters share the same set of siblings, each of Alice's 40 brothers also has 50 sisters. Therefore, Alice's brother has 50 sisters.

---

Step by step:

Me > Alice has 40 brothers and she also has 50 sisters. How many sisters does Alice's brother have? Think about it step by step.

GPT4o > To determine how many sisters Alice's brother has, follow these steps:

    Identify the total number of siblings: Alice has 40 brothers and 50 sisters, plus Alice herself. Thus, the total number of siblings is:
    40 (brothers)+50 (sisters)+1 (Alice)=91 siblings

    Focus on Alice's brother: Alice's brother is one of the 40 brothers.

    Determine the number of sisters for Alice's brother: Each brother has the same number of sisters. Since the brothers do not count themselves as sisters, they only count the 50 sisters, excluding themselves and other brothers.

Therefore, each of Alice's brothers has: 50 sisters

So, Alice's brother has 50 sisters.

---

Thinking step by step somehow gave an even more nonsensical answer. I'm actually very surprised it didn't work when prompted to do it step by step.
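For comparison, a small sketch that counts the answer directly. It assumes, as the problem intends, that Alice is female and that all siblings share the same parents; the function name is made up for illustration.

```python
def sisters_of_a_brother(n_brothers: int, n_sisters: int) -> int:
    """Count the sisters of one of Alice's brothers (needs at least one brother)."""
    # The girls in the family are Alice's sisters plus Alice herself.
    family = ["F"] * (n_sisters + 1) + ["M"] * n_brothers
    brother = family.index("M")  # pick any one brother
    # His sisters are all the girls; he never counts himself.
    return sum(1 for i, sex in enumerate(family) if sex == "F" and i != brother)


print(sisters_of_a_brother(40, 50))  # 51
```

Both GPT4o responses miss the "+1" for Alice herself, even though the step-by-step one explicitly adds her to the sibling total.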

New comment by happypumpkin in "Lisp: Icing or Cake?"

happypumpkin — Sun, 02 Jun 2024 18:03:14 +0000

This is great! I used to hate all forms of webdev but ClojureScript actually makes it enjoyable, I really hope it gets more traction.

New comment by happypumpkin in "Man scammed after AI told him fake Facebook customer support number was real"

happypumpkin — Fri, 31 May 2024 19:57:15 +0000

Don't forget the account to open shared mailboxes for packages. "Luxor" for me. It actually works so I don't mind much but I hadn't really considered how much extra rent all the apps might be costing me.

New comment by happypumpkin in "Legal models hallucinate in 1 out of 6 (or more) benchmarking queries"

happypumpkin — Fri, 31 May 2024 19:54:43 +0000

I have an uncle who is an attorney in X state. I had him try, using GPT4, a bunch of prompts about X state law in his specialty, and the rate of hallucination was much higher than 1 in 6. Probably half or more were incorrect. Often the answers would be correct for other states, but not for X state. Alternatively, they were correct for X state at a certain point in time, but no longer are.

New comment by happypumpkin in "Man scammed after AI told him fake Facebook customer support number was real"

happypumpkin — Fri, 31 May 2024 18:33:52 +0000

Crazy. If they won't let me speak to a person I'd still much prefer just having a generic click-your-timeslot web app than waste time talking to a bot. And for millions of dollars they could just hire a human for a decade or more...

New comment by happypumpkin in "Man scammed after AI told him fake Facebook customer support number was real"

happypumpkin — Fri, 31 May 2024 17:11:42 +0000

I'm Gen-Z and talking to a human representative of a company makes me much more confident that something will happen as a result of my efforts (though still not certain).

I scheduled an apartment viewing recently, and the only method they provided to do so was chatting with an AI (seriously)... I then tried and failed to find a way to contact a human for confirmation multiple times. Lo and behold nobody at the leasing office when I showed up at the scheduled time. Came back later and eventually found somebody - they had not seen anything I'd done with the bot.

Software for small businesses and local governments is often really bad and I'd much prefer to make sure a person knows what I'm trying to get accomplished.

New comment by happypumpkin in "Microsoft Rolling Out New Windows Subsystem for Linux "WSL" Features for 2024"

happypumpkin — Fri, 31 May 2024 14:43:54 +0000

> especially Google

Yup... Firefox, Kagi, & Protonmail get me away from the worst of it but YouTube doesn't really have a good competitor and other people using things like Google Forms (whatever the surveys are called) sometimes ends up forcing me to log into a Google account or have certain features locked out.

New comment by happypumpkin in "Codestral: Mistral's Code Model"

happypumpkin — Wed, 29 May 2024 21:51:14 +0000

Similar experience using GPT4 for help with Apple's Accessibility API. I wanted to do some non-happy-path things and it kept looping between solutions that failed to satisfy at least one of a handful of requirements that I had, and in ways that I couldn't combine the different "solutions" to meet all the requirements.

I was eventually able to figure it out with the help of some early 2010s blog posts. Sadly I didn't test giving it that context and having it attempt to find a solution again (and this was before web browsing was integrated with the web app).

More of an issue than it not knowing enough to fulfill my request (it was pretty obscure so I didn't necessarily expect that it would be able to) was that it didn't mind emitting solutions that failed to meet the requirements. "I don't know how to do that" would've been a much preferred answer.

New comment by happypumpkin in "Study finds that 52% of ChatGPT answers to programming questions are wrong"

happypumpkin — Fri, 24 May 2024 18:30:11 +0000

Yeah I really don't understand why research is still being published that uses GPT3.5 rather than GPT4 or both models. ~500 programming questions is maybe a few bucks on the API?

New comment by happypumpkin in "Study finds that 52% of ChatGPT answers to programming questions are wrong"

happypumpkin — Fri, 24 May 2024 13:16:56 +0000

From the paper:

"Additionally, this work has used the free version of ChatGPT (GPT-3.5)"

New comment by happypumpkin in "Anger Does a Lot More Damage to Your Body Than You Realize"

happypumpkin — Fri, 24 May 2024 12:28:27 +0000

"Let the hate flow through you"