Hacker News: huac

New comment by huac in "Some critical issues with the SWE-bench dataset"

huac — Fri, 21 Feb 2025 19:47:36 +0000

that comment refers to test-time inference, i.e. what the model is prompted with, not what it is trained on. using what is in the prompt is, of course, also a tricky problem (esp. over long context, needle in a haystack), but it should be much easier than memorization.

anyways, another interpretation is that the model also needs to decide whether the code in the issue is a reliable fix or not

New comment by huac in "Some critical issues with the SWE-bench dataset"

huac — Fri, 21 Feb 2025 19:34:46 +0000

> 32.67% of the successful patches involve cheating as the solutions were directly provided in the issue report or the comments.

Looking at the benchmark, https://www.swebench.com/, about half of scored submissions score under 1/3 correct? So they're either not cheating, or not cheating effectively?

New comment by huac in "Test-driven development with an LLM for fun and profit"

huac — Fri, 17 Jan 2025 11:34:11 +0000

> Coding assistants based on o1 and Sonnet are pretty great at coding with <50k context, but degrade rapidly beyond that.

I had a very similar impression (wrote more in https://hua.substack.com/p/are-longer-context-windows-all-yo...).

One framing is that the effective context window (i.e. the length that the model is able to effectively reason over) determines how useful the model is. A human new grad programmer might effectively reason over 100s or 1000s of tokens but not millions - which is why we carefully scope their work and point them only at the relevant context. But a principal engineer might reason over many, many millions of tokens of context - code, yes, but also organizational and business context.

Trying to carefully select those 50k tokens is extremely difficult for LLMs/RAG today. I expect models to get much longer effective context windows but there are hardware / cost constraints which make this more difficult.
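To make the selection problem concrete, here is a minimal sketch of the mechanical half: greedily packing chunks into a fixed token budget by relevance. The function name `pack_context`, the chunk tuples, and the scores are all hypothetical; producing good relevance scores is the hard part described above, and this sketch simply assumes they exist.

```python
def pack_context(chunks, budget_tokens):
    """Greedy packing: take the most relevant chunks that still fit the budget.

    chunks: list of (name, relevance, n_tokens) tuples (hypothetical format).
    Returns the chosen names and the number of tokens used.
    """
    chosen, used = [], 0
    for name, relevance, n_tokens in sorted(chunks, key=lambda c: c[1], reverse=True):
        if used + n_tokens <= budget_tokens:
            chosen.append(name)
            used += n_tokens
    return chosen, used


# Illustrative values only: two large files and one smaller one, 50k budget.
chunks = [("a.py", 0.9, 30_000), ("b.py", 0.8, 30_000), ("c.py", 0.5, 15_000)]
print(pack_context(chunks, 50_000))
```

Note the failure mode: the second most relevant file is dropped because it does not fit, which is one way a fixed budget can hide exactly the context the model needed.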

Are longer context windows all you need for AI codegen?

huac — Fri, 17 Jan 2025 11:28:14 +0000

Article URL: https://hua.substack.com/p/are-longer-context-windows-all-you

Comments URL: https://news.ycombinator.com/item?id=42736427

Points: 1

# Comments: 0

New comment by huac in "Nepenthes is a tarpit to catch AI web crawlers"

huac — Fri, 17 Jan 2025 11:21:01 +0000

from an AI research perspective -- it's pretty straightforward to mitigate this attack

1. perplexity filtering - a small LLM scores how in-distribution the data is. if the perplexity is too high (gibberish like this) or too low (likely already LLM-generated at low temperature, or already memorized), toss it out.

2. models can learn to prioritize/deprioritize data just based on the domain name of where it came from. essentially they can learn 'wikipedia good, your random website bad' without any other explicit labels. https://arxiv.org/abs/2404.05405 and also another recent paper that I don't recall...
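A rough sketch of the perplexity band from point 1, assuming a toy character-bigram model with add-one smoothing stands in for the small LLM. `BigramLM`, `keep_document`, and both thresholds are hypothetical; a real pipeline would score with an actual small language model and tune the band on held-out data.

```python
import math
from collections import Counter


class BigramLM:
    """Toy character-bigram model; a stand-in for the small LLM scorer."""

    def __init__(self, corpus: str):
        self.vocab = set(corpus)
        self.unigrams = Counter(corpus[:-1])
        self.bigrams = Counter(zip(corpus, corpus[1:]))

    def perplexity(self, text: str) -> float:
        """exp of the mean negative log-likelihood per character transition."""
        if len(text) < 2:
            return float("inf")
        v = len(self.vocab) + 1  # extra bucket for unseen characters
        nll = 0.0
        for a, b in zip(text, text[1:]):
            # Add-one smoothing keeps unseen bigrams at a small nonzero probability.
            p = (self.bigrams[(a, b)] + 1) / (self.unigrams[a] + v)
            nll -= math.log(p)
        return math.exp(nll / (len(text) - 1))


def keep_document(lm: BigramLM, text: str, low: float, high: float) -> bool:
    """Keep text only if its perplexity falls inside the band [low, high]."""
    return low <= lm.perplexity(text) <= high


lm = BigramLM("the quick brown fox jumps over the lazy dog. " * 50)
natural = "the quick brown fox jumps over the lazy dog."
gibberish = "zxqj vkwp qzxv jjkq xxzq"
print(keep_document(lm, natural, 1.0, 20.0), keep_document(lm, gibberish, 1.0, 20.0))
```

Raising `low` above the score of text the model has effectively memorized drops that text too, which is the "too low" half of the filter.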

New comment by huac in "Daisy, an AI granny wasting scammers' time"

huac — Thu, 14 Nov 2024 17:53:29 +0000

real-time full duplex like OpenAI GPT-4o is pretty expensive. cascaded approaches (usually about 800ms - 1 second delay) are slower and worse, but very very cheap. when I built this a year ago, I estimated the LLM + TTS + other serving costs to be less than the Twilio costs.

New comment by huac in "Go library for in-process vector search and embeddings with llama.cpp"

huac — Wed, 30 Oct 2024 02:20:46 +0000

nice work! I wrote a similar library (https://github.com/stillmatic/gollum/blob/main/packages/vect...) and similarly found that exact search (w/the same simple heap + SIMD optimizations) is quite fast. with 100k objects, retrieval queries complete in <200ms on an M1 Mac. no need for a fancy vector DB :)

that library used `viterin/vek` for SIMD math: https://github.com/viterin/vek/
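For readers who have not seen it, exact search with a heap looks roughly like the sketch below. This is plain Python written for illustration, not code from either Go library, and it has no SIMD; `top_k` and `cosine` are hypothetical names.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k(query, vectors, k):
    """Exact search: score every stored vector, keep the k best in a min-heap."""
    heap = []  # (score, id); the weakest of the current top k sits at heap[0]
    for vid, vec in vectors.items():
        score = cosine(query, vec)
        if len(heap) < k:
            heapq.heappush(heap, (score, vid))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, vid))
    # Best match first.
    return [(vid, score) for score, vid in sorted(heap, reverse=True)]


vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
print([vid for vid, _ in top_k([1.0, 0.1], vectors, 2)])
```

The heap keeps memory at O(k) and the scan at O(n log k), so the cost is dominated by the dot products, which is where SIMD helps.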

New comment by huac in "Show HN: Mdx – Execute your Markdown code blocks, now in Go"

huac — Sat, 26 Oct 2024 18:39:27 +0000

reminds me a lot of rmarkdown - which allows you to run many languages in a similar fashion https://rmarkdown.rstudio.com/

New comment by huac in "Taiwan Makes the World's Computer Chips. Now It's Running Out of Electricity"

huac — Sun, 06 Oct 2024 20:09:21 +0000

shouldn't there be more clouds over the ocean, as that is where clouds tend to form?

New comment by huac in "Moshi: A speech-text foundation model for real time dialogue"

huac — Thu, 19 Sep 2024 21:45:34 +0000

> there needs to be a tool/function calling step before a reply

I built that almost exactly a year ago :) it was good but not fast enough - hence building the joint model.

New comment by huac in "Moshi: A speech-text foundation model for real time dialogue"

huac — Thu, 19 Sep 2024 01:14:01 +0000

> Current AI (even GPT-4o) simply isn't capable enough to do useful stuff. You need to augment it somehow - either modularize it, or add RAG, or similar

I am sympathetic to this view but strongly disagree that you need a transcript. Think about it a bit more!!

New comment by huac in "Moshi: A speech-text foundation model for real time dialogue"

huac — Wed, 18 Sep 2024 20:21:12 +0000

One guess is that the live demo is quantized to run fast on cheaper GPUs, and that degraded the performance a lot.

New comment by huac in "DisTrO – a family of low latency distributed optimizers"

huac — Tue, 27 Aug 2024 21:47:01 +0000

in particular it appears that they only implement data parallel DP - at 1.2B you can fit a full copy of the model into memory, but larger models require splitting the weights across multiple machines (different techniques eg distributed data parallel DDP, tensor parallel TP, pipeline parallel PP, ...)

without more details it's unclear if the proposed technique keeps its speedups in that case
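Back-of-the-envelope arithmetic for why 1.2B is comfortable under plain data parallel, assuming mixed-precision Adam at roughly 16 bytes per parameter (2 for half-precision weights, 2 for gradients, 12 for fp32 master weights plus the two Adam moments) and ignoring activations. The 80 GB device size and the 70B comparison point are illustrative assumptions, not figures from the DisTrO report.

```python
def training_state_gb(n_params: float, bytes_per_param: int = 16) -> float:
    """Approximate GB for weights + gradients + Adam state (no activations)."""
    return n_params * bytes_per_param / 1e9


def fits_on_one_device(n_params: float, device_gb: float = 80.0) -> bool:
    """True if a full replica fits, which is what plain data parallel requires."""
    return training_state_gb(n_params) <= device_gb


for n in (1.2e9, 70e9):
    print(f"{n:.1e} params: {training_state_gb(n):.1f} GB, fits: {fits_on_one_device(n)}")
```

Once a replica no longer fits, every worker holds only a slice of the model, so the communication pattern changes and a speedup measured under data parallel does not automatically carry over.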

New comment by huac in "Hermes 3: The First Fine-Tuned Llama 3.1 405B Model"

huac — Fri, 16 Aug 2024 01:41:53 +0000

no. only 4 categories can be blocked, while the others cannot be disabled.

New comment by huac in "I was a 20-something dethroned dotcom ceo that went to work at mcdonald's (2000)"

huac — Mon, 05 Aug 2024 16:07:52 +0000

His most recent LinkedIn role: Fulfillment Center Associate I, Part Time, Amazon.

New comment by huac in "Large Enough"

huac — Wed, 24 Jul 2024 21:08:37 +0000

> Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

We are also not exactly looking letter by letter at everything we read.

New comment by huac in "When ChatGPT summarises, it does nothing of the kind"

huac — Sun, 21 Jul 2024 21:44:38 +0000

I gave the same article to Claude 3.5 Sonnet and the result seems reasonably similar to the author's handwritten summary.

```
This article examines the governance of Dutch pension funds in light of the Future of Pensions Act (Wtp). The new legislation shifts towards more complete pension contracts and emphasizes operational execution, necessitating changes in pension fund governance. The authors propose strengthening pension funds' internal organization, improving accountability to participants, and enhancing the powers of participant representation bodies. Key recommendations include establishing a recognizable governance structure with clear responsibilities, creating a College of Stakeholders (CvB) to replace existing accountability bodies, and granting the CvB more authority, including appointment and dismissal powers. The proposals aim to balance the interests of social partners, pension funds, and participants while ensuring transparency and effective oversight. The article emphasizes principles such as transparency, trust, loyalty, and prudence in shaping governance reforms. It also discusses the impact of digitalization (DORA), the need for pension funds to demonstrate value, and the potential for further consolidation in the sector. International perspectives, including insights from the World Bank, inform the proposed governance improvements. These changes are designed to help pension funds adapt to the new system, manage risks effectively, and maintain their "license to operate" in a changing landscape.
```

Similarly, the second article's summary also captures the key points that the author points out (emphasis mine).

```
The article "Regulating pensions: Why the European Union matters" explores the growing influence of EU law on pension regulation. While Member States retain primary responsibility for pension provision, the authors argue that EU law significantly impacts national pension systems through both direct and indirect means.

The paper begins by examining the EU's institutional framework regarding pensions, focusing on the principles of subsidiarity and the division of powers between the EU and Member States. It emphasizes that the EU can regulate pension matters when the Internal Market's functioning is at stake, despite lacking specific regulatory competencies for pensions. The authors note that the subsidiarity principle has not proven to be an obstacle for EU action in this area.

The article then delves into EU substantive law and its impact on pensions, concentrating on the concept of Services of General Economic Interest (SGEI) and its role in classifying pension fund activities as economic or non-economic. The authors discuss the case law of the Court of Justice of the European Union (CJEU), highlighting its importance in determining when pension schemes fall within the scope of EU competition law. They emphasize that the CJEU's approach is based on the degree of solidarity in the scheme and the extent of state control.

** The paper examines the IORP Directive, outlining its current scope and limitations. The authors argue that the directive is unclear and leads to distortions in the internal market, particularly regarding the treatment of pay-as-you-go schemes and book reserves. They propose a new regulatory framework that distinguishes between economic and non-economic pension activities. For non-economic activities, the authors suggest a soft law approach using a non-binding code or communication from the European Commission. This would outline the basic features of pension schemes based on solidarity and the conditions for exemption from EU competition rules. For economic activities, they propose a hard law approach following the Lamfalussy technique, which would provide detailed regulations similar to the Solvency II regime but tailored to the specifics of IORPs (Institutions for Occupational Retirement Provision). **

The authors conclude that it's impossible to categorically state whether pensions are a national or EU competence, as decisions must be made on a case-by-case basis. They emphasize the importance of considering EU law when drafting national pension legislation and highlight the need for clarity in the division of powers between the EU and Member States regarding pensions. Overall, the paper underscores the complex interplay between EU law and national pension systems, calling for a more nuanced understanding of the EU's role in pension regulation and a clearer regulatory framework that respects both EU and national competencies.
```

I'd bet that the author used GPT-3.5-turbo (aka the free version of ChatGPT) and did not give any particular prompting help. To create these, I asked Claude to create a prompt for summarization with chain-of-thought revision, used that prompt, and pasted the result. Better models with a little bit more inference-time compute go a long way.
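The actual prompt is not reproduced in this comment, so the following is only a guess at the general shape of "summarize, then revise": a draft, a critique of the draft against the article, and a rewrite. `summarize_with_revision` and the prompt wording are hypothetical, and `llm` is any function that maps a prompt string to a completion string; no specific API is assumed.

```python
from typing import Callable


def summarize_with_revision(article: str, llm: Callable[[str], str], rounds: int = 1) -> str:
    """Draft a summary, then critique and rewrite it `rounds` times."""
    draft = llm(f"Summarize the key arguments of this article:\n\n{article}")
    for _ in range(rounds):
        # Spend extra inference on checking the draft against the source.
        critique = llm(
            "List important points from the article that this summary omits "
            f"or misstates.\n\nArticle:\n{article}\n\nSummary:\n{draft}"
        )
        draft = llm(
            "Rewrite the summary so it addresses the critique.\n\n"
            f"Article:\n{article}\n\nSummary:\n{draft}\n\nCritique:\n{critique}"
        )
    return draft
```

Each round costs two extra model calls, which is the "little bit more inference-time compute" trade being made.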

New comment by huac in "StreamVC: Real-Time Low-Latency Voice Conversion"

huac — Fri, 12 Jul 2024 15:29:56 +0000

except to the extent that your voice may be part of your image, which is actionable: https://en.wikipedia.org/wiki/Midler_v._Ford_Motor_Co.

New comment by huac in "StreamVC: Real-Time Low-Latency Voice Conversion"

huac — Fri, 12 Jul 2024 04:11:27 +0000

The samples were released a while back: https://google-research.github.io/seanet/stream_vc/

New comment by huac in "Llama-3-V model was plagiarized from MiniCPM"

huac — Mon, 03 Jun 2024 17:18:31 +0000

ah nice, did not see this because I searched for `llama-3-v`