<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: lmeyerov</title><link>https://news.ycombinator.com/user?id=lmeyerov</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 18 Apr 2026 10:59:18 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=lmeyerov" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by lmeyerov in "Claude Design"]]></title><description><![CDATA[
<p>When Anthropic's CPO left Figma's board this week, that was my first question. Oof.</p>
]]></description><pubDate>Fri, 17 Apr 2026 15:48:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47807220</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47807220</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47807220</guid></item><item><title><![CDATA[New comment by lmeyerov in "Cybersecurity looks like proof of work now"]]></title><description><![CDATA[
<p>Most companies and their vendor ecosystems run on OSS<p>Worse, "attackers no longer break in, they log in", so the supply chain attacks harvesting credentials have been frightening</p>
]]></description><pubDate>Thu, 16 Apr 2026 11:00:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47791312</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47791312</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47791312</guid></item><item><title><![CDATA[New comment by lmeyerov in "Rust Threads on the GPU"]]></title><description><![CDATA[
<p>We have this issue in GFQL right now. We wrote the first OSS GPU cypher query language impl, where we make a query plan of gpu-friendly collective operations... But today their steps are coordinated via Python, which has high constant overheads.<p>We are looking to shed some of the python<->c++<->GPU overheads by pushing macro steps out of python and into C++. However, it'd probably be way better to skip all the CPU<>GPU back-and-forth by coordinating the task queue in the GPU to begin with. It's 2026 so ideally we can use modern tools and type safety for this.<p>Note: I looked at the company's GitHub and didn't see any relevant oss, which changes the calculus for a team like ours. Sustainable infra is hard!</p>
]]></description><pubDate>Tue, 14 Apr 2026 05:34:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=47761641</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47761641</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47761641</guid></item><item><title><![CDATA[New comment by lmeyerov in "Exploiting the most prominent AI agent benchmarks"]]></title><description><![CDATA[
<p>This is great work by Dawn Song's team. A huge part of botsbench.com for comparing agents & models for investigation has been in protecting against this kind of thing. As AI & agents keep getting more effective & tenacious, some of the things we've had to add protections against:<p>- Contamination: AI models knowing the answers out of the gate b/c pretraining on the internet and everything big teams can afford to touch. At RSAC for example, we announced Anthropic's 4.6 series is the first frontier model to have serious training set contamination on Splunk BOTS.<p>- Sandboxing: Agents attacking the harness, as is done here - so run the agent in a sandbox, and keep the test harness's code & answerset outside<p>- Isolation: Frontier agent harnesses persist memory all over the place, where work done on one question might be used to accelerate the next. To protect against that, we do fresh sandboxing per question. This is a real feature for our work in unlocking long-horizon AI for investigations, so stay tuned for what's happening here :)<p>"You cannot improve what you cannot measure" - Lord Kelvin</p>
]]></description><pubDate>Sun, 12 Apr 2026 05:27:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47736392</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47736392</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47736392</guid></item><item><title><![CDATA[New comment by lmeyerov in "Small models also found the vulnerabilities that Mythos found"]]></title><description><![CDATA[
<p>Instead of scanning more code, afaict what you seem to want is to scan the same small area and compare how many FPs are found there. A common measure here is what % of the reported issues got labeled as security issues and fixed. I don't see Mythos publishing on relative FP rate, so dunno how to compare those. Maybe something substantively changed?<p>At the same time, I'm not sure that really changes anything because I don't see a reason to believe attacks are constrained by the quality of source code vulnerability finding tools, at least for the last 10-15 years after open source fuzzing tools got a lot better, popular, and industrialized.<p>This might sound like a grumpy reply, but as someone on both sides here, it's easy to maintain two positions:<p>1. This stuff is great, and doing code reviews has been one of my favorite claude code use cases for a year now, including security review. It is both easier to use than traditional tools, and opens up higher-level analysis too.<p>2. Finding bugs in source code was sufficiently cheap already for attackers. They don't need the ease of use or high-level thing in practice, there's enough tooling out there that makes enough of these. Likewise, groups have already industrialized.<p>There's an element of vuln-pocalypse that may be coming with the ease of use going further than already happening with existing out-of-the-box blackbox & source code scanning tools. That's not really what I worry about though.<p>Scarier to me, instead, is what this does to today's reliance on human response. AI rapidly industrializes how attackers escalate access and wedge in once they're in. Even without AI, that's been getting faster and more comprehensive, and with AI, the higher-level orchestration can get much more aggressive for much less capable people. So the steady stream of existing vulns & takeovers into much more industrialized escalations is what worries me more. As coordination keeps moving into machine speed, the current reliance on human response is becoming less and less of an option.</p>
]]></description><pubDate>Sat, 11 Apr 2026 20:48:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47733915</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47733915</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47733915</guid></item><item><title><![CDATA[New comment by lmeyerov in "LLMs can't justify their answers–this CLI forces them to"]]></title><description><![CDATA[
<p>Maybe there's a fundamental miscommunication here of what evals are?<p>Evals apply not just to LLMs but to skills, prompts, tools, and most things changing the behavior of compound AI systems, and especially to things like the productivity claims being put forth in this thread.<p>The features in the post relate directly to heavily researched areas of agents that are regularly benchmarked and evaluated. They're not obscure, eg, another recent HN frontpage item benchmarked on research and planning.</p>
]]></description><pubDate>Sat, 11 Apr 2026 08:45:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47728776</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47728776</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47728776</guid></item><item><title><![CDATA[New comment by lmeyerov in "Launch HN: Twill.ai (YC S25) – Delegate to cloud agents, get back PRs"]]></title><description><![CDATA[
<p>We find it true in Louie.ai evals (ai for investigations), about a 10-20% lift, which is meaningful. It's measured here: botsbench.com.<p>Unfortunately, it's undesirable in practice because people were token-constrained even before. One case is retrying only on failure, but even that is a bit tricky...</p>
]]></description><pubDate>Sat, 11 Apr 2026 05:01:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47727563</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47727563</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47727563</guid></item><item><title><![CDATA[New comment by lmeyerov in "Research-Driven Agents: When an agent reads before it codes"]]></title><description><![CDATA[
<p>I've found value in architectural research before r&d tier projects like big changes to gfql, our oss gpu cypher implementation. It ends up multistage:<p>- deep research for papers, projects etc. I prefer ChatGPT Pro Deep Research here as it can quickly survey hundreds of sources for overall relevance<p>- deep dives into specific papers and projects, where an AI coding agent downloads relevant papers and projects for local analysis loops, performs technical breakdowns into essentially a markdown wiki, and then reduces over all of them into a findings report. Claude code is a bit nicer here because it supports parallel subagents well.<p>- iterative design phase where the agent iterates between the papers, repos, and our own project to refine suggestions and ideas<p>Fundamentally, this is both exciting and limiting: It's an example of 'Software Collapse' where we get to ensure best practices and good ideas from relevant communities, but the LLM is not doing the creativity here, just mashing up and helping pick.<p>Tools to automate this stuff seem nice. I'd expect it to be trained into the agents soon as it's not far from their existing capabilities already. Eg, 'iteratively optimize function foobar, prefer GPU literature for how.'</p>
]]></description><pubDate>Thu, 09 Apr 2026 23:36:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47711706</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47711706</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47711706</guid></item><item><title><![CDATA[New comment by lmeyerov in "LLMs can't justify their answers–this CLI forces them to"]]></title><description><![CDATA[
<p>It sounds like the answer is "No, there is no repeatable eval of the core AI coding productivity claim, definitely not on one of the many AI coding benchmarks in the community used for understanding & comparison, and there will not be"</p>
]]></description><pubDate>Thu, 09 Apr 2026 20:33:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47709558</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47709558</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47709558</guid></item><item><title><![CDATA[New comment by lmeyerov in "Code Is Cheap Now, and That Changes Everything"]]></title><description><![CDATA[
<p>I'm not too familiar with etsy, but presumably most etsy sellers are closer to being lemonade stands than they are to being ikea<p>And yes, sometimes it's nice to support a local lemonade stand. For my family's income, I know which segment I'd feel more confident to work for..</p>
]]></description><pubDate>Thu, 09 Apr 2026 14:22:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47704148</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47704148</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47704148</guid></item><item><title><![CDATA[New comment by lmeyerov in "LLMs can't justify their answers–this CLI forces them to"]]></title><description><![CDATA[
<p>My question was on claims like "5x productivity boost in merged PRs (lots of open PR & merge rate goes down, but net positive)", eg, does this change anything on swe-bench or any other standard coding eval?</p>
]]></description><pubDate>Thu, 09 Apr 2026 04:02:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47699134</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47699134</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47699134</guid></item><item><title><![CDATA[New comment by lmeyerov in "LLMs can't justify their answers–this CLI forces them to"]]></title><description><![CDATA[
<p>Evals let us agree on the baseline, measurement, etc, and compare if simple things others do perform just as well. For the same reason, instead of 'works on my box' and 'my coding style', use one of the many community evals vs making up your own benchmark.<p>That helps head off many of the unfalsifiable discussions & claims happening and moves everyone forward.</p>
]]></description><pubDate>Wed, 08 Apr 2026 06:31:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47686162</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47686162</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47686162</guid></item><item><title><![CDATA[New comment by lmeyerov in "LLMs can't justify their answers–this CLI forces them to"]]></title><description><![CDATA[
<p>Evals or GTFO</p>
]]></description><pubDate>Mon, 06 Apr 2026 03:44:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47656754</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47656754</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47656754</guid></item><item><title><![CDATA[New comment by lmeyerov in "Grafeo – A fast, lean, embeddable graph database built in Rust"]]></title><description><![CDATA[
<p>Speaking of embeddable, we just announced cypher syntax for gfql, making it the first OSS CPU/GPU cypher query engine you can use on dataframes<p>Typically used with scaleout DBs like databricks & splunk for analytical apps: security/fraud/event/social data analysis pipelines, ML+AI embedding & enrichment pipelines, etc. We originally built it for the compute-tier gap here to help Graphistry users making embeddable interactive GPU graph viz apps and dashboards and not wanting to add an external graph DB phase into their interactive analytics flows.<p>Single GPU can do 1B+ edges/s, no need for a DB install, and can work straight on your dataframes / apache arrow / parquet: <a href="https://pygraphistry.readthedocs.io/en/latest/gfql/benchmark_filter_pagerank.html" rel="nofollow">https://pygraphistry.readthedocs.io/en/latest/gfql/benchmark...</a><p>We took a multilayer approach to the GPU & vectorization acceleration, including a more parallelism-friendly core algorithm. This makes fancy features pay-as-you-go vs dragging everything down as in most columnar engines that are appearing. Our vectorized core conforms to over half of TCK already, and we are working to add trickier bits on different layers now that the flow is established.<p>The core GFQL engine has been in production for a year or two now with a lot of analyst teams around the world (NATO, banks, US gov, ...) because it is part of Graphistry. The open-source cypher support is us starting to make it easy for others to directly use as well, including LLMs :)</p>
]]></description><pubDate>Sat, 21 Mar 2026 20:26:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47470929</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47470929</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47470929</guid></item><item><title><![CDATA[New comment by lmeyerov in "FBI is buying location data to track US citizens, director confirms"]]></title><description><![CDATA[
<p>I would ban apps using unsafe ad platforms<p>If I was simultaneously also the owner of the ad platform, I'd fix it & knock out the bad players, or get ready to be sued for a decade+ of knowing malpractice<p>And if I was a US citizen seeing the companies involved be sued for being monopolies and abusing their position, and then seeing them cry security in court yet knowingly do this for a decade+, I'd feel frustrated by successive left + right US administrations & voters</p>
]]></description><pubDate>Thu, 19 Mar 2026 15:27:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47441121</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47441121</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47441121</guid></item><item><title><![CDATA[New comment by lmeyerov in "FBI is buying location data to track US citizens, director confirms"]]></title><description><![CDATA[
<p>There are plenty of bad actors<p>The interesting part is Google & Apple, as part of explaining to courts why their large app store fees are legit and not proof of monopoly positions, hid behind the security argument that they need to be the clearing house of what software runs on the devices. Except... they've knowingly punted on this one for 10+ years.<p>I would 100% agree that losing privacy through any utility-level carrier (credit cards, phone, OS provider, etc) should be default disallowed, and any opt-ins have a clear transparency mode with easy opt-out. At least two areas where the US can learn from the EU on digital policy are digital marketplaces and consumer privacy protection, and this topic is at the intersection of both.</p>
]]></description><pubDate>Thu, 19 Mar 2026 00:05:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47432983</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47432983</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47432983</guid></item><item><title><![CDATA[New comment by lmeyerov in "FBI is buying location data to track US citizens, director confirms"]]></title><description><![CDATA[
<p>*legal in the US</p>
]]></description><pubDate>Thu, 19 Mar 2026 00:02:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=47432963</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47432963</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47432963</guid></item><item><title><![CDATA[New comment by lmeyerov in "FBI is buying location data to track US citizens, director confirms"]]></title><description><![CDATA[
<p>You can trace the big players<p>If Google & Apple & friends refused to take a rake and opened distribution, then I'd agree, net neutrality etc, not their problem<p>But they own so much, and so deep into the pipeline, and explain their fees to courts because "security"... and then don't do investigations. They employ some of the best security analysts in the world and have $10-30B/yr revenue tied to just the app store fees, so they very much could take a big bite out of this if they wanted.</p>
]]></description><pubDate>Thu, 19 Mar 2026 00:00:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47432938</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47432938</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47432938</guid></item><item><title><![CDATA[New comment by lmeyerov in "FBI is buying location data to track US citizens, director confirms"]]></title><description><![CDATA[
<p>Apple and Google are facilitating the data sales<p>Specifically, these big companies revenue share with app companies who in turn increase monetization via selling your private information, esp via free apps. In exchange for the super high app store rake percentages that Apple etc charge, they claim to run security vetting programs and ToS that vet who they do business with and tell users & courts that things are safe, even when they know they're not.<p>It's not rocket science for phone OS's to figure out who these companies are and, as iOS / android os users already get tracked by apple/google/etc, triangulate to which apps are participating</p>
]]></description><pubDate>Wed, 18 Mar 2026 20:48:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=47431261</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47431261</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47431261</guid></item><item><title><![CDATA[New comment by lmeyerov in "Many SWE-bench-Passing PRs would not be merged"]]></title><description><![CDATA[
<p>Once my code exists and passes tests, I generally move on to having it iteratively hunt for bugs, security issues, and DRY code reduction opportunities until it stops finding worthwhile ones.<p>This doesn't always work as well as I'd like, but largely does enough. Conversely, doing it as I go has been a waste of time.</p>
]]></description><pubDate>Thu, 12 Mar 2026 13:04:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=47350018</link><dc:creator>lmeyerov</dc:creator><comments>https://news.ycombinator.com/item?id=47350018</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47350018</guid></item></channel></rss>