<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: drothlis</title><link>https://news.ycombinator.com/user?id=drothlis</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 05 Apr 2026 16:28:48 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=drothlis" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by drothlis in "Launch HN: Skyvern (YC S23) – open-source AI agent for browser automations"]]></title><description><![CDATA[
<p>> Claude's ability to count pixels and interact with a screen using precise coordinate<p>I guess you mean its "Computer use" API that can (if I understand correctly) send mouse clicks at specific coordinates?<p>I got excited thinking Claude can finally do accurate object detection, but alas no. Here's its output:<p>> Looking at the image directly, the SPACE key appears near the bottom left of the keyboard interface, but I cannot determine its exact pixel coordinates just by looking at the image. I can see it's positioned below the letter grid and appears wider than the regular letter keys, but I apologize - I cannot reliably extract specific pixel coordinates from just viewing the screenshot.<p>This is 3.5 Sonnet (their most current model).<p>And they explicitly call out spatial reasoning as a limitation:<p>> Claude’s spatial reasoning abilities are limited. It may struggle with tasks requiring precise localization or layouts, like reading an analog clock face or describing exact positions of chess pieces.<p>--<a href="https://docs.anthropic.com/en/docs/build-with-claude/vision#limitations" rel="nofollow">https://docs.anthropic.com/en/docs/build-with-claude/vision#...</a><p>Since 2022 I've occasionally dipped in to test this use case with the latest models, but I haven't seen much progress on spatial reasoning. The multi-modality has been a neat addition, though.</p>
]]></description><pubDate>Fri, 25 Oct 2024 09:21:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=41943583</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=41943583</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41943583</guid></item><item><title><![CDATA[New comment by drothlis in "Launch HN: GPT Driver (YC S21) – End-to-end app testing in natural language"]]></title><description><![CDATA[
<p>I noticed that in your demo it generated the prompt "tap on the 'Log in' button located directly below the 'Facebook Password' field".<p>Does your model consistently get the positions right (above, below, etc.)? Every time I play with ChatGPT, even GPT-4o, it can't do basic spatial reasoning. For example, here's a typical output (emphasis mine):<p>> If YouTube is to the upper *left* of ESPN, press "Up" once, then *"Right"* to move the focus.<p>(I test TV apps where the input is a remote control, rather than tapping directly on the UI elements.)</p>
]]></description><pubDate>Fri, 25 Oct 2024 08:14:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=41943292</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=41943292</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41943292</guid></item><item><title><![CDATA[New comment by drothlis in "The virtuous mean between time drunkenness and work martyrdom"]]></title><description><![CDATA[
<p>Beautiful.</p>
]]></description><pubDate>Mon, 26 Feb 2024 12:29:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=39510512</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=39510512</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39510512</guid></item><item><title><![CDATA[New comment by drothlis in "Automated Unit Test Improvement Using Large Language Models at Meta"]]></title><description><![CDATA[
<p><a href="https://en.wikipedia.org/wiki/Characterization_test" rel="nofollow">https://en.wikipedia.org/wiki/Characterization_test</a><p>aka snapshot tests.</p>
]]></description><pubDate>Sat, 17 Feb 2024 08:37:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=39407601</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=39407601</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39407601</guid></item><item><title><![CDATA[New comment by drothlis in "ViperGPT: Visual Inference via Python Execution for Reasoning"]]></title><description><![CDATA[
<p>According to the ViperGPT paper, their "ImagePatch.find()" uses GLIP.<p>According to the GLIP paper,† accuracy on a test set not seen during training is around 60%, so... neat demos, but whether it'll be reliable enough depends on your application.<p>† <a href="https://arxiv.org/abs/2206.05836" rel="nofollow">https://arxiv.org/abs/2206.05836</a></p>
]]></description><pubDate>Mon, 20 Mar 2023 11:50:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=35230200</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=35230200</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35230200</guid></item><item><title><![CDATA[New comment by drothlis in "Even the Pylint codebase uses Ruff"]]></title><description><![CDATA[
<p>Could you implement (some of) astroid's inference using stack graphs? [1],[2]<p>That would allow a lot of caching optimisations, as you can "index" each file in isolation.<p>[1]: <a href="https://github.blog/2021-12-09-introducing-stack-graphs/" rel="nofollow">https://github.blog/2021-12-09-introducing-stack-graphs/</a><p>[2]: <a href="https://github.com/github/stack-graphs">https://github.com/github/stack-graphs</a></p>
]]></description><pubDate>Mon, 06 Mar 2023 09:30:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=35039106</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=35039106</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35039106</guid></item><item><title><![CDATA[New comment by drothlis in "Show HN: Touca – a better alternative to snapshot testing"]]></title><description><![CDATA[
<p>It side-steps the problem of git conflicts, I suppose. You'd have to use their tool (`touca diff`? I don't know if that exists) instead of `git diff`.</p>
]]></description><pubDate>Tue, 28 Feb 2023 08:56:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=34967167</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34967167</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34967167</guid></item><item><title><![CDATA[New comment by drothlis in "Show HN: Touca – a better alternative to snapshot testing"]]></title><description><![CDATA[
<p>Some ideas I got from Jeremias Rößler's talk: <a href="https://t.co/xWtA58Q9q5" rel="nofollow">https://t.co/xWtA58Q9q5</a><p>- Snapshot testing is like version control, but for the <i>outputs</i> rather than the inputs (source code).<p>- Asserts in traditional unit tests are like "block lists" specifying which changes aren't allowed. Instead, snapshot testing allows you to specify an "allow list" of acceptable differences (e.g. timestamps).</p>
]]></description><pubDate>Tue, 28 Feb 2023 08:41:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=34967088</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34967088</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34967088</guid></item><item><title><![CDATA[New comment by drothlis in "GPT is all you need for the back end"]]></title><description><![CDATA[
<p>Obviously a sensationalised title, but it's a neat illustration of how you'd apply the language models of the future to real tasks.</p>
]]></description><pubDate>Tue, 24 Jan 2023 13:44:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=34503420</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34503420</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34503420</guid></item><item><title><![CDATA[GPT is all you need for the back end]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/TheAppleTucker/backend-GPT">https://github.com/TheAppleTucker/backend-GPT</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=34503418">https://news.ycombinator.com/item?id=34503418</a></p>
<p>Points: 252</p>
<p># Comments: 264</p>
]]></description><pubDate>Tue, 24 Jan 2023 13:44:01 +0000</pubDate><link>https://github.com/TheAppleTucker/backend-GPT</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34503418</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34503418</guid></item><item><title><![CDATA[New comment by drothlis in "Software testing, and why I'm unhappy about it"]]></title><description><![CDATA[
<p>Think systems integrators and compliance tests. I would imagine that each of the individual systems being "integrated" does have its own unit tests, upstream, in its own repo.</p>
]]></description><pubDate>Thu, 19 Jan 2023 09:54:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=34438511</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34438511</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34438511</guid></item><item><title><![CDATA[New comment by drothlis in "Software testing, and why I'm unhappy about it"]]></title><description><![CDATA[
<p>Some good ideas here for when your tests are in a separate repo from the system under test (GPUs/drivers/compilers in the case of the author, but it's applicable to a variety of industries).</p>
]]></description><pubDate>Tue, 17 Jan 2023 16:08:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=34414234</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34414234</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34414234</guid></item><item><title><![CDATA[Software testing, and why I'm unhappy about it]]></title><description><![CDATA[
<p>Article URL: <a href="http://nhaehnle.blogspot.com/2023/01/software-testing-and-why-im-unhappy.html">http://nhaehnle.blogspot.com/2023/01/software-testing-and-why-im-unhappy.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=34414193">https://news.ycombinator.com/item?id=34414193</a></p>
<p>Points: 78</p>
<p># Comments: 73</p>
]]></description><pubDate>Tue, 17 Jan 2023 16:06:09 +0000</pubDate><link>http://nhaehnle.blogspot.com/2023/01/software-testing-and-why-im-unhappy.html</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34414193</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34414193</guid></item><item><title><![CDATA[New comment by drothlis in "Cross-Branch Testing"]]></title><description><![CDATA[
<p>Related: I think it was Kernighan & Pike's "The Practice Of Programming" where I read the idea of testing a complex implementation by comparing its output against a simpler but less performant implementation.</p>
]]></description><pubDate>Mon, 16 Jan 2023 11:55:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=34399733</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34399733</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34399733</guid></item><item><title><![CDATA[New comment by drothlis in "Cross-Branch Testing"]]></title><description><![CDATA[
<p>Interesting thought, somewhat related to the articles on "snapshot testing" that have been trending on HN lately.</p>
]]></description><pubDate>Mon, 16 Jan 2023 11:47:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=34399695</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34399695</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34399695</guid></item><item><title><![CDATA[Cross-Branch Testing]]></title><description><![CDATA[
<p>Article URL: <a href="https://buttondown.email/hillelwayne/archive/cross-branch-testing/">https://buttondown.email/hillelwayne/archive/cross-branch-testing/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=34399691">https://news.ycombinator.com/item?id=34399691</a></p>
<p>Points: 2</p>
<p># Comments: 2</p>
]]></description><pubDate>Mon, 16 Jan 2023 11:47:30 +0000</pubDate><link>https://buttondown.email/hillelwayne/archive/cross-branch-testing/</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34399691</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34399691</guid></item><item><title><![CDATA[New comment by drothlis in "“Expect tests” make test-writing feel like a REPL session"]]></title><description><![CDATA[
<p>"Regression testing" can also refer to a process: When the QA team says they're doing regression testing, it means they're testing that existing functionality hasn't regressed (as opposed to testing a new feature).<p>I'm not particularly wedded to any of these terms, I'm just pointing out that "regression testing" has an established meaning, and it isn't snapshot testing (outside of certain industries, at least). I do find it amusing that one implementation of snapshot testing (<a href="https://pypi.org/project/pytest-regtest/" rel="nofollow">https://pypi.org/project/pytest-regtest/</a>) links to <a href="https://en.wikipedia.org/wiki/Regression_testing" rel="nofollow">https://en.wikipedia.org/wiki/Regression_testing</a> but that article doesn't describe snapshot testing at all! Maybe the article changed? Oh well, language changes too. ¯\_(ツ)_/¯</p>
]]></description><pubDate>Mon, 16 Jan 2023 10:32:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=34399145</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34399145</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34399145</guid></item><item><title><![CDATA[New comment by drothlis in "Ubuntu 22.04 LTS servers and phased apt updates"]]></title><description><![CDATA[
<p>In the article they don't change /etc/machine-id, but APT::Machine-ID in apt.conf.</p>
]]></description><pubDate>Sun, 15 Jan 2023 10:48:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=34388466</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34388466</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34388466</guid></item><item><title><![CDATA[New comment by drothlis in "“Expect tests” make test-writing feel like a REPL session"]]></title><description><![CDATA[
<p><a href="https://approvaltests.com/" rel="nofollow">https://approvaltests.com/</a></p>
]]></description><pubDate>Sat, 14 Jan 2023 21:26:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=34384327</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34384327</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34384327</guid></item><item><title><![CDATA[New comment by drothlis in "“Expect tests” make test-writing feel like a REPL session"]]></title><description><![CDATA[
<p>...and my favourite term, "characterization test": <a href="https://en.wikipedia.org/wiki/Characterization_test" rel="nofollow">https://en.wikipedia.org/wiki/Characterization_test</a><p>"Regression test" means something else, at least at the companies I've worked at: It means a test that was written after a defect was found in production, to ensure that the same defect doesn't happen again (that the fix doesn't "regress"). It can be a manual test or an automated test.
<a href="https://en.wikipedia.org/wiki/Regression_testing" rel="nofollow">https://en.wikipedia.org/wiki/Regression_testing</a></p>
]]></description><pubDate>Sat, 14 Jan 2023 17:10:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=34381844</link><dc:creator>drothlis</dc:creator><comments>https://news.ycombinator.com/item?id=34381844</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34381844</guid></item></channel></rss>