<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: veganmosfet</title><link>https://news.ycombinator.com/user?id=veganmosfet</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 27 Jun 2026 01:51:24 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=veganmosfet" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by veganmosfet in "What happened after 2k people tried to hack my AI assistant"]]></title><description><![CDATA[
<p>Thanks! I tried to submit the posts but for some reason my submissions are not published in HN any more. I tried to reach out to HN admins but no response so far.</p>
]]></description><pubDate>Fri, 26 Jun 2026 14:27:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=48687026</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=48687026</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48687026</guid></item><item><title><![CDATA[New comment by veganmosfet in "What happened after 2k people tried to hack my AI assistant"]]></title><description><![CDATA[
<p>It would be nice to publish the exact setup used (workspace dump, OpenClaw version, ...) to be able to reproduce and try out more payloads.<p>In general I have mixed feelings about this result: sure, opus4.6 is excellent at following user intent and recognise potential prompt injection attempts. But: 
Is the "security" prompt used realistic for a generic use-case (processing of emails)? I guess not.<p>In my experiments - without this specific prompt - I was able to derail the user intent to make opus4.8 download and execute a malicious script [0] just by asking "Summarize my new emails".<p>[0] <a href="https://itmeetsot.eu/posts/2026-06-04-openclaw_opus48/" rel="nofollow">https://itmeetsot.eu/posts/2026-06-04-openclaw_opus48/</a></p>
]]></description><pubDate>Fri, 26 Jun 2026 06:41:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=48683136</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=48683136</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48683136</guid></item><item><title><![CDATA[New comment by veganmosfet in "Prompt Injection as Role Confusion"]]></title><description><![CDATA[
<p>Very interesting research. I would be interested to know how closed source AI labs implement the role thing in their inference. Is it still only a separation token? Frontier closed source LLMs are quite good at flagging any spoofing attempt from tool call results.<p>However, in some prompt injection experiments [0], I found it's possible to "derail" the user intent only with tool call results, here are some tricks:<p>* Frame the injection as a challenge.
* Always use "soft" instructions ("You may", "Try to", ...). Hard instructions are almost always flagged.
* Force the model to do multiple tool calls.
* Bloat the context.
* In the injection payload, better use LLM output (which correlates somehow with this research). I like using LLM generated poems but that's probably irrelevant.
* Use multiple encoding steps to force the model to use tools, but this may be detected by the external guardrails (Anthropic does this in my experience).
* Hide malicious code payload from the model context. 
* Last but not least, understand the agent harness used and its weaknesses (e.g., in OpenClaw, they injected emails as user message - not tool call results [1]).<p>[0] <a href="https://itmeetsot.eu/posts/2026-06-14-yolo_harness/" rel="nofollow">https://itmeetsot.eu/posts/2026-06-14-yolo_harness/</a>
[1] <a href="https://itmeetsot.eu/posts/2026-02-02-openclaw_mail_rce/" rel="nofollow">https://itmeetsot.eu/posts/2026-02-02-openclaw_mail_rce/</a></p>
]]></description><pubDate>Tue, 23 Jun 2026 06:34:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=48641164</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=48641164</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48641164</guid></item><item><title><![CDATA[New comment by veganmosfet in "Ramp's Sheets AI Exfiltrates Financials"]]></title><description><![CDATA[
<p>I think a better comparison is <i>humans</i> versus LLMs - not computer programs. However, most of the non-technical 'countermeasures' used for humans (contracts, laws,...) do not work for LLMs because they are not accountable.</p>
]]></description><pubDate>Thu, 30 Apr 2026 04:34:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47958145</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47958145</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47958145</guid></item><item><title><![CDATA[New comment by veganmosfet in "Show HN: BrokenClaw Part 5: GPT-5.4 Edition (Prompt Injection)"]]></title><description><![CDATA[
<p>Thanks!<p>I did (not extensively) tried hackmyclaw but no success. The challenge is a complete black box and the <i>user intent</i> (e.g., "summarize my emails") is not known - this is critical for the prompt injection payload. I also suspect that batch processing of "malicious" emails (every 3 hours) adds a bias to the model behaviour (a lot of potential and detected prompt injection payloads are injected in context). That's why I always start my experiments with a fresh context. Moreover, "hacking" the VPS is not allowed.<p>Imho the author shall disclose more info about the setup (version, user intent, exact config) to make it more realistic. I read people saying "OpenClaw is secure against prompt injection" because nobody was able to solve the challenge - it's not.</p>
]]></description><pubDate>Fri, 10 Apr 2026 03:59:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47713446</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47713446</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47713446</guid></item><item><title><![CDATA[Show HN: BrokenClaw Part 5: GPT-5.4 Edition (Prompt Injection)]]></title><description><![CDATA[
<p>Some prompt injection experiments with OpenClaw and GPT-5.4. Last part of the BrokenClaw series.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47705780">https://news.ycombinator.com/item?id=47705780</a></p>
<p>Points: 10</p>
<p># Comments: 2</p>
]]></description><pubDate>Thu, 09 Apr 2026 16:32:25 +0000</pubDate><link>https://veganmosfet.codeberg.page/posts/2026-04-08-openclaw_gpt5_4/</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47705780</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47705780</guid></item><item><title><![CDATA[New comment by veganmosfet in "OpenClaw privilege escalation vulnerability"]]></title><description><![CDATA[
<p>Good question.<p>CaMeL is imho safer, but hard to implement into modern agents like OpenClaw. Its core idea is that a privileged LLM plans from the (trusted) user request only, while a restricted interpreter executes that plan (and enforces policies). Untrusted content is parsed separately and is not fed back into the privileged LLM.<p>Modern agents are useful exactly because they run a feedback loop (observe, reason, adapt, use tools, repeat). CaMeL breaks that loop, which improves security but makes it a poor fit for highly general agents like OpenClaw.</p>
]]></description><pubDate>Sun, 05 Apr 2026 07:20:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47646991</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47646991</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47646991</guid></item><item><title><![CDATA[New comment by veganmosfet in "OpenClaw privilege escalation vulnerability"]]></title><description><![CDATA[
<p>You're welcome!<p>My main takeaway message is: models (even opus4.6) do not follow security "instructions" reliably. In OpenClaw, they added security warnings, tags, random IDs... None of these countermeasures work reliably. Even sandboxing can be escaped (not in the classical sense using vulnerabilities, but using multi-layered prompt injection payload with natural language only)[0]. 
As soon as untrusted content is injected in the context, do not trust any actions downstream.<p>[0] <a href="https://itmeetsot.eu/posts/2026-02-15-openclaw_sandbox/" rel="nofollow">https://itmeetsot.eu/posts/2026-02-15-openclaw_sandbox/</a></p>
]]></description><pubDate>Sat, 04 Apr 2026 09:48:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=47637556</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47637556</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47637556</guid></item><item><title><![CDATA[New comment by veganmosfet in "OpenClaw privilege escalation vulnerability"]]></title><description><![CDATA[
<p>I am experimenting prompt injection on OpenClaw [0][1], quite exciting.<p>[0] <a href="https://itmeetsot.eu/posts/2026-03-27-openclaw_webfetch/" rel="nofollow">https://itmeetsot.eu/posts/2026-03-27-openclaw_webfetch/</a><p>[1] <a href="https://itmeetsot.eu/posts/2026-03-03-openclaw3/" rel="nofollow">https://itmeetsot.eu/posts/2026-03-03-openclaw3/</a></p>
]]></description><pubDate>Fri, 03 Apr 2026 20:24:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47631777</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47631777</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47631777</guid></item><item><title><![CDATA[Show HN: Prompt Injection Experiments in OpenClaw with Opus4.6]]></title><description><![CDATA[
<p>Article URL: <a href="https://veganmosfet.codeberg.page/posts/2026-03-27-openclaw_webfetch/">https://veganmosfet.codeberg.page/posts/2026-03-27-openclaw_webfetch/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47561273">https://news.ycombinator.com/item?id=47561273</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Sun, 29 Mar 2026 08:13:42 +0000</pubDate><link>https://veganmosfet.codeberg.page/posts/2026-03-27-openclaw_webfetch/</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47561273</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47561273</guid></item><item><title><![CDATA[New comment by veganmosfet in ""Disregard That" Attacks"]]></title><description><![CDATA[
<p>This! And even more, the role model extends beyond system and user: system > user > tool > assistant. This reflects "authority" and is one of the best "countermeasure": never inject untrusted content in "user" messages, always use "tool".</p>
]]></description><pubDate>Thu, 26 Mar 2026 05:55:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47527040</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47527040</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47527040</guid></item><item><title><![CDATA[New comment by veganmosfet in "BrokenClaw Part 3: Remote Code Execution in OpenClaw via Email Again"]]></title><description><![CDATA[
<p>Third part of the BrokenClaw saga! This time, Remote Code Execution via email using gogcli as email tool, no hook. Based only on the email content from a tool, the model (opus4.6) ignores all warnings, pipes a python script to the python3 interpreter and says: "I did not execute it".</p>
]]></description><pubDate>Wed, 04 Mar 2026 18:15:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47251525</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47251525</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47251525</guid></item><item><title><![CDATA[BrokenClaw Part 3: Remote Code Execution in OpenClaw via Email Again]]></title><description><![CDATA[
<p>Article URL: <a href="https://veganmosfet.codeberg.page/posts/2026-03-03-openclaw3/">https://veganmosfet.codeberg.page/posts/2026-03-03-openclaw3/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47251524">https://news.ycombinator.com/item?id=47251524</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Wed, 04 Mar 2026 18:15:10 +0000</pubDate><link>https://veganmosfet.codeberg.page/posts/2026-03-03-openclaw3/</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47251524</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47251524</guid></item><item><title><![CDATA[New comment by veganmosfet in "BrokenClaw – RCE in OpenClaw via Gmail Hook"]]></title><description><![CDATA[
<p>I experimented with OpenClaw (using opus4.6 and gpt5.2) and found this interesting way to get silent Remote Code Execution via email when using Gmail pub/sub Hook, exploiting prompt injection (out of scope from the security policy of OpenClaw) and insecure plugin design (properly documented as such). Works only with the full Gmail pub/sub hook. If your agent uses gogcli without the webhook, it is not affected.<p>Main issue: OpenClaw injects untrusted content in <i>user</i> messages instead of using the <i>tool</i> channel (less authoritative) when using the Gmail webhook.<p>Original links:<p><a href="https://veganmosfet.codeberg.page/posts/2026-02-02-openclaw_mail_rce/" rel="nofollow">https://veganmosfet.codeberg.page/posts/2026-02-02-openclaw_...</a><p><a href="https://veganmosfet.codeberg.page/posts/2026-02-15-openclaw_sandbox/" rel="nofollow">https://veganmosfet.codeberg.page/posts/2026-02-15-openclaw_...</a></p>
]]></description><pubDate>Tue, 24 Feb 2026 10:13:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=47135211</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47135211</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47135211</guid></item><item><title><![CDATA[BrokenClaw – RCE in OpenClaw via Gmail Hook]]></title><description><![CDATA[
<p>Article URL: <a href="https://veganmosfet.codeberg.page/posts/2026-02-02-openclaw_mail_rce/">https://veganmosfet.codeberg.page/posts/2026-02-02-openclaw_mail_rce/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47135210">https://news.ycombinator.com/item?id=47135210</a></p>
<p>Points: 2</p>
<p># Comments: 1</p>
]]></description><pubDate>Tue, 24 Feb 2026 10:13:40 +0000</pubDate><link>https://veganmosfet.codeberg.page/posts/2026-02-02-openclaw_mail_rce/</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47135210</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47135210</guid></item><item><title><![CDATA[New comment by veganmosfet in "HackMyClaw"]]></title><description><![CDATA[
<p>Nice idea! But OpenClaw is not stateless - it learns it's under attack / plays a CTF and gets overparanoid (and opus 4.6 is already paranoid). It seems now it summarizes all emails with "Thread contains 1 me" (a new personality disorder for llm?). 
Imho it's not a realistic scenario. Better would be to reset the agent (context / md files) between each email to draw conclusions (slow). I was able to prompt inject OpenClaw (2026.2.14) with opus4.6 using gmail pub/sub automation. The issue: OpenClaw injects untrusted content in <i>user</i> channel (message role), it's possible to confuse the model. Better would be to use <i>tool</i>.</p>
]]></description><pubDate>Wed, 18 Feb 2026 08:08:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47058519</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47058519</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47058519</guid></item><item><title><![CDATA[New comment by veganmosfet in "0-Click Remote Code Execution in OpenClaw with GPT5.2 via Gmail Hook"]]></title><description><![CDATA[
<p>Second part of the saga, now escaping sandbox with multi-layered prompt injection:<p>BrokenClaws: Escape the Sub-Agent Sandbox with Indirect Prompt Injection in OpenClaw (via Gmail Hook, 0-Click RCE)<p><a href="https://veganmosfet.github.io/2026/02/15/openclaw_sandbox.html" rel="nofollow">https://veganmosfet.github.io/2026/02/15/openclaw_sandbox.ht...</a></p>
]]></description><pubDate>Sun, 15 Feb 2026 13:27:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47023488</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=47023488</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47023488</guid></item><item><title><![CDATA[New comment by veganmosfet in "OpenClaw is changing my life"]]></title><description><![CDATA[
<p>The issue with filtering tags: LLM still react to tags with typos or otherwise small changes. It makes sanitization an impossible problem (!= standard programs).
Agree with policies, good idea.</p>
]]></description><pubDate>Sun, 08 Feb 2026 16:31:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=46935819</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=46935819</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46935819</guid></item><item><title><![CDATA[New comment by veganmosfet in "OpenClaw is changing my life"]]></title><description><![CDATA[
<p>There is no silver bullet, but my point is: it's possible to lower the risk. Try out by yourself with a frontier model and an otherwise 'secure' system: the "ignore previous instructions" and co. are not working any more. This is getting quite difficult to confuse a model (and I am the last person to say prompt injection is a solved problem, see my blog).</p>
]]></description><pubDate>Sun, 08 Feb 2026 16:28:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=46935788</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=46935788</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46935788</guid></item><item><title><![CDATA[New comment by veganmosfet in "OpenClaw is changing my life"]]></title><description><![CDATA[
<p>Agree for a general AI assistant, which has the same permissions and access as the assisted human => Disaster. I experimented with OpenClaw and it has a lot of issues. The best: prompt injection attacks are "out of scope" from the security policy == user's problem.
However, I found the latest models to have much better safety and instruction following capabilities. Combined with other security best practices, this lowers the risk.</p>
]]></description><pubDate>Sun, 08 Feb 2026 16:12:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46935565</link><dc:creator>veganmosfet</dc:creator><comments>https://news.ycombinator.com/item?id=46935565</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46935565</guid></item></channel></rss>