<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jwyang</title><link>https://news.ycombinator.com/user?id=jwyang</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 01 May 2026 06:14:39 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jwyang" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by jwyang in "Magma: A foundation model for multimodal AI agents"]]></title><description><![CDATA[
<p>Very good question! Right now we are mainly focused on building the foundation for multimodal perception and atomic action-taking. Of course, integrating trace-of-mark prediction on robotics and human video data enhances the model's medium-length reasoning, but that alone is certainly not sufficient. The current Magma model will serve as the basis for our next step, i.e., longer-horizon reasoning and planning, which is exactly what we are targeting for the next version of Magma!</p>
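<p>To make the trace-of-mark idea a bit more concrete, here is a minimal, illustrative Python sketch (the names and the serialization format below are assumptions for illustration, not code from the Magma repo): marks overlaid on the current frame are paired with their future positions, and each trajectory is serialized into a text target the model learns to predict.</p><pre><code>from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Mark:
    mark_id: int                 # identifier of the overlaid mark
    xy: Tuple[int, int]          # pixel position in the current frame

def trace_to_tokens(mark: Mark, future_xy: List[Tuple[int, int]]) -> str:
    """Serialize a mark's future trajectory as a text target.
    Hypothetical format, for illustration only."""
    steps = " ".join(f"({x},{y})" for x, y in future_xy)
    return f"mark {mark.mark_id}: {steps}"

# Example: mark 3 drifts right and slightly down over three frames.
print(trace_to_tokens(Mark(3, (120, 200)), [(128, 202), (136, 205), (145, 209)]))
# -&gt; mark 3: (128,202) (136,205) (145,209)
</code></pre>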
]]></description><pubDate>Thu, 20 Feb 2025 08:04:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=43112220</link><dc:creator>jwyang</dc:creator><comments>https://news.ycombinator.com/item?id=43112220</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43112220</guid></item><item><title><![CDATA[New comment by jwyang in "Magma: A foundation model for multimodal AI agents"]]></title><description><![CDATA[
<p>Oops, a typo: no M comes from Microsoft.</p>
]]></description><pubDate>Thu, 20 Feb 2025 07:34:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=43112073</link><dc:creator>jwyang</dc:creator><comments>https://news.ycombinator.com/item?id=43112073</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43112073</guid></item><item><title><![CDATA[New comment by jwyang in "Magma: A foundation model for multimodal AI agents"]]></title><description><![CDATA[
<p>Thanks for your great interest in our Magma work, everyone!<p>We will gradually roll out the inference/training/evaluation/data preprocessing code in our codebase: <a href="https://github.com/microsoft/Magma">https://github.com/microsoft/Magma</a>, and this will be finished by next Tuesday. Stay tuned!</p>
]]></description><pubDate>Thu, 20 Feb 2025 06:29:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=43111748</link><dc:creator>jwyang</dc:creator><comments>https://news.ycombinator.com/item?id=43111748</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43111748</guid></item><item><title><![CDATA[New comment by jwyang in "Magma: A foundation model for multimodal AI agents"]]></title><description><![CDATA[
<p>Good catch! A minor correction: Magma = M(ultimodal) Ag(entic) M(odel) at M(icrosoft) (Rese)A(rch); the last part is similar to how the name Llama came about. :)</p>
]]></description><pubDate>Thu, 20 Feb 2025 06:26:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=43111728</link><dc:creator>jwyang</dc:creator><comments>https://news.ycombinator.com/item?id=43111728</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43111728</guid></item><item><title><![CDATA[New comment by jwyang in "Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4Vision"]]></title><description><![CDATA[
<p>Here are other related links:<p>code: <a href="https://github.com/microsoft/SoM">https://github.com/microsoft/SoM</a><p>demo: <a href="https://user-images.githubusercontent.com/3894247/281586831-8f827871-7ebd-4a5e-bef5-861516c4427b.mp4" rel="nofollow noreferrer">https://user-images.githubusercontent.com/3894247/281586831-...</a></p>
]]></description><pubDate>Thu, 09 Nov 2023 19:13:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=38209521</link><dc:creator>jwyang</dc:creator><comments>https://news.ycombinator.com/item?id=38209521</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38209521</guid></item></channel></rss>