<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: dfeng</title><link>https://news.ycombinator.com/user?id=dfeng</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 07 Apr 2026 10:19:32 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=dfeng" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by dfeng in "Ingesting PDFs and why Gemini 2.0 changes everything"]]></title><description><![CDATA[
<p>Traditional OCR models are trained for a single task: recognize characters. They do this through visual features (and sometimes there's an implicit, or even explicit, "language" model: see <a href="https://arxiv.org/abs/1805.09441" rel="nofollow">https://arxiv.org/abs/1805.09441</a>). As such, their "hallucinations", or errors, are largely confined to ambiguous characters, e.g. 0 vs O (that's where the implicit language model comes in). Because they're trained with a singular purpose, you would expect their confidence scores (i.e. logprobs) to be well calibrated. Also, depending on the OCR model, you usually do text detection (get bounding boxes) followed by text recognition (read the characters), so the task is fairly local (you're only dealing with a small crop).<p>On the other hand, these VLMs are very generic models – yes, they're trained on OCR tasks, but also on dozens of other tasks. As such, they're really good OCR models, but they tend not to be as well calibrated. We use VLMs at work (Qwen2-VL to be specific), and we don't find that they hallucinate often, but we're not dealing with long documents. I would assume that a larger set of documents means a much larger context, which increases the chances of the model getting confused and hallucinating.</p>
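The calibration point above can be sketched in a few lines. This is a toy illustration, not any real OCR library's API: the per-character log-probabilities are made up, and in a well-calibrated recognizer the product of per-character probabilities would be a meaningful sequence confidence.

```python
import math

# Hypothetical per-character log-probabilities from a recognition head,
# e.g. for an ambiguous "O" vs "0" crop (values are invented for illustration).
char_logprobs = [
    {"O": math.log(0.55), "0": math.log(0.45)},  # ambiguous character
    {"K": math.log(0.99)},                        # unambiguous character
]

def best_char(dist):
    """Pick the most likely character and its log-probability."""
    c = max(dist, key=dist.get)
    return c, dist[c]

chars, logps = zip(*(best_char(d) for d in char_logprobs))
text = "".join(chars)
# Summing logprobs (multiplying probabilities) gives a sequence-level
# confidence that is interpretable if the model is well calibrated.
confidence = math.exp(sum(logps))
```

A generic VLM also emits logprobs, but because it is trained on many tasks, treating them as a calibrated confidence in this way is much shakier.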
]]></description><pubDate>Thu, 06 Feb 2025 11:23:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=42961315</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=42961315</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42961315</guid></item><item><title><![CDATA[New comment by dfeng in "Math from Three to Seven"]]></title><description><![CDATA[
<p>I would love to hear more about your experiences running a maths circle here in the UK. My two daughters are a little young (3.5 and 0.5), but this article has inspired me to get the ball rolling.</p>
]]></description><pubDate>Thu, 03 Oct 2024 08:26:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=41728491</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=41728491</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41728491</guid></item><item><title><![CDATA[New comment by dfeng in "What happened to BERT and T5?"]]></title><description><![CDATA[
<p>I think it's because most of the compute comes from decoding, since you're generating autoregressively, whereas the encoder runs only once over the input to produce the embedding. So really all it's saying is that the decoder, with N parameters, is the compute bottleneck; hence an encoder-decoder with N+N parameters has a compute cost of the same order as a decoder-only model with N.</p>
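A back-of-the-envelope version of that accounting, with toy units (one unit of compute per parameter per token; sequence lengths are arbitrary assumptions):

```python
N = 1            # parameters per stack, arbitrary units (assumption)
L_in, L_out = 512, 512  # input and output lengths (assumption)

# Encoder-decoder with N + N params: the encoder runs once over the
# input, the decoder runs once per generated token.
enc_dec_cost = N * L_in + N * L_out

# Decoder-only with N params: every token, input and output alike,
# passes through the single N-parameter stack.
dec_only_cost = N * (L_in + L_out)

# Same order of compute, despite the 2x parameter count.
assert enc_dec_cost == dec_only_cost
```

Each token only ever flows through one N-parameter stack in both setups, which is why the parameter count doubles but the compute does not.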
]]></description><pubDate>Mon, 22 Jul 2024 10:17:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=41032728</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=41032728</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41032728</guid></item><item><title><![CDATA[New comment by dfeng in "Do not confuse a random variable with its distribution"]]></title><description><![CDATA[
<p>Wow, I did not expect to see David's notation here on HN. The only problem with the notation is that it becomes so second nature that you forget it's not standard!</p>
]]></description><pubDate>Thu, 27 Jun 2024 10:49:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=40809125</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=40809125</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40809125</guid></item><item><title><![CDATA[New comment by dfeng in "Ask HN: Who is hiring? (December 2022)"]]></title><description><![CDATA[
<p>SS&C Blue Prism | Machine Learning Research Engineer | Remote (UK)<p>SS&C Blue Prism allows organizations to deliver transformational business value via our intelligent automation platform. We make products with one aim in mind: to improve experiences for people. By connecting people and digital workers, you can use the right resource, every time, for the best customer and business outcomes. We supply enterprise-wide software that not only provides full control and governance, but also allows businesses to react quickly to continuous change.<p>---<p>We are looking for talented and driven individuals who are passionate about developing new technology to join the AI Labs team as Machine Learning Research Engineers. We are developing a new machine-learning-based approach to Robotic Process Automation (RPA) for GUIs, built completely in house and driven by the R&D team.<p>Apply here: <a href="https://wrkbl.ink/dXtD9ud" rel="nofollow">https://wrkbl.ink/dXtD9ud</a></p>
]]></description><pubDate>Thu, 01 Dec 2022 17:28:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=33819462</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=33819462</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33819462</guid></item><item><title><![CDATA[New comment by dfeng in "Random walk in 2 lines of J"]]></title><description><![CDATA[
<p>You're going from {0,1} to {-1,1} using indices. Seems easier to just transform via (x*2-1).</p>
]]></description><pubDate>Sun, 16 Oct 2022 20:19:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=33226845</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=33226845</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33226845</guid></item><item><title><![CDATA[New comment by dfeng in "Ask HN: Have you lost your passion for Mozilla Firefox?"]]></title><description><![CDATA[
<p>The only thing keeping me from moving completely from Firefox to Chrome is Pentadactyl (<a href="http://dactyl.sourceforge.net/pentadactyl/" rel="nofollow">http://dactyl.sourceforge.net/pentadactyl/</a>), plus better Greasemonkey scriptability. Firefox on Mac has always had memory-hogging issues, but I really can't live without my vim overlay (I've tried the Chrome alternatives, like vimium, but they pale in comparison).<p>As people have said about Firefox's hackability, I don't think there will ever be a Chrome extension as good as Pentadactyl.</p>
]]></description><pubDate>Thu, 26 May 2011 00:53:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=2586198</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=2586198</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=2586198</guid></item></channel></rss>