<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: dfeng</title><link>https://news.ycombinator.com/user?id=dfeng</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 07 Apr 2026 10:19:32 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=dfeng" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by dfeng in "Ingesting PDFs and why Gemini 2.0 changes everything"]]></title><description><![CDATA[
<p>Traditional OCR models are trained for a single task: recognize characters. They do this through visual features (and sometimes there's an implicit, or even explicit, "language" model: see <a href="https://arxiv.org/abs/1805.09441" rel="nofollow">https://arxiv.org/abs/1805.09441</a>). As such, their "hallucinations", or errors, are largely confined to ambiguous characters, e.g. 0 vs O (that's where the implicit language model comes in). Because they're trained with a singular purpose, you would expect their confidence scores (i.e. logprobs) to be well calibrated. Also, depending on the OCR model, you usually do text detection (get bounding boxes) followed by text recognition (read the characters), so the task is fairly local (you're only dealing with a small crop).<p>On the other hand, these VLMs are very generic models – yes, they're trained on OCR tasks, but also on dozens of other tasks. As such, they're really good OCR models, but they tend not to be as well calibrated. We use VLMs at work (Qwen2-VL to be specific), and we don't find that they hallucinate often, but we're not dealing with long documents. I would assume that a larger set of documents means a much larger context, which increases the chances of the model getting confused and hallucinating.</p>
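The calibration point above can be sketched in a few lines. This is a toy illustration, not any real OCR library's API: the per-character log-probabilities are made up, and in a well-calibrated recognizer the product of per-character probabilities would be a meaningful sequence confidence.

```python
import math

# Hypothetical per-character log-probabilities from a recognition head,
# e.g. for an ambiguous "O" vs "0" crop (values are invented for illustration).
char_logprobs = [
    {"O": math.log(0.55), "0": math.log(0.45)},  # ambiguous character
    {"K": math.log(0.99)},                        # unambiguous character
]

def best_char(dist):
    """Pick the most likely character and its log-probability."""
    c = max(dist, key=dist.get)
    return c, dist[c]

chars, logps = zip(*(best_char(d) for d in char_logprobs))
text = "".join(chars)
# Summing logprobs (multiplying probabilities) gives a sequence-level
# confidence that is interpretable if the model is well calibrated.
confidence = math.exp(sum(logps))
```

A generic VLM also emits logprobs, but because it is trained on many tasks, treating them as a calibrated confidence in this way is much shakier.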
]]></description><pubDate>Thu, 06 Feb 2025 11:23:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=42961315</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=42961315</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42961315</guid></item><item><title><![CDATA[New comment by dfeng in "Math from Three to Seven"]]></title><description><![CDATA[
<p>I would love to hear more about your experiences running a maths circle here in the UK. My two daughters are a little young (3.5 and 0.5), but this article has inspired me to get the ball rolling.</p>
]]></description><pubDate>Thu, 03 Oct 2024 08:26:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=41728491</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=41728491</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41728491</guid></item><item><title><![CDATA[New comment by dfeng in "What happened to BERT and T5?"]]></title><description><![CDATA[
<p>I think it's because most of the compute comes from decoding, since you're generating autoregressively, whereas the encoder runs only once over the input to produce the embedding. So really all it's saying is that the decoder, with N parameters, is the compute bottleneck; hence an encoder-decoder with N+N parameters has a compute cost of the same order as a decoder-only model with N.</p>
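A back-of-the-envelope version of that accounting, with toy units (one unit of compute per parameter per token; sequence lengths are arbitrary assumptions):

```python
N = 1            # parameters per stack, arbitrary units (assumption)
L_in, L_out = 512, 512  # input and output lengths (assumption)

# Encoder-decoder with N + N params: the encoder runs once over the
# input, the decoder runs once per generated token.
enc_dec_cost = N * L_in + N * L_out

# Decoder-only with N params: every token, input and output alike,
# passes through the single N-parameter stack.
dec_only_cost = N * (L_in + L_out)

# Same order of compute, despite the 2x parameter count.
assert enc_dec_cost == dec_only_cost
```

Each token only ever flows through one N-parameter stack in both setups, which is why the parameter count doubles but the compute does not.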
]]></description><pubDate>Mon, 22 Jul 2024 10:17:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=41032728</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=41032728</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41032728</guid></item><item><title><![CDATA[New comment by dfeng in "Do not confuse a random variable with its distribution"]]></title><description><![CDATA[
<p>Wow, I did not expect to see David's notation here on HN. The only problem with the notation is that it becomes so second nature that you forget it's not standard!</p>
]]></description><pubDate>Thu, 27 Jun 2024 10:49:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=40809125</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=40809125</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40809125</guid></item><item><title><![CDATA[New comment by dfeng in "Ask HN: Who is hiring? (December 2022)"]]></title><description><![CDATA[
<p>SS&C Blue Prism | Machine Learning Research Engineer | Remote (UK)<p>SS&C Blue Prism allows organizations to deliver transformational business value via our intelligent automation platform. We make products with one aim in mind: to improve experiences for people. By connecting people and digital workers, you can use the right resource, every time, for the best customer and business outcomes. We supply enterprise-wide software that not only provides full control and governance, but also allows businesses to react quickly to continuous change.<p>---<p>We are looking for talented and driven individuals who are passionate about developing new technology to join the AI Labs team as Machine Learning Research Engineers. We are developing a new machine-learning-based approach to Robotic Process Automation (RPA) for GUIs, built completely in house and driven by the R&D team.<p>Apply here: <a href="https://wrkbl.ink/dXtD9ud" rel="nofollow">https://wrkbl.ink/dXtD9ud</a></p>
]]></description><pubDate>Thu, 01 Dec 2022 17:28:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=33819462</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=33819462</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33819462</guid></item><item><title><![CDATA[New comment by dfeng in "Random walk in 2 lines of J"]]></title><description><![CDATA[
<p>You're going from {0,1} to {-1,1} using indices. Seems easier to just transform via (x*2-1).</p>
]]></description><pubDate>Sun, 16 Oct 2022 20:19:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=33226845</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=33226845</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33226845</guid></item><item><title><![CDATA[New comment by dfeng in "Ask HN: Have you lost your passion for Mozilla Firefox?"]]></title><description><![CDATA[
<p>The only thing keeping me from moving completely from Firefox to Chrome is Pentadactyl (<a href="http://dactyl.sourceforge.net/pentadactyl/" rel="nofollow">http://dactyl.sourceforge.net/pentadactyl/</a>), plus better Greasemonkey scriptability. Firefox on Mac has always had memory-hogging issues, but I really can't live without my vim overlay (I've tried the Chrome alternatives, like vimium, but they pale in comparison).<p>As people have said about Firefox's hackability, I don't think there will ever be a Chrome extension as good as Pentadactyl.</p>
]]></description><pubDate>Thu, 26 May 2011 00:53:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=2586198</link><dc:creator>dfeng</dc:creator><comments>https://news.ycombinator.com/item?id=2586198</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=2586198</guid></item></channel></rss>