<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: whakim</title><link>https://news.ycombinator.com/user?id=whakim</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 27 Jun 2026 12:54:14 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=whakim" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by whakim in "What is it like to be a bat? (1974) [pdf]"]]></title><description><![CDATA[
<p>> But as the environment of the organism gets more complex, reflexive avoidance behavior isn't sufficient for competence. For an agent in a complex environment, competent damage avoidance requires engaging with negative valence as a cognitive entity to be planned around and weighed against other interests. This requires unification and consciousness.<p>But <i>why</i> does engaging with negative valence, planning, and weighing actions against other interests require subjective experience? That sounds simply like a mathematical function (perhaps using our own past experiences as inputs). Reinforcement Learning is a great counterexample here: AI systems weigh negative valence and execute long-term plans without any qualia.<p>If thermoregulation is too "reflexive" for you, consider that there are many examples in which humans are able to perform very complex tasks in the absence of qualia. Consider, for instance, the phenomena of highway hypnosis, blindsight or sleepwalking - humans can do incredibly complicated things without qualia.<p>> This isn't an example of coherent behavior in the sense being used here. The issue is one of voluntary behavior being coherently executed as to achieve some goal without undermining itself.<p>This argument is circular. The original claim is that behaving coherently in a complex environment requires consciousness. By shifting the goalposts to say that only <i>voluntary</i> behaviors qualify, you are begging the question. The entire notion of "voluntary" implies conscious intent, so your argument has become "consciously willed behaviors require consciousness".</p>
]]></description><pubDate>Thu, 11 Jun 2026 15:27:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=48491679</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=48491679</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48491679</guid></item><item><title><![CDATA[New comment by whakim in "What is it like to be a bat? (1974) [pdf]"]]></title><description><![CDATA[
<p>> For example, phenomenal pain brings with it competence at protecting bodily integrity. The memory of pain becomes part of the explanatory narrative for the monitoring function that tracks progress towards goals ensuring coherent behavior (imagine being fearful of a stove but not knowing why).<p>But this isn't true! It has been repeatedly shown that patients without inner brain function react to stimuli (such as being pinched or pricked with a needle) by recoiling from the pain, as do babies with no experience of pain. So qualia and consciousness seem like they have nothing to do with ensuring coherent behavior. To put this another way, your experiences and interactions with the world could be sufficient to associate the stove with danger, but how does that explain why the experience of touching the stove has qualia, as opposed to simply the pain-reaction of a patient without inner brain function or a baby?<p>Another counterargument is that our brains carry out lots of "coherent" functions "in the dark". Consider, for example, thermoregulation; most of the time, there is no conscious experience associated with it, yet it is happening constantly and coherently.<p>Let's simplify it further: to use a famous example, do you believe that a thermostat is conscious? After all, a thermostat is able to coherently regulate its temperature over time in response to changes in its environment.</p>
]]></description><pubDate>Thu, 11 Jun 2026 13:08:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=48489860</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=48489860</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48489860</guid></item><item><title><![CDATA[New comment by whakim in "What is it like to be a bat? (1974) [pdf]"]]></title><description><![CDATA[
<p>Sure. We can't <i>prove</i> that other organisms experience qualia; we can only look at the effects of qualia (e.g. behaviors that are likely to be the product of emotions) and assume that an organism is therefore conscious. The real point, though, is that suggesting language gives rise to consciousness lacks any explanatory power as to <i>why</i> language should be accompanied by consciousness.</p>
]]></description><pubDate>Thu, 11 Jun 2026 12:55:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=48489703</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=48489703</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48489703</guid></item><item><title><![CDATA[New comment by whakim in "What is it like to be a bat? (1974) [pdf]"]]></title><description><![CDATA[
<p>> The recognition of oneself as situated in the world is crucial to coherent engagement with the world. It is how an entity can ensure its body parts are moving towards the same goal. It's how behavior over time doesn't undermine its purpose. Fragmented, incoherent behavior does not serve self-preservation.<p>Why would movement towards a goal be incoherent if it happened "in the dark"? Our brains perform many critical functions "in the dark" (and do so coherently) which do not rise to the level of consciousness.</p>
]]></description><pubDate>Thu, 11 Jun 2026 05:12:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=48486465</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=48486465</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48486465</guid></item><item><title><![CDATA[New comment by whakim in "What is it like to be a bat? (1974) [pdf]"]]></title><description><![CDATA[
<p>Isn't Maturana's theory that consciousness has to do with language, and the use of language to make distinctions about ourselves and others? To me, this seems clearly insufficient to explain consciousness - qualia totally precede language; one could experience qualia without language, etc.</p>
]]></description><pubDate>Thu, 11 Jun 2026 05:03:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=48486426</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=48486426</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48486426</guid></item><item><title><![CDATA[New comment by whakim in "The just-say-no engineer was a ZIRP phenomenon"]]></title><description><![CDATA[
<p>What a wild take. The straightforward takeaway from the end of ZIRP and the resulting increase in focus would be that you need to say no to <i>more</i> things, not fewer? You have to really contort yourself to argue that actually ZIRP gave rise to an entire class of make-work which then gave rise to a class of folks to keep said make-work under control.</p>
]]></description><pubDate>Wed, 27 May 2026 04:50:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=48289785</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=48289785</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48289785</guid></item><item><title><![CDATA[Turning RAG pipelines into enterprise-grade Data Subscriptions]]></title><description><![CDATA[
<p>Article URL: <a href="https://halcyon.io/blog/machine-readable/building-the-stack">https://halcyon.io/blog/machine-readable/building-the-stack</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47794543">https://news.ycombinator.com/item?id=47794543</a></p>
<p>Points: 6</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 16 Apr 2026 15:21:20 +0000</pubDate><link>https://halcyon.io/blog/machine-readable/building-the-stack</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=47794543</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47794543</guid></item><item><title><![CDATA[New comment by whakim in "From zero to a RAG system: successes and failures"]]></title><description><![CDATA[
<p>I don't think we should undersell that transformers and semantic search are really powerful information retrieval tools, and they are extremely potent for solving search problems. That being said, I think I agree with you that RAG is fundamentally just search, and the hype (like any hype) elides the fact that you still have to solve all of the normal, difficult search problems.</p>
]]></description><pubDate>Thu, 26 Mar 2026 17:04:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47532941</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=47532941</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47532941</guid></item><item><title><![CDATA[New comment by whakim in "From zero to a RAG system: successes and failures"]]></title><description><![CDATA[
<p>I'd argue the author missed a trick here by using a fancy embedding model without any re-ranking. One of the benefits of a re-ranker (or even a series of re-rankers!) is that you can embed your documents using a really small and cheap model (this also often means smaller embeddings).</p>
]]></description><pubDate>Thu, 26 Mar 2026 14:41:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47531037</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=47531037</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47531037</guid></item><item><title><![CDATA[New comment by whakim in "From zero to a RAG system: successes and failures"]]></title><description><![CDATA[
<p>For technical domains, stuffing the context full of related-and-irrelevant or possibly-conflicting information will lead to poor results. The examples of long-context retrieval like finding a fact in a book really aren't representative of the types of context you'd be working with in a RAG scenario. In a lot of cases the problem is information organization, not retrieval, e.g. "What is the most authoritative type of source for this information?" or "How do these 100 documents about X relate to each other?"</p>
]]></description><pubDate>Thu, 26 Mar 2026 14:33:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47530956</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=47530956</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47530956</guid></item><item><title><![CDATA[New comment by whakim in "Beyond has dropped “meat” from its name and expanded its high-protein drink line"]]></title><description><![CDATA[
<p>There is no reason to believe that the foods humans have historically eaten are safer/healthier than "industrially processed/extracted/refined" food simply because we have historically eaten them. Evolution does not select for avoiding the health problems facing modern-day humans such as cancer or heart disease.</p>
]]></description><pubDate>Tue, 17 Mar 2026 05:51:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47409109</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=47409109</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47409109</guid></item><item><title><![CDATA[New comment by whakim in "Tenure Is a Total Scam (2023)"]]></title><description><![CDATA[
<p>Yes; you can phone it in post-tenure. But just because it is possible doesn't mean (in my experience) it is common; and I don't think it's helpful (as TFA claims) to equate this possibility with "a total scam." To get tenure <i>anywhere</i> doesn't just require a huge amount of work as an Assistant Professor; it also requires a huge amount of work as a PhD student and potentially multiple rounds of post-doc'ing or other non-tenure-line work. In my experience, tenured professors have spent nearly two decades distorting their work-life balance beyond all recognition to the point that grinding insanely hard in pursuit of publications just feels normal.</p>
]]></description><pubDate>Mon, 09 Feb 2026 01:12:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=46940426</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=46940426</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46940426</guid></item><item><title><![CDATA[New comment by whakim in "Postgres extension complements pgvector for performance and scale"]]></title><description><![CDATA[
<p>Worth noting that the filtering implementation is quite restrictive if you want to avoid post-filtering: filters must be expressible as discrete smallints (ruling out continuous variables like timestamps or high cardinality filters like ids); filters must always be denormalized onto the table you're indexing (no filtering on attributes of parent documents, for example); and filters must be declared at index creation time (lots of time spent on expensive index builds if you want to add filters). Personally I would consider these caveats pretty big deal-breakers if the intent is scale and you do a lot of filtering.</p>
]]></description><pubDate>Wed, 31 Dec 2025 00:32:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=46439970</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=46439970</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46439970</guid></item><item><title><![CDATA[New comment by whakim in "How uv got so fast"]]></title><description><![CDATA[
<p>> Most of the time you don't need a different Python version from the system one.<p>Except for literally anytime you’re collaborating with anyone, ever? I can’t even begin to imagine working on a project where folks just use whatever python version their OS happens to ship with. Do you also just ship the latest version of whatever container because most of the time nothing has changed?</p>
]]></description><pubDate>Sat, 27 Dec 2025 16:29:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46402902</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=46402902</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46402902</guid></item><item><title><![CDATA[New comment by whakim in "Structured outputs create false confidence"]]></title><description><![CDATA[
<p>I don't really understand the point around error handling. Sure, with structured outputs you need to be explicit about what errors you're handling and how you're handling them. But if you ask the model to return pure text, you now have a universe of possible errors that you <i>still</i> need to handle explicitly (you're using structured outputs, so your LLM response is presumably being consumed programmatically?), including a whole bunch of new errors that structured outputs help you avoid.<p>Also, meta gripe: this article felt like a total bait-and-switch in that it only became clear that it was promoting a product right at the end.</p>
]]></description><pubDate>Sun, 21 Dec 2025 21:25:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46348657</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=46348657</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46348657</guid></item><item><title><![CDATA[New comment by whakim in "So you wanna build a local RAG?"]]></title><description><![CDATA[
<p>In my experience the semantic/lexical search problem is better understood as a precision/recall tradeoff. Lexical search (along with boolean operators, exact phrase matching, etc.) has very high precision at the expense of lower recall, whereas semantic search sits at a higher recall/lower precision point on the curve.</p>
]]></description><pubDate>Sat, 29 Nov 2025 02:01:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=46084640</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=46084640</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46084640</guid></item><item><title><![CDATA[New comment by whakim in "Scaling HNSWs"]]></title><description><![CDATA[
<p>Doesn't this depend on your data to a large extent? In a very dense graph "far" results (in terms of the effort spent searching) that match the filters might actually be quite similar?</p>
]]></description><pubDate>Wed, 12 Nov 2025 05:52:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=45896774</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=45896774</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45896774</guid></item><item><title><![CDATA[New comment by whakim in "The Case Against PGVector"]]></title><description><![CDATA[
<p>Thanks for the reply! This makes much more sense now. To preface, I think pgvector is incredibly awesome software, and I have to give huge kudos to the folks working on it. Super cool. That being said, I do think the author isn't being unreasonable in that the limitations of pgvector are very real when you're talking indices that grow beyond millions of things, and the "just use pgvector" crowd <i>in general</i> doesn't have a lot of experience with scaling things beyond toy examples. Folks should take a hard look at what size they expect their indices to grow to in the near-to-medium-term future.</p>
]]></description><pubDate>Mon, 03 Nov 2025 22:58:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=45805496</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=45805496</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45805496</guid></item><item><title><![CDATA[New comment by whakim in "The Case Against PGVector"]]></title><description><![CDATA[
<p>> maintenance_work_mem begs to differ.<p>HNSW indices are <i>big</i>. Let's suppose I have an HNSW index which fits in a few hundred gigabytes of memory, or perhaps a few terabytes. How do I reasonably rebuild this using maintenance_work_mem? Double the size of my database for a week? What about the knock-on impacts on the performance for the rest of my database-stuff - presumably I'm relying on this memory for shared_buffers and caching? This seems like the type of workload that is being discussed here, not a toy 20GB index or something.<p>> You use REINDEX CONCURRENTLY.<p>Even with a bunch of worker processes, how do I do this within a reasonable timeframe?<p>> How do you think a B+tree gets updated?<p>Sure, the computational complexity of insertion into an HNSW index is sublinear, but the constant factors are significant and do actually add up. That being said, I do find this the weakest of the author's arguments.</p>
]]></description><pubDate>Mon, 03 Nov 2025 22:42:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=45805356</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=45805356</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45805356</guid></item><item><title><![CDATA[New comment by whakim in "The Case Against PGVector"]]></title><description><![CDATA[
<p>Interested to hear more about your experience here. At Halcyon, we have trillions of embeddings and found Postgres to be unsuitable at several orders of magnitude less than we currently have.<p>On the iterative scan side, how do you prevent this from becoming too computationally intensive with a restrictive pre-filter, or simply not working at all? We use Vespa, which means effectively doing a map-reduce across all of our nodes; the effective number of graph traversals to do is smaller, and the computational burden mostly involves scanning posting lists on a per-node basis. I imagine to do something similar in postgres, you'd need sharded tables, and complicated application logic to control what you're actually searching.<p>How do you deal with re-indexing and/or denormalizing metadata for filtering? Do you simply accept that it'll take hours or days?<p>I agree with you, however, that vector databases are not a panacea (although they do remove a huge amount of devops work, which is worth a lot!). Vespa supports filtering across parent-child relationships (like a relational database), which means we don't have to reindex a trillion things every time we want to add a new type of filter; with a previous vector database vendor we used, that took us almost a week.</p>
]]></description><pubDate>Mon, 03 Nov 2025 22:14:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=45805087</link><dc:creator>whakim</dc:creator><comments>https://news.ycombinator.com/item?id=45805087</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45805087</guid></item></channel></rss>