<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: beckhamc</title><link>https://news.ycombinator.com/user?id=beckhamc</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 25 Apr 2026 21:49:31 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=beckhamc" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by beckhamc in "Questionable Practices in Machine Learning"]]></title><description><![CDATA[
<p>I'm not sure if this is directly mentioned in the paper, but I didn't see anything specifically about the conflation of the validation set <i>and</i> the test set. When people actually make a distinction between the two (which seemingly isn't all that common nowadays), you're meant to perform model selection on the validation set, i.e. find the hyperparameters that minimise `loss(model, valid_set)`. Once you've found your most performant model by that criterion, you evaluate it on the test set once, and that's your unbiased estimate of generalisation error. Since the ML community (and reviewers) are obsessed with "SOTA", "novelty", and bold numbers, a table of results composed purely of test set numbers is not easily controllable (when you're trying to be ethical) from the point of view of actually "passing" peer review. Conversely, a table full of validation set numbers is easily controllable: just perform extremely aggressive model selection until your model posts higher numbers than everything else. Even simpler: why not just ditch the distinction between the validation and test sets to begin with? (I'm joking, btw.) Now you see the problem.</p>
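The protocol above can be sketched end to end. This is a toy, illustrative example (the one-parameter ridge "model", the synthetic data, and the hyperparameter grid are all made up for illustration):

```python
import random

random.seed(0)

# Toy data: (x, y) pairs from a noisy linear relationship y ~ 2x.
data = [(float(x), 2.0 * x + random.gauss(0, 0.5)) for x in range(30)]
random.shuffle(data)
train, valid, test = data[:20], data[20:25], data[25:]

def fit_slope(pairs, ridge):
    # One-parameter "model": least-squares slope with an L2 penalty
    # controlled by the hyperparameter `ridge`.
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs) + ridge
    return num / den

def mse(slope, pairs):
    return sum((y - slope * x) ** 2 for x, y in pairs) / len(pairs)

# Model selection: pick the hyperparameter that minimises validation loss...
candidates = [0.0, 0.1, 1.0, 10.0]
best_ridge = min(candidates, key=lambda r: mse(fit_slope(train, r), valid))

# ...then touch the test set exactly once for the reported number.
test_loss = mse(fit_slope(train, best_ridge), test)
```

The point is the asymmetry: `valid` may be consulted as many times as you like during selection, while `test` is consulted exactly once, after `best_ridge` is frozen.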
]]></description><pubDate>Sun, 06 Oct 2024 21:16:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=41760383</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=41760383</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41760383</guid></item><item><title><![CDATA[Questionable Practices in Machine Learning]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2407.12220">https://arxiv.org/abs/2407.12220</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41760069">https://news.ycombinator.com/item?id=41760069</a></p>
<p>Points: 6</p>
<p># Comments: 1</p>
]]></description><pubDate>Sun, 06 Oct 2024 20:33:59 +0000</pubDate><link>https://arxiv.org/abs/2407.12220</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=41760069</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41760069</guid></item><item><title><![CDATA[New comment by beckhamc in "Were RNNs all we needed?"]]></title><description><![CDATA[
<p>Your description of tanh isn't even correct: it squashes a real number into `(-1, 1)`, not to "less than one".<p>You're curious about whether there is any gain in parameterising activation functions and learning them, or rather, why that isn't done much in practice. That's an interesting academic question, and it seems like you're already experimenting with activation functions of your own. However, people in this thread (including myself) wanted to clarify some perceived misunderstandings you had about nonlinearities and "why" they are used in DNNs, and about how "squashing function" is a misnomer, since `g(x) = x/1000` squashes its input yet doesn't introduce any nonlinearity. Yet you continue to fixate and double down on your knowledge of "what" a tanh is, and even that is incorrect.</p>
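For the record, the range claim is easy to check numerically (plain Python, nothing framework-specific):

```python
import math

# tanh maps any real input into the open interval (-1, 1): large positive
# inputs approach 1 and large negative inputs approach -1, so "squashes to
# less than one" misses the entire negative half of the range.
samples = [-10.0, -3.0, -0.5, 0.0, 0.5, 3.0, 10.0]
outputs = [math.tanh(x) for x in samples]

assert all(-1.0 < t < 1.0 for t in outputs)
assert math.tanh(-3.0) < -0.99   # well below zero, not merely "less than one"
assert outputs == sorted(outputs)  # tanh is also strictly monotonic
```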
]]></description><pubDate>Fri, 04 Oct 2024 17:58:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=41743891</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=41743891</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41743891</guid></item><item><title><![CDATA[New comment by beckhamc in "Were RNNs all we needed?"]]></title><description><![CDATA[
<p>Learnable parameters on activations <i>do</i> exist, look up parametric activation functions.</p>
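A minimal sketch of one such parametric activation, PReLU (a Leaky ReLU whose negative-side slope `a` is learned). The toy gradient step below is purely illustrative, not any particular library's API:

```python
# PReLU: like Leaky ReLU, but the negative-side slope `a` is a learnable
# parameter updated by gradient descent alongside the network weights.
def prelu(x, a):
    return x if x > 0 else a * x

def dprelu_da(x):
    # Gradient of prelu(x, a) with respect to the slope parameter `a`:
    # zero for positive inputs, x itself for negative inputs.
    return 0.0 if x > 0 else x

# Toy objective: make prelu(-2.0, a) match a target of -0.5, i.e. the
# optimal slope is a = 0.25. Minimise err^2 / 2 by gradient descent on `a`.
a, lr, target = 0.01, 0.1, -0.5
for _ in range(100):
    err = prelu(-2.0, a) - target      # current error
    a -= lr * err * dprelu_da(-2.0)    # chain rule: d(err^2 / 2)/da

assert abs(a - 0.25) < 1e-6
```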
]]></description><pubDate>Fri, 04 Oct 2024 03:38:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=41737569</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=41737569</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41737569</guid></item><item><title><![CDATA[New comment by beckhamc in "Were RNNs all we needed?"]]></title><description><![CDATA[
<p>How was that person derailing the convo? Nothing says an activation function has to "squash" a number into some range. Leaky ReLUs, for instance, do `f(x) = x if x > 0 else ax` (for some coefficient `a != 0`), which doesn't squash `x` into any bounded range (unless you want to be pedantic about your precise definition of what it means to squash a number): the function takes a real in `(-inf, inf)` and produces a number in `(-inf, inf)`.<p>> Sure there's a squashing function on the output to keep it in a range from 0 to 1 but that's done BECAUSE we're just adding up stuff.<p>It's not because you're "adding up stuff"; there is a specific mathematical or statistical reason it is used. For the hidden layers of a neural network, it's there to stop your multi-layer network collapsing into a single-layer one (i.e. a linear algebra reason). You can choose whatever function you want; for hidden layers tanh generally isn't used anymore, it's usually some variant of a ReLU. In fact, Leaky ReLUs are very commonly used, so OP isn't changing the subject.<p>If you define a "perceptron" (`g(Wx+b)`, where `W` is a `1xP` matrix) and train it as a logistic regression model, then you want `g` to be the sigmoid. Its purpose is to ensure that the output can be interpreted as a probability (given that you use the correct statistical loss), which means squashing the number. The converse isn't true: if I take random numbers from the internet and squash them into `[0,1]`, I don't get to call them probabilities.<p>> and not only is it's PRIMARY function to squash a number, that's it's ONLY function.<p>Squashing the number isn't the reason, it's the side effect. And even then, as I just said, not all activation functions squash numbers.<p>> All the training does is adjust linear weights tho, like I said.<p>Not sure what your point is. What is a "linear weight"? We call layers of the form `g(Wx+b)` "linear" layers, but that's an abused term: if `g()` is non-linear then the output is not linear. Who cares if the inner term `Wx + b` is linear? With enough of these layers you can approximate fairly complicated functions. If you're arguing about whether there is a better fundamental building block, then that is another discussion.</p>
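The collapse argument is easy to demonstrate concretely. A minimal sketch with hand-rolled 2x2 matrices (the numbers are arbitrary):

```python
# Without a nonlinearity, two stacked "linear layers" collapse into one:
# W2 @ (W1 @ x) == (W2 @ W1) @ x for every x. The nonlinearity in between
# is exactly what prevents a deep network from being a single linear map.
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

W1 = [[1.0, 2.0], [3.0, 4.0]]
W2 = [[0.5, -1.0], [2.0, 0.0]]
x = [1.0, -2.0]

stacked = matvec(W2, matvec(W1, x))       # two layers, no activation
collapsed = matvec(matmul(W2, W1), x)     # the single equivalent layer
assert stacked == collapsed

# Insert a Leaky ReLU between the layers and the collapse no longer holds.
leaky = lambda v: [xi if xi > 0 else 0.01 * xi for xi in v]
nonlinear = matvec(W2, leaky(matvec(W1, x)))
assert nonlinear != collapsed
```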
]]></description><pubDate>Fri, 04 Oct 2024 01:16:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=41736838</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=41736838</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41736838</guid></item><item><title><![CDATA[New comment by beckhamc in "Ask HN: Former gifted children with hard lives, how did you turn out?"]]></title><description><![CDATA[
<p>ACE score of 7; recently got my PhD in ML and am doing reasonably well, at least through an 'objective' lens (I have a well-paying job and bought my first condo recently). My mind is as chaotic as ever (I have a mix of OCD / Tourette's / PTS) and it's still often hard for me to concentrate without overthinking some portion of my life, or something completely unrelated to the task at hand.<p>If I can plug a YouTuber who really 'gets it' (mental health and depression), it's Dr Scott Eilers. Everything else is just clichéd garbage.<p>(Disclaimer: I am not 'gifted' but did very well academically during high school and uni)</p>
]]></description><pubDate>Mon, 16 Sep 2024 02:40:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=41552245</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=41552245</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41552245</guid></item><item><title><![CDATA[New comment by beckhamc in "Ask HN: Machine learning engineers, what do you do at work?"]]></title><description><![CDATA[
<p>No, we just replaced feature engineering with architecture engineering.</p>
]]></description><pubDate>Sat, 08 Jun 2024 17:32:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=40619059</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=40619059</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40619059</guid></item><item><title><![CDATA[New comment by beckhamc in "Why is machine learning 'hard'? (2016)"]]></title><description><![CDATA[
<p>The issue is the obsession with benchmark datasets and their flaky evaluation.</p>
]]></description><pubDate>Tue, 23 Jan 2024 23:31:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=39111506</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=39111506</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39111506</guid></item><item><title><![CDATA[New comment by beckhamc in "John Carmack and John Romero reunited to talk DOOM on its 30th Anniversary"]]></title><description><![CDATA[
<p>sophisticated AI != aimbot accuracy</p>
]]></description><pubDate>Mon, 11 Dec 2023 06:37:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=38598191</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=38598191</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38598191</guid></item><item><title><![CDATA[New comment by beckhamc in "Yabai – A tiling window manager for macOS"]]></title><description><![CDATA[
<p>I wish I had the ability to toggle PiP for any open window while I'm in <i>full screen</i> mode. For instance, I have both Chrome and Emacs side by side in full screen, and I can use a hotkey to drop my iTerm window down over both of them (basically like a Quake terminal, but that feature is specific to iTerm).</p>
]]></description><pubDate>Thu, 30 Nov 2023 23:07:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=38480521</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=38480521</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38480521</guid></item><item><title><![CDATA[New comment by beckhamc in "Ask HN: Could you share your personal blog here?"]]></title><description><![CDATA[
<p><a href="http://beckham.nz/blog" rel="nofollow noreferrer">http://beckham.nz/blog</a><p>Mostly theoretical-ish deep learning stuff as of late (I'm a PhD candidate in that field). But I want to expand it into really anything: psychology, dating, video game reviews, etc.</p>
]]></description><pubDate>Tue, 04 Jul 2023 20:15:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=36591973</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=36591973</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36591973</guid></item><item><title><![CDATA[New comment by beckhamc in "P < 0.05 Considered Harmful"]]></title><description><![CDATA[
<p>A relevant quote here seems to be: "when a measure becomes a target, it ceases to be a good measure". Also true of academia in general, because of the obsession with citations.</p>
]]></description><pubDate>Tue, 11 Apr 2023 17:26:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=35528265</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=35528265</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35528265</guid></item><item><title><![CDATA[Techniques for label conditioning in Gaussian denoising diffusion models]]></title><description><![CDATA[
<p>Article URL: <a href="https://beckham.nz/2023/01/27/ddpms_guidance.html">https://beckham.nz/2023/01/27/ddpms_guidance.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=35307751">https://news.ycombinator.com/item?id=35307751</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 25 Mar 2023 21:44:01 +0000</pubDate><link>https://beckham.nz/2023/01/27/ddpms_guidance.html</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=35307751</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35307751</guid></item><item><title><![CDATA[New comment by beckhamc in "Weka Violates MinIO's Open Source Licenses"]]></title><description><![CDATA[
<p>Not to be confused with the open source ML toolkit WEKA: <a href="https://en.m.wikipedia.org/wiki/Weka_(machine_learning)" rel="nofollow">https://en.m.wikipedia.org/wiki/Weka_(machine_learning)</a><p>Huh, that's funny. I wonder which one came first?</p>
]]></description><pubDate>Sat, 25 Mar 2023 16:11:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=35304076</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=35304076</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35304076</guid></item><item><title><![CDATA[New comment by beckhamc in "TabFS – a browser extension that mounts the browser tabs as a filesystem"]]></title><description><![CDATA[
<p>Nice work! Is it possible to switch to a tab somehow using this? (Use case: switching browser tabs within Emacs.) Thanks</p>
]]></description><pubDate>Sat, 18 Feb 2023 18:22:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=34849476</link><dc:creator>beckhamc</dc:creator><comments>https://news.ycombinator.com/item?id=34849476</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34849476</guid></item></channel></rss>