<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: gajomi</title><link>https://news.ycombinator.com/user?id=gajomi</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 14 Apr 2026 22:13:32 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=gajomi" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by gajomi in "Ask HN: Who is hiring? (December 2023)"]]></title><description><![CDATA[
<p>I didn't see any of the specific job titles you mentioned on the career site. Perhaps this is part of the "antedisciplinary" method you refer to? Practically speaking though, should one just use the "Open Apply" option for Machine Learning Engineer/Scientist roles?</p>
]]></description><pubDate>Mon, 04 Dec 2023 18:16:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=38520812</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=38520812</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38520812</guid></item><item><title><![CDATA[New comment by gajomi in "The worst programmer I know"]]></title><description><![CDATA[
<p>PSA... check out this guy's LinkedIn profile. It's a great read.</p>
]]></description><pubDate>Sun, 03 Sep 2023 05:27:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=37367946</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=37367946</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37367946</guid></item><item><title><![CDATA[New comment by gajomi in "The inflated promise of science education"]]></title><description><![CDATA[
<p>I think that is why he asked the rhetorical "which masks?".</p>
]]></description><pubDate>Tue, 13 Sep 2022 03:19:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=32820313</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=32820313</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32820313</guid></item><item><title><![CDATA[New comment by gajomi in "Learning how dictionaries work"]]></title><description><![CDATA[
<p>There are a couple of starting points you could take. I spent a weekend hacking out a program that generates fake word/definition pairs with a transformer model set against a dictionary: <a href="https://youtu.be/XnJ2TKAn-Vk?t=1547" rel="nofollow">https://youtu.be/XnJ2TKAn-Vk?t=1547</a>. If you swap real words in for the fake ones and have a sufficiently accurate model, you could quickly generate reasonable and novel definitions.<p>There are more complete versions of this kind of thing publicly available: <a href="https://github.com/turtlesoupy/this-word-does-not-exist" rel="nofollow">https://github.com/turtlesoupy/this-word-does-not-exist</a><p>> This would be amazing, for example, to run on a large corpus, generate the dictionary, and then run it again to find words that are used but not defined - not just in the original corpus but in the definitions too.<p>I think this is how you would gauge the success of the model. That is to say, you would evaluate model accuracy on a set of held-out words whose definitions never appeared in your dictionary training set but which appeared in context in your corpus. You would have to manually annotate whether or not the generated definition of each held-out word was acceptable.</p>
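<p>A minimal sketch of that held-out evaluation protocol, assuming a hypothetical generate_definition callable standing in for whatever trained model is used (the split logic is the substantive part):<pre><code>
def heldout_words(train_dict, corpus_vocab):
    """Words that appear in context in the corpus but never in the training
    dictionary, neither as headwords nor inside any training definition."""
    definition_tokens = set(" ".join(train_dict.values()).lower().split())
    return [w for w in corpus_vocab
            if w not in train_dict and w.lower() not in definition_tokens]

def evaluate(generate_definition, train_dict, corpus_vocab):
    # These words have no reference definitions, so whether each generated
    # definition is acceptable has to be annotated by hand.
    return {w: generate_definition(w) for w in heldout_words(train_dict, corpus_vocab)}
</code></pre>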
]]></description><pubDate>Thu, 11 Nov 2021 16:43:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=29190004</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=29190004</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=29190004</guid></item><item><title><![CDATA[New comment by gajomi in "The Nobel Prize in Physics 2021"]]></title><description><![CDATA[
<p>Let me take a stab at this (I'll maybe take it halfway there). First of all we want to know what kind of matrix we are talking about.<p>Imagine that you have a whole bunch of generative models (it's best if you imagine a fully connected Boltzmann machine in particular, whose states you can think of as a binary vector consisting only of zeros and ones) that have the same form but different random realizations of their parameters. This is a typical example of what a toy model of a so-called "spin glass" looks like in statistical physics (the spins are either up or down, usually represented as +1/-1). Each of these models, having been initialized randomly, will have its own particular frequency with which a given location (also called a site) of the boolean vector is either a one or a zero.<p>If the tendency of a site to be either a one or a zero were independent of every other site, the analysis of such a model would be pretty straightforward: every model would just have a vector of N frequencies, and we could compare how close the statistical behavior of each model was to the others by comparing how closely the N frequencies at each site matched one another. But in the general case there will be some interaction or correlation between sites in a given model. If the interaction strength is strong enough, this can result in a model tending to generate groups of patterns in its sequence of zeros and ones that are close to one another. Furthermore, if we compare the overlap of the apparent patterns between two such models, each with their own random parameters, we will find that some of them overlap more than others.<p>What we can then do is ask how much, on average, the patterns of these random models overlap with one another across the full set of all models. This leads us to the concept of an "overlap matrix". This matrix will have one set of values along the diagonal (corresponding to how much a model's patterns tend to overlap with themselves) and off-diagonal values capturing the overlap between distinct models. You can find through simulation or with some carefully constructed calculations that when the interaction strength between sites is small, the off-diagonal elements don't tend to zero, but rather to a single number different from the diagonal value. This is perhaps intuitive: these models were randomly initialized, but they are going to overlap in their behavior in some places.<p>Where things get interesting, though, is when you increase the interaction strength: you find that the overlap matrix starts to take on a block-diagonal form, wherein clusters of models overlap with one another at a certain level and at a lower but constant level with out-of-cluster models. This is called one-step replica symmetry breaking (1RSB). These different clusters of models can be thought of as having learned different overall patterns, with the similarity quantified by their overlap. If you keep increasing the interaction strength you will find that this happens again and again, with k-step replica symmetry breaking (kRSB) and a sort of self-similar block structure emerging in the overlap matrix (a picture is worth a thousand words [1]).<p>Now the real wild part that Parisi figured out is what happens when you take this process to the regime of full replica symmetry breaking.
You can't really do this with simulations and the calculations are very tricky (you have a bunch of terms either going to infinity or zero that need to balance out correctly), but Parisi ended up coming up with an expression for the distribution of overlaps for the infinitely sized matrix with full interaction strength in play. The expression is actually a partial differential equation that itself needs to be solved (I told you the calculations were tricky, right?), but amazingly, it seems to capture the behavior of these kinds of models correctly.<p>Whereas mathematicians have a pretty good idea of how to understand the 1RSB process rigorously, the Parisi full replica symmetry breaking scheme is very much not understood and remains of interest both to complex systems researchers trying to understand their models and to applied mathematicians (probability people in particular) trying to lay the rigorous foundations for the ideas being explored by theorists.<p>Hope that helps a bit!<p>[1] <a href="https://www.semanticscholar.org/paper/Spin-Glasses%2C-Boolean-Satisfiability%2C-and-Survey-Landau/f2983762266175a1775113f0864718c8200f05b1/figure/2" rel="nofollow">https://www.semanticscholar.org/paper/Spin-Glasses%2C-Boolea...</a></p>
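<p>To make the overlap matrix concrete, here is a minimal numpy sketch with toy SK-style couplings (in the standard replica setup the copies share one random realization of the couplings and differ only in their sampled states):<pre><code>
import numpy as np

rng = np.random.default_rng(0)
N, M, beta, steps = 64, 8, 1.5, 20000  # sites, replicas, interaction strength, flips

# One random realization of symmetric couplings with zero diagonal.
J = rng.normal(0.0, 1.0 / np.sqrt(N), (N, N))
J = (J + J.T) / 2
np.fill_diagonal(J, 0.0)

def sample(J, beta, steps, rng):
    """Metropolis sampling of s in {-1,+1}^N under H = -(1/2) s.T @ J @ s."""
    s = rng.choice([-1, 1], size=len(J))
    for _ in range(steps):
        i = rng.integers(len(J))
        dE = 2.0 * s[i] * (J[i] @ s)  # energy change from flipping spin i
        if dE < 0 or rng.random() < np.exp(-beta * dE):
            s[i] = -s[i]
    return s

replicas = np.array([sample(J, beta, steps, rng) for _ in range(M)])
Q = replicas @ replicas.T / N  # overlap matrix: Q[a, b] = (1/N) sum_i s_i^a s_i^b
print(np.round(Q, 2))          # diagonal is 1; the off-diagonal structure is the story
</code></pre><p>At weak coupling the off-diagonal entries cluster around a single value; cranking beta up is where the block structure described above should begin to emerge, modulo strong finite-size effects at this toy scale.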
]]></description><pubDate>Tue, 05 Oct 2021 16:32:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=28761709</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=28761709</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=28761709</guid></item><item><title><![CDATA[New comment by gajomi in "The Nobel Prize in Physics 2021"]]></title><description><![CDATA[
<p>+1 on these references as a nice introduction. I think the authors overstate the preparation of their hypothetical "pedestrian" (either that or they need to get away from the physics department a bit more often), but it is a great reference nevertheless. I also got a lot out of sections of Nishimori's textbook [1]. In particular, it helps motivate problems outside of physics and provides some references to start digging into more rigorous approaches via cavity methods (which I think, incidentally, are also more intuitive). I am a novice in this area but am sort of crossing my fingers that some of the ideas here will make their way into algorithms for inferring latent variables in some of the upcoming modern associative neural networks [2]. What I mean is that it would be cool not just to have an understanding of the total capacity of the network but also correct variational approximations to use during training.<p>[1] <a href="https://neurophys.biomedicale.parisdescartes.fr/wp-content/uploads/2014/10/Nishimori-2001-Statistical-Physics-of-Spin-Glasses-and-Information-Processing-An-Introduction.pdf" rel="nofollow">https://neurophys.biomedicale.parisdescartes.fr/wp-content/u...</a>
[2] <a href="https://ml-jku.github.io/hopfield-layers/" rel="nofollow">https://ml-jku.github.io/hopfield-layers/</a></p>
]]></description><pubDate>Tue, 05 Oct 2021 15:22:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=28760689</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=28760689</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=28760689</guid></item><item><title><![CDATA[New comment by gajomi in "Software Dev Can't Be Automated – It’s a Creative Process with an Unknown Goal"]]></title><description><![CDATA[
<p>I have been using the phrase "Artificial Stupidity" as well, but with the opposite meaning. Specifically, I like to think of human-like artificial stupidity as a challenge for machine intelligence, in which an algorithm is able to replicate the rather sophisticated and incredibly entangled logic, intuitions and calculus of humans at the height of their stupidity. This seems to me a much greater challenge than the standard sort of supervised learning problems, in that a truly stupid AI must be able to imagine latent variables that allow it to explain away real-world observations in a way that is both statistically implausible and causally serendipitous to its stupid peers. This seems to me to be a requirement for any kind of useful AGI.</p>
]]></description><pubDate>Sun, 15 Aug 2021 06:43:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=28186700</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=28186700</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=28186700</guid></item><item><title><![CDATA[New comment by gajomi in "Fun and dystopia with AI-based code generation using GPT-J-6B"]]></title><description><![CDATA[
<p>This would be the opposite of a Turing test though, since most people wouldn't be able to do this.</p>
]]></description><pubDate>Fri, 25 Jun 2021 00:43:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=27625812</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=27625812</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=27625812</guid></item><item><title><![CDATA[Space station detectors found the source of weird ‘blue jet’]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.sciencenews.org/article/space-station-detectors-found-source-weird-blue-jet-lightning/">https://www.sciencenews.org/article/space-station-detectors-found-source-weird-blue-jet-lightning/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=25873006">https://news.ycombinator.com/item?id=25873006</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 22 Jan 2021 16:24:10 +0000</pubDate><link>https://www.sciencenews.org/article/space-station-detectors-found-source-weird-blue-jet-lightning/</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=25873006</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=25873006</guid></item><item><title><![CDATA[New comment by gajomi in "TreeCard: The wooden debit card that plants trees"]]></title><description><![CDATA[
<p>I can imagine a variant of this card that tries to tackle the point you bring up. Not all purchases are equal; some of them even have negative carbon footprints. The tree planting offsets the footprint for every purchase. These things are not easy to estimate exactly, but since there are probably order-of-magnitude differences in the carbon footprints of different products, there should be useful information about the relative impact of a purchase to display to the consumer and to demonstrate the overall impact of the project.</p>
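<p>A toy sketch of what that display logic might look like. The per-category factors here are invented purely for illustration; real estimates would come from lifecycle-analysis data:<pre><code>
# Hypothetical kg CO2 per dollar by merchant category (made-up numbers).
FOOTPRINT_KG_PER_USD = {
    "flights": 1.0,
    "fuel": 0.9,
    "beef": 0.5,
    "electronics": 0.3,
    "secondhand": 0.01,  # roughly carbon-neutral or better
}

def purchase_impact(category, amount_usd, kg_per_tree=25.0):
    """Rough footprint of a purchase and the trees needed to offset it."""
    kg = FOOTPRINT_KG_PER_USD.get(category, 0.2) * amount_usd
    return kg, kg / kg_per_tree

print(purchase_impact("flights", 400.0))  # (400.0, 16.0): ~16 trees to offset
</code></pre>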
]]></description><pubDate>Sun, 03 Jan 2021 17:19:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=25623129</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=25623129</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=25623129</guid></item><item><title><![CDATA[New comment by gajomi in "Reinforcement learning is supervised learning on optimized data"]]></title><description><![CDATA[
<p>It seems to me that they are basically describing a variational formulation of the "optimization perspective" of reinforcement learning, which is cool, but I am confused... where is the supervised learning? Like what is the input and what is the output?</p>
]]></description><pubDate>Wed, 14 Oct 2020 04:43:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=24773150</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=24773150</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=24773150</guid></item><item><title><![CDATA[New comment by gajomi in "Julia 1.5 Highlights"]]></title><description><![CDATA[
<p>FYI this is the _first_ call, which is dominated by the JIT process. So it's not really the "time to plot" so much as the "time to compile". Calls to the resulting jitted function will be on the order of what you quote (I have not timed the specific result they quote here myself, but I have done so for similar cases in the past).</p>
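<p>The same first-call-compiles, later-calls-are-fast behavior can be demonstrated in Python with numba (a rough analogue only, not how Julia itself works):<pre><code>
import time
import numpy as np
from numba import njit  # pip install numba

@njit
def total(xs):
    s = 0.0
    for x in xs:  # compiled to machine code on the first call
        s += x
    return s

xs = np.random.rand(1_000_000)
t0 = time.perf_counter(); total(xs); t1 = time.perf_counter()  # includes compilation
t2 = time.perf_counter(); total(xs); t3 = time.perf_counter()  # compiled code only
print(f"first call: {t1 - t0:.3f}s, second call: {t3 - t2:.5f}s")
</code></pre>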
]]></description><pubDate>Mon, 03 Aug 2020 17:37:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=24040809</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=24040809</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=24040809</guid></item><item><title><![CDATA[New comment by gajomi in "Teaching physics to neural networks removes 'chaos blindness'"]]></title><description><![CDATA[
<p>> it sounds like they've increased the accuracy of a neural network model of a system, notably for edge cases, by training it on a complete model of said system.<p>Not quite. It's really just that they require the dynamics to be Hamiltonian, which would be highly atypical of the kind of dynamics an otherwise unconstrained neural network would learn. This is reflected in their two loss functions: the first learns an arbitrary second-order differential equation, while the second enforces Hamiltonian dynamics.<p>I don't understand how this was considered novel enough to warrant a PRE paper.<p>Here is a link to the paper:<p><a href="https://journals.aps.org/pre/pdf/10.1103/PhysRevE.101.062207" rel="nofollow">https://journals.aps.org/pre/pdf/10.1103/PhysRevE.101.062207</a></p>
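<p>A minimal sketch of that constraint (not the paper's exact architecture or losses): parameterize a scalar H(q, p) and fit its symplectic gradient to observed velocities, so the learned dynamics are Hamiltonian by construction:<pre><code>
import jax
import jax.numpy as jnp

def hamiltonian(params, q, p):
    # Tiny scalar network H(q, p); any architecture would do here.
    w1, w2 = params  # shapes: (hidden, 2 * dim) and (hidden,)
    return w2 @ jnp.tanh(w1 @ jnp.concatenate([q, p]))

def vector_field(params, q, p):
    # Hamilton's equations: dq/dt = dH/dp, dp/dt = -dH/dq.
    dHdq = jax.grad(hamiltonian, argnums=1)(params, q, p)
    dHdp = jax.grad(hamiltonian, argnums=2)(params, q, p)
    return dHdp, -dHdq

def loss(params, q, p, dq_dt, dp_dt):
    # Fitting observed time derivatives to this field is what enforces
    # the Hamiltonian structure during training.
    pred_dq, pred_dp = vector_field(params, q, p)
    return jnp.sum((pred_dq - dq_dt) ** 2 + (pred_dp - dp_dt) ** 2)
</code></pre>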
]]></description><pubDate>Tue, 23 Jun 2020 22:15:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=23621042</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=23621042</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=23621042</guid></item><item><title><![CDATA[New comment by gajomi in "Malaria 'Completely Stopped' by Microbe"]]></title><description><![CDATA[
<p>> we shouldn’t just be ranking people by age, if we want to come to moral outcomes?<p>Agreed. But I think quantifications can still be useful.<p>Person-years probably aren't the right sort of quantification though, in that they blend together too many kinds of human experience. But I imagine most people would agree that statements like "X% of grandparents lost 10 years earlier than expected" or "Y% of children under 10" allow for comparing the relative impact on families.<p>Quantifying losses doesn't mean that we consider the quantified as fungible.</p>
]]></description><pubDate>Mon, 04 May 2020 17:29:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=23070852</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=23070852</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=23070852</guid></item><item><title><![CDATA[On the first sequence without triple in arithmetic progression]]></title><description><![CDATA[
<p>Article URL: <a href="https://mathoverflow.net/questions/338415/on-the-first-sequence-without-triple-in-arithmetic-progression">https://mathoverflow.net/questions/338415/on-the-first-sequence-without-triple-in-arithmetic-progression</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=22542596">https://news.ycombinator.com/item?id=22542596</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 11 Mar 2020 03:10:13 +0000</pubDate><link>https://mathoverflow.net/questions/338415/on-the-first-sequence-without-triple-in-arithmetic-progression</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=22542596</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=22542596</guid></item><item><title><![CDATA[New comment by gajomi in "StyleGAN2"]]></title><description><![CDATA[
<p>How is it speculative? The background does not always allow for the discrimination to be made, but in the random sample of ~20 faces I looked at, it was the main factor in maybe 1/4 of the cases (I was 100% accurate on this random sample). Of course, my random sample is not your random sample. We could probably do a controlled experiment to get at these kinds of attributions systematically.<p>I might also say that a second major discriminating factor is skin texture, especially at boundaries.</p>
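<p>For the controlled-experiment idea, even the informal sample above is easy to put a number on (scipy sketch; 20/20 is the rough sample size claimed):<pre><code>
from scipy.stats import binomtest

# How far from coin-flipping is a perfect score on 20 real-vs-fake judgments?
result = binomtest(k=20, n=20, p=0.5)
print(result.pvalue)  # ~2e-6: very unlikely under pure guessing
</code></pre>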
]]></description><pubDate>Fri, 13 Dec 2019 17:07:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=21783318</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=21783318</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=21783318</guid></item><item><title><![CDATA[New comment by gajomi in "Eigenvectors from eigenvalues"]]></title><description><![CDATA[
<p>Came here to pretty much write these same notes. The fact that you need to know the spectrum of all the minors really limits the usefulness for solving general eigen* problems. But I can imagine that this could make a great starting point for all kinds of approximate calculations/asymptotic analyses.<p>Regarding the first caveat that you bring up though: whereas the problem statement says you need a Hermitian matrix, I think the results should generalize to non-Hermitian matrices. In particular, take a look at the third proof of the theorem at the end of the article. The only assumption required here is that the eigenvalues are simple (which does not preclude them being complex/coming in complex-conjugate pairs).<p>Protip: I had to read the second step in the proof a few times before I could see what was going on. Explicitly, what you do here is (i) multiply from the right by the elementary unit vector e_j, (ii) set up the matrix-vector inverse using Cramer's rule, (iii) notice that this matrix can be permuted to have a block-diagonal element equal to the minor M_j, with the other block-diagonal entry equal to 1 and the corresponding column filled with zeros, (iv) write the determinant in the block form, which simplifies to the expression involving M_j by itself, then finally (v) multiply by e_j from the left to get a scalar equation.</p>
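<p>The identity is also easy to sanity-check numerically. A numpy sketch for the Hermitian case as stated, with arbitrarily chosen indices i and j:<pre><code>
import numpy as np

rng = np.random.default_rng(0)
n, i, j = 5, 2, 1
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
A = (A + A.conj().T) / 2  # random Hermitian matrix

lam, V = np.linalg.eigh(A)                          # eigenvalues/eigenvectors of A
Mj = np.delete(np.delete(A, j, axis=0), j, axis=1)  # minor: delete row and column j
mu = np.linalg.eigvalsh(Mj)                         # eigenvalues of the minor

# |v_{i,j}|^2 * prod_{k != i}(lam_i - lam_k) = prod_k (lam_i - mu_k)
lhs = abs(V[j, i]) ** 2 * np.prod([lam[i] - lam[k] for k in range(n) if k != i])
rhs = np.prod(lam[i] - mu)
print(lhs, rhs)  # agree up to floating-point error
</code></pre>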
]]></description><pubDate>Fri, 15 Nov 2019 18:30:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=21547374</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=21547374</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=21547374</guid></item><item><title><![CDATA[New comment by gajomi in "Stegasuras: Neural Linguistic Steganography"]]></title><description><![CDATA[
<p>FYI you need to type something into the "Secret message" box and ask it to "Encrypt" using the prepopulated "LM context".<p>Both the context and the secret message change the encrypted message.</p>
]]></description><pubDate>Thu, 05 Sep 2019 20:03:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=20890202</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=20890202</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=20890202</guid></item><item><title><![CDATA[New comment by gajomi in "Ask HN: Best resources to gain math intuition?"]]></title><description><![CDATA[
<p>It's probably too hard to answer the general question about "how to gain math intuition". Mathematics is just too vast.<p>However, you have written<p>> I think when I took physics it really brought out these flaws and lack of intuition<p>which suggests you have some good practical experience with physics problem solving, which has precipitated a certain feeling that you need to learn more about some kind of math. I would advise that you try to exploit this. In the same breath I want to recognize (as someone who did their BS and MS in physics) that physicists are not always so careful or explicit in how they do their mathematics. So learning means eventually going beyond physics sources and into a much wider world of mathematical thought. The particular things that mathematicians care about may or may not be relevant to the problem you are trying to solve in physics, and a good part of developing that intuition is figuring out which of the caveats a mathematician expounds upon (more often than not, some esoterica about the space(s) they are working in or the class of isomorphisms under which their results are invariant) matter physically. As you develop an intuition about these things, a bonus is that you will be able to skim through mathematics resources much faster.</p>
]]></description><pubDate>Tue, 27 Aug 2019 00:27:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=20805095</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=20805095</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=20805095</guid></item><item><title><![CDATA[New comment by gajomi in "Modern SAT solvers: fast, neat and underused"]]></title><description><![CDATA[
<p>Stupid question... are there equivalent (in the sense of being fast and having nice interfaces) solvers for #SAT problems? A whole pile of inference problems in machine learning can be framed this way, which makes me curious.</p>
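<p>For readers unfamiliar with the problem: #SAT asks for the number of satisfying assignments rather than just one. A brute-force toy counter for illustration (real counters such as sharpSAT or GANAK avoid the enumeration):<pre><code>
from itertools import product

def count_models(cnf, n_vars):
    """Count satisfying assignments of a CNF given as DIMACS-style clauses,
    e.g. [1, -2] means (x1 OR NOT x2)."""
    count = 0
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in cnf):
            count += 1
    return count

# (x1 OR x2) AND (NOT x1 OR x3) has 4 models over 3 variables.
print(count_models([[1, 2], [-1, 3]], 3))
</code></pre>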
]]></description><pubDate>Sun, 19 May 2019 16:25:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=19954064</link><dc:creator>gajomi</dc:creator><comments>https://news.ycombinator.com/item?id=19954064</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=19954064</guid></item></channel></rss>