Hacker News: pacbard

New comment by pacbard in "Group averages obscure how an individual's brain controls behavior: study"

pacbard — Sun, 03 May 2026 16:51:13 +0000

It seems to be a textbook example of Simpson's paradox.

New comment by pacbard in "Moretti replication published in AER"

pacbard — Mon, 30 Mar 2026 21:58:20 +0000

This seems to be a problem with the Stata code. Disappointing but not really surprising. Stata isn't an easy language to debug and problems like this happen a lot. I wonder how much of the coding was done by the original author versus a research assistant.

I think that some of the biggest issues with omitted variables could have been addressed by using the "new" Stata interaction syntax ("new" in the sense that it has been available in Stata since 10 releases ago) instead of rolling their own products and forgetting to include the main effect.

New comment by pacbard in "Building FireStriker: Making Civic Tech Free"

pacbard — Sat, 28 Mar 2026 02:26:54 +0000

Regarding public comments, I don't believe a good politician will make a snap decision at the dais following public comments. Most of them will have received the meeting agenda in advance and formed an opinion about how they are going to vote and the questions they are going to ask. If this is the case, public comment is just a waste of time for them, as they won't really get swayed by it. At most, they will mention a point that a public commenter made to support something that they were already going to do.

Emailing them privately in advance of the meeting will give them the opportunity to think about your input and, in some cases, reply and engage with you about the policy. It might not change their mind, but it will definitely help them see others' perspectives on their upcoming decision.

New comment by pacbard in "GPTZero finds 100 new hallucinations in NeurIPS 2025 accepted papers"

pacbard — Thu, 22 Jan 2026 18:33:46 +0000

The ironic part about these hallucinations is that a research paper includes a literature review because the goal of the research is to be in dialogue with prior work, to show a gap in the existing literature, and to further the knowledge that this prior work has built.

By using an LLM to fabricate citations, authors are moving away from this noble pursuit of knowledge built on the "shoulders of giants" and show that behind the curtain output volume is what really matters in modern US research communities.

New comment by pacbard in "Python is not a great language for data science"

pacbard — Tue, 25 Nov 2025 18:54:51 +0000

When you think about a data science pipeline, you really have three separate steps:

[Data Preparation] --> [Data Analysis] --> [Result Preparation]

Neither Python nor R does a good job at all of these.

The original article seems to focus on challenges in using Python for data preparation/processing, mostly pointing out challenges with Pandas and "raw" Python code for data processing.

This could be solved by switching to something like duckdb and SQL to process data.
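
As a rough illustration of pushing the preparation step into SQL, here is a minimal sketch. It uses the standard library's sqlite3 as a stand-in for DuckDB so it runs without extra installs; the table, column names, and rows are all made up for the example.

```python
import sqlite3

# Hypothetical raw records: (customer, amount stored as text, sometimes blank).
RAW_ROWS = [
    ("alice", "10.5"),
    ("alice", "4.5"),
    ("bob", ""),
    ("bob", "7"),
]


def prepare(rows):
    """Clean and aggregate rows with one SQL query instead of hand-written loops."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE raw (customer TEXT, amount TEXT)")
    con.executemany("INSERT INTO raw VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT customer,
               COUNT(*) AS n_orders,
               SUM(CAST(amount AS REAL)) AS total
        FROM raw
        WHERE amount <> ''          -- drop rows with a missing amount
        GROUP BY customer
        ORDER BY customer
        """
    ).fetchall()
    con.close()
    return result


summary = prepare(RAW_ROWS)
print(summary)
```

The filtering, type casting, and aggregation all live in a single declarative query, and the same query text should carry over to DuckDB with little or no change.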

As far as data analysis, both Python and R have their own niches, depending on field. Similarly, there are other specialized languages (e.g., SAS, Matlab) that are still used for domain-specific applications.

I personally find result preparation somewhat difficult in both Python and R. Stargazer is ok for exporting regression tables but it's not really that great. Graphing is probably better in R within the ggplot universe (I'm aware of the python port).

New comment by pacbard in "Use DuckDB-WASM to query TB of data in browser"

pacbard — Fri, 31 Oct 2025 23:39:51 +0000

I set up something similar at work. But it was before the DuckLake format was available, so it just uses manually generated Parquet files saved to a bucket and a light DuckDB catalog that uses views to expose the parquet files. This lets us update the Parquet files using our ETL process and just refresh the catalog when there is a schema change.

We didn't find the frozen DuckLake setup useful for our use case. Mostly because the frozen catalog kind of doesn't make sense with the DuckLake philosophy and the cost-benefit wasn't there over a regular duckdb catalog. It also made making updates cumbersome because you need to pull the DuckLake catalog, commit the changes, and re-upload the catalog (instead of just directly updating the Parquet files). I get that we are missing the time travel part of the DuckLake, but that's not critical for us and if it becomes important, we would just roll out a PostgreSQL database to manage the catalog.
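
For anyone curious what the view-based catalog looks like, here is a minimal sketch of the refresh step, assuming a small script that only generates the SQL. The table names and bucket paths are hypothetical; `read_parquet` and `CREATE OR REPLACE VIEW` are the DuckDB features the statements rely on.

```python
def view_ddl(table, parquet_glob):
    """Return DuckDB SQL that exposes a set of Parquet files as a view."""
    # Escape quotes so the identifier and the path stay valid SQL.
    safe_table = table.replace('"', '""')
    safe_glob = parquet_glob.replace("'", "''")
    return (
        f'CREATE OR REPLACE VIEW "{safe_table}" AS '
        f"SELECT * FROM read_parquet('{safe_glob}');"
    )


# Hypothetical table names and bucket paths, for illustration only.
TABLES = {
    "orders": "s3://example-bucket/orders/*.parquet",
    "customers": "s3://example-bucket/customers/*.parquet",
}

catalog_sql = "\n".join(view_ddl(name, path) for name, path in TABLES.items())
print(catalog_sql)
```

Running the generated statements against the catalog database would be the "refresh" step; the ETL process can keep rewriting the Parquet files without touching the catalog in between.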

New comment by pacbard in "US Passport Power Falls to Historic Low"

pacbard — Wed, 15 Oct 2025 19:05:32 +0000

Because applying for a visa takes money, time, and a visit to the embassy.

ESTA/ETIAS gets automatically approved within a few minutes of paying the fee (I guess this is true for 99.999% of applicants).

Very few countries allow people to just show up and cross the border. US citizens had that privilege in a lot of places, but it looks like it’s changing now.

New comment by pacbard in "Ancient Patagonian hunter-gatherers took care of their injured and disabled"

pacbard — Mon, 13 Oct 2025 15:47:01 +0000

The hunter-gatherers in the study lived in the "Late Holocene (~4000 to 250 BP)", meaning from roughly 2050 BCE to 1700 CE (BP counts back from 1950). These people are separated from us by less than 150 generations. I don't believe that humans evolve that fast, so the way you think, feel, ache, and so on also applies to them. Would you leave behind your injured and disabled in their situation (which is speculated to be the result of hunting accidents)?

New comment by pacbard in "Charlie Kirk killed at event in Utah"

pacbard — Wed, 10 Sep 2025 22:19:38 +0000

Without knowing what happened, it's difficult to make the comparison between the Italian Years of Lead and what happened earlier today at Utah Valley University.

My understanding of the Italian political climate of the 60s, 70s, and 80s is that there were political groups/cells (on both the far right and far left) that organized around violent acts to further their political goals (which involved the eventual authoritarian takeover of the Italian government by either the far right or far left). For example, you can think of the Red Brigades as akin to the Black Panthers, but with actual terrorism.

In contrast, most political violence in America has been less organized and more individual-driven (e.g., see the Oklahoma City Bombing). For better or worse, the police state in the US has been quite successful in addressing and dispersing political groups that advocate for violence as a viable means for societal change.

New comment by pacbard in "Everything is correlated (2014–23)"

pacbard — Sat, 23 Aug 2025 00:02:15 +0000

This was my take as well. At least microeconomics has moved away from large-scale observational studies and has moved into experimental and quasi-experimental studies.

While the methods alone cannot fix it all ("You can’t fix by analysis what you bungled by design" [1] after all), it gets somewhat closer to unbiased results.

[1]: https://www.degruyterbrill.com/document/doi/10.4159/97806740...

New comment by pacbard in "Researchers value null results, but struggle to publish them"

pacbard — Fri, 25 Jul 2025 18:49:24 +0000

Not all null results are created equal.

There are interesting null results that get published and are well known. For example, Card & Krueger (1994) was a null result paper showing that increasing the minimum wage has a null effect on employment rates. This result went against the common assumption at the time that increasing wages will decrease employment.

Other null results are either dirty (e.g., big standard errors) or due to process problems (e.g., experimental failure). These are more difficult to publish because it's difficult to learn anything new from these results.

The challenge is that researchers do not know if they are going to get a "good" null or a "bad" one. Most of the time, you have to invest significant effort and time into a project, only to get a null result at the end. These results are difficult to publish in most cases and can lead to the end of careers if someone is pre-tenure or lead to funding problems for anyone.

New comment by pacbard in "High-school shop students attract skilled-trades job offers"

pacbard — Sun, 11 May 2025 17:30:26 +0000

Even if Career Technical Education (CTE) classes are offered, there is a large variation in their quality. For me, the question would be whether a graduate from a CTE program is more likely to be hired and to receive higher wages (initially) than a non-CTE program completer. My 2-minute Google Scholar search hasn't found anything on the topic.

At the end of the day, a 3-course sequence in a CTE pathway (which is the requirement for a high school CTE certificate in California) doesn't prepare you for a career in the same way as being in journalism class prepares you to be a journalist or being in theater prepares you to be an actor. Students will most likely need to pursue some form of post-secondary training (either through a community college or on-the-job) to become somewhat competent in their field.

New comment by pacbard in "The average college student today"

pacbard — Sun, 30 Mar 2025 22:07:09 +0000

The most likely explanation for this phenomenon is that there isn't a change in the population average for variable X, but that the decrease in college students' average X is due to an increase in the population's college-going rate.

Looking at the statistics[1], the US went from a 23.2% college completion rate in 1990 to a 39.2% completion rate in 2022, or a 69% increase in college degree completions. If you assume that X in the population is constant over time, mechanically you will need to enroll and graduate students from lower percentiles of X in order to increase the overall college completion rate in the whole population.
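
To put rough numbers on the mechanical effect, here is a back-of-the-envelope sketch. It makes two strong assumptions that are mine and only for illustration: X is standard normal in the population, and college completers are exactly the top slice of X (perfect sorting). Only the two completion rates come from the statistics quoted above.

```python
from statistics import NormalDist


def mean_x_of_completers(completion_rate):
    """Mean of X among the top `completion_rate` share of a standard normal population."""
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1.0 - completion_rate)
    # Mean of a standard normal truncated below at `cutoff`:
    # pdf(cutoff) / P(X > cutoff), and P(X > cutoff) is the completion rate.
    return std_normal.pdf(cutoff) / completion_rate


rate_1990, rate_2022 = 0.232, 0.392
relative_increase = (rate_2022 - rate_1990) / rate_1990
mean_1990 = mean_x_of_completers(rate_1990)
mean_2022 = mean_x_of_completers(rate_2022)

print(f"relative increase in completion rate: {relative_increase:.0%}")
print(f"mean X of completers in 1990: {mean_1990:.2f} SD above the population mean")
print(f"mean X of completers in 2022: {mean_2022:.2f} SD above the population mean")
```

Under these assumptions the completers' average X falls even though the population distribution never moves. Real sorting into college is far from perfect, so the size of the drop here is only illustrative.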

This process might be particularly acute at "lower tier" institutions that cannot compete with "top tier" institutions for top students.

[1]: https://nces.ed.gov/programs/digest/d20/tables/dt20_104.20.a...

New comment by pacbard in "Preview: Amazon S3 Tables and Lakehouse in DuckDB"

pacbard — Tue, 18 Mar 2025 22:27:35 +0000

Apache Iceberg builds an additional layer on top of Parquet files that lets you do ACID transactions, rollbacks, and schema evolution.

A Parquet file is a static file that has the whole data associated with a table. You can't insert, update, delete, etc. That's it. It works ok if you have small tables, but it becomes unwieldy if you need to do whole-table replacements each time your data changes.

Apache Iceberg fixes this problem by adding a metadata layer on top of smaller Parquet files (at a 300,000 ft overview).
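
As a toy sketch of the idea (this is not Iceberg's actual format, which uses metadata files, manifest lists, and manifests), you can think of the table as a pointer to a snapshot, and each snapshot as a list of immutable data files. All class and file names below are made up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    data_files: tuple  # names of the immutable data files visible in this snapshot


@dataclass
class ToyTable:
    # Snapshot 0 is the empty table; a snapshot's id is its index in this list.
    snapshots: list = field(default_factory=lambda: [Snapshot(0, ())])
    current_id: int = 0

    def current(self):
        return self.snapshots[self.current_id]

    def append(self, new_file):
        """Commit a new snapshot that adds one file; existing files are untouched."""
        snap = Snapshot(len(self.snapshots), self.current().data_files + (new_file,))
        self.snapshots.append(snap)
        self.current_id = snap.snapshot_id
        return snap

    def rollback(self, snapshot_id):
        """Point the table back at an earlier snapshot; no data file is rewritten."""
        if not 0 <= snapshot_id < len(self.snapshots):
            raise ValueError(f"unknown snapshot {snapshot_id}")
        self.current_id = snapshot_id


table = ToyTable()
table.append("part-0001.parquet")
table.append("part-0002.parquet")
table.rollback(1)
print(table.current().data_files)
```

An insert only writes a new small file plus new metadata, and a rollback only moves the pointer, which is why you no longer need whole-table replacements.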

New comment by pacbard in "Reasons veterans are especially hard-hit by federal cuts"

pacbard — Sun, 09 Mar 2025 17:17:55 +0000

What's interesting to me is that these budget cuts are coming at a time when the VA is also trying to implement a new Electronic Health Record (EHR) system (Oracle CERNER) which is having substantial issues with rollouts (see the Google News page for Oracle CERNER or read the transcripts from the congressional hearings about it).

This could become a two-pronged problem where you have fewer people to provide care while they are trying to implement a new EHR, which results in a decrease in productivity because of the need to learn a new system.

New comment by pacbard in "The Riddle of Luigi Mangione"

pacbard — Sun, 22 Dec 2024 19:44:14 +0000

I'll take the bait.

It seems to me that going from "F1 and F2 generations of mice respond differently to the smell of acetophenone if their parents were exposed to it" to "well, human trauma is inherited and there isn't anything we can do about violent behavior" is somewhat far-fetched and smells like neo-eugenics.

New comment by pacbard in "SpawELO – small free matchmaking system for LAN parties"

pacbard — Sat, 02 Nov 2024 18:55:38 +0000

You could assign ELO at the group level. It will increase the number of rated entities from N players to C(N, g) possible groups, where N is the number of players and g is the group size. It could work if the groups are stable enough.

If you have individual ELOs, you could pre-seed the ranking using averages and then let the algorithm take over.
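
A minimal sketch of both points, assuming made-up player names and ratings: count the possible groups with C(N, g), then seed each group's rating with the average of its members' individual ratings.

```python
from itertools import combinations
from math import comb
from statistics import mean


def possible_groups(n_players, group_size):
    """Number of distinct groups of `group_size` drawn from `n_players`, i.e. C(N, g)."""
    return comb(n_players, group_size)


def seed_group_ratings(individual_ratings, group_size):
    """Pre-seed every possible group with the average rating of its members."""
    return {
        group: mean(individual_ratings[player] for player in group)
        for group in combinations(sorted(individual_ratings), group_size)
    }


# Hypothetical individual ratings, for illustration only.
ratings = {"ann": 1500, "ben": 1400, "cal": 1600, "dee": 1300}

seeds = seed_group_ratings(ratings, group_size=2)
print(possible_groups(len(ratings), 2), "possible pairs")
print(seeds[("ann", "ben")])
```

The count grows quickly (10 players in groups of 5 already give 252 possible groups), which is why this only seems practical when the same groups keep playing together.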

New comment by pacbard in "Greenwich schools to ban most cellphones, Apple Watches, Fitbits and more"

pacbard — Sun, 18 Aug 2024 17:50:45 +0000

My guess is that they had prior experience with asking students not to use cellphones and they observed students start using wearables to skirt the ban, which led them to just ban all of them.

An interesting consequence of the wearable ban is that they also banned medical electronic devices (e.g., heart monitors or blood sugar monitors) and parents are required to meet with administration to approve their use [1].

[1]: https://greenwichfreepress.com/schools/greenwich-schools-ann...

New comment by pacbard in "OpenStreetMap Is Turning 20"

pacbard — Sun, 11 Aug 2024 16:02:41 +0000

Local mapping is surprisingly difficult. I believe that the commercial products (i.e., Google Maps) are viable only because people (e.g., business owners, property owners) have strong incentives to submit edits, as these products are the main way that people find them. Without that, you end up in a limbo where you have data but it's not up to date.

By the way, not even government agencies have good geo data, even when they should. I needed up-to-date address information for work, so I bought a map from my local county assessor's office. In my mind, the assessor should have the most recent data on properties, as their main mission is to collect taxes annually. I was wrong. Their data is about 4 to 5 years out of date, with whole "new" subdivisions missing from their inventory. Google Maps kind of has them on the map; I believe that their geolocation data comes from real estate platforms when new houses are on the market. OSM is about 10 years behind in my area. I am submitting edits as I find them.

If someone has a better idea on where to find address data, please let me know.

New comment by pacbard in "A User’s Guide to Statistical Inference and Regression"

pacbard — Sun, 28 Jul 2024 01:02:10 +0000

If you want to take this a step further, quantitative methods are about efficient data reduction. As part of this data reduction, the model’s assumptions and mathematical form take center stage in describing how you got to your “number”.

This is different from qualitative analysis because the data reduction is done “by hand” by the researcher.

The difference between the “automatic”, model-based data reduction in quantitative research and the “subjective” reduction in qualitative research is then amplified when people say that quant is more objective than qual analysis. The discussion, instead, should be about the quality of the work and whether the final conclusions are warranted by the methods instead of the method itself.