Hacker News: ashearer

New comment by ashearer in "Air Canada is responsible for chatbot's mistake: B.C. tribunal"

ashearer — Sat, 17 Feb 2024 01:56:16 +0000

On the other hand, if they had quietly offered a one-time policy exception before reaching the point of a court verdict, they would have avoided any kind of precedent.

New comment by ashearer in "Nominative determinism in hospital medicine (2015)"

ashearer — Fri, 14 Jan 2022 04:57:38 +0000

While I can't say how much effect grammar has on thought processes, the metaphors we rely on can be significant. For an example that came up earlier today (https://news.ycombinator.com/item?id=29923866), the term "sanitizing" is often used to mean escaping data. But this way of thinking appears to create a strong urge to "sanitize" as soon as possible, so that the rest of the system will only have to handle "clean" data. This leads to mistakes: data is escaped on input, and therefore tends to be wrong for all but one of several output formats (possibly resulting in security vulnerabilities). And then because data from trusted sources is implied to be "clean", it isn't escaped at all, even when it will wind up being parsed incorrectly. Discarding this metaphor could actually result in better software.
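Here is a minimal Python sketch of that failure mode. It assumes a hypothetical app that HTML-escapes a value as soon as it is read, then later reuses it in a plain-text email. `html.escape` is the standard-library function. `store_name` and `plain_text_greeting` are illustrative names only.

```python
import html

# Hypothetical "sanitize on input" step: escape for HTML as soon as the value arrives.
def store_name(raw_name: str) -> str:
    return html.escape(raw_name)  # escapes &, <, >, " and ' by default

# A later, non-HTML consumer of the stored value (e.g. a plain-text email body).
def plain_text_greeting(stored_name: str) -> str:
    return f"Hello, {stored_name}!"

stored = store_name("O'Brien & Sons")
greeting = plain_text_greeting(stored)  # entity codes leak into plain text
```

Only an HTML consumer would render `stored` correctly. Every other output format sees `&#x27;` and `&amp;` as literal text.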

New comment by ashearer in "Don’t try to sanitize input, escape output (2020)"

ashearer — Thu, 13 Jan 2022 17:40:23 +0000

This is an example of why the term "sanitize" just brings confusion and leads to incorrect software. If we say "escape" (for concatenation) or "parameterize" (for discrete arguments) instead, then there's no confusion: we know that it should be done at the point of use, because the procedure for doing so depends on that use.
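A shell command makes the two terms concrete. This illustration was chosen here and is not from the comment. Escaping quotes a value so it survives concatenation into a command string, using the standard library's `shlex.quote`. Parameterizing passes the value as its own element of an argument list. The `echo` command and the sample value are placeholders.

```python
import shlex

name = "O'Brien; echo injected"  # hypothetical user-supplied value

# Escaping: the value is concatenated into a string that a POSIX shell will parse,
# so it must be quoted with the shell's own rules at that point of use.
command_string = "echo " + shlex.quote(name)

# Parameterizing: the value is a discrete argument (e.g. for subprocess.run(argv)),
# so no quoting is involved at all.
argv = ["echo", name]

# Without escaping, the apostrophe breaks parsing (and ';' would start a new command).
naive_error = None
try:
    shlex.split("echo " + name)
except ValueError as err:
    naive_error = str(err)
```

`shlex.split(command_string)` recovers exactly `argv`. When `subprocess.run` receives a list without `shell=True`, it skips the shell entirely, so the list form needs no quoting.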

Calling it "sanitization" implies that the data is somehow dirty, so naturally it should be cleaned as soon as possible, and after that it's safe. But all that accomplishes in general is corrupting the data, often in an unrecoverable way, and then opening up security vulnerabilities because the specific use doesn't happen to exactly match the sanitization done in advance.

It's great to validate the data on input and make it conform to the correct domain of values, but conflating this with output formats and expecting this to take care of downstream security as well just leads to incorrect data along with security vulnerabilities.

PHP's long-ago-removed magic quotes feature was an example of this confusion in action. It not only mangled incoming strings containing single quotes in an effort to prevent SQL injection, but did so in a way that left some databases completely exposed, depending on their quoting syntax.
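Below is a small reproduction of that failure, using Python's built-in `sqlite3`. SQLite follows the SQL-standard convention: a quote inside a string literal is escaped by doubling it (`''`), and backslash is an ordinary character. A backslash escape therefore does not protect it. `magic_quote` is a hypothetical stand-in for the magic-quotes treatment of single quotes and backslashes, not PHP's actual implementation.

```python
import sqlite3

def magic_quote(s: str) -> str:
    # Hypothetical imitation of magic quotes: backslash-escape backslashes and single quotes.
    return s.replace("\\", "\\\\").replace("'", "\\'")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",), ("carol",)])

# Attacker-controlled input, "escaped" on arrival and then concatenated into SQL.
attack = magic_quote("' OR 1=1 --")  # becomes \' OR 1=1 --
query = "SELECT count(*) FROM users WHERE name = '" + attack + "'"

# SQLite reads '\' as a one-character string, so OR 1=1 becomes live SQL
# and the trailing quote is commented out: every row matches.
leaked = conn.execute(query).fetchone()[0]

# The same input passed as a parameter is just an ordinary, non-matching string.
safe = conn.execute("SELECT count(*) FROM users WHERE name = ?",
                    ("' OR 1=1 --",)).fetchone()[0]

# Meanwhile, a legitimate name is mangled before any database sees it.
mangled = magic_quote("O'Brien")  # O\'Brien
```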

New comment by ashearer in "Ask HN: It's 2021 and QuickTime still doesn't export MP4. What is going on?"

ashearer — Thu, 12 Aug 2021 16:56:43 +0000

On Catalina at least, variable-speed constant-pitch playback is still available, just hidden. Option-click on the fast-forward or rewind buttons to access it. It jumps in increments of 0.1x.

New comment by ashearer in "Intentionally Leaking AWS Keys"

ashearer — Tue, 19 Jan 2021 18:42:38 +0000

Good to know that AWS is so fast to detect this.

If good uses were common—and I'm struggling to come up with them—AWS could suppress the alert for IAM users that were already sufficiently locked down. But since that would become dangerous if the permissions were loosened later, AWS would wind up creating two classes of keys, public and non-public, in order to know whether to warn about loosening restrictions. Simpler just to forbid making keys public.

To publish such a key anyway without having to go to the trouble of unwinding an AWS auto-quarantine, breaking it up in code (like "part1" + "part2") might be enough to foil the AWS bot. Can anyone confirm?

New comment by ashearer in "Former Uber executive charged with paying 'hush money' to conceal breach"

ashearer — Fri, 21 Aug 2020 03:37:34 +0000

> "Similarly, Uber argued that the industry at large had become more adept since 2014 at protecting private data in the cloud, and that Uber should not be judged for “what a company did then (back when the company was much smaller and the technology at issue was evolving) according to the standards that the agency thinks are appropriate now (given the current sophistication of the company and current industry best practices).” Uber made these arguments via letter in April 2017, approximately five months after the 2016 Breach."

I've been hearing this argument for decades, and every time it's been earnest but transparent blame-shifting. "The industry didn't understand security risks back then." "No one could have predicted this." The risks were well known back then by anyone who cared about risks.

New comment by ashearer in "Jpeg2png: Silky smooth JPEG decoding – no more artifacts (2016)"

ashearer — Fri, 10 Jul 2020 20:52:06 +0000

> Women can handle it in the same way they can handle a cloudy day

Going with that, then why create that cloudy day when it would take very little effort not to?

And what message does it send to actively defend creating those cloudy days?

New comment by ashearer in "Unit Testing Is Overrated"

ashearer — Thu, 09 Jul 2020 15:40:46 +0000

My comment went on to say that you don't know ahead of time exactly which tests will prove useful, so you can't just skip writing them altogether. The key point is that if you have evidence ahead of time that a whole class of tests will be less useful than another class (because they will need several rewrites to catch a similar set of bugs), that fact should inform where you spend your time.

To go with the fire alarm analogy and exaggerate a little, it would work like this: you could attempt to install and maintain small disposable fire alarms in the refrigerator as well as every closet, drawer, and pillowcase. I'm not sure if these actually exist, but let's say they do. You then have to keep buying new ones since the internal batteries frequently run out. Or, you could deploy that type mainly in higher-value areas where they're particularly useful (near the stove), and otherwise put more time and money in complete room coverage from a few larger fire alarms that feature longer-lasting batteries. Given that you have an alarm for the bedroom as a whole, you absolutely shouldn't waste effort maintaining fire alarms in each pillowcase, and the reason is precisely that they won't ever be useful.

There are side benefits to writing unit tests, as you mentioned, like helping you write the API initially. But there are other ways to get a similar effect. And if those side benefits matter less during refactoring, while you still pay the cost of rewriting the tests, that also lowers the tests' expected value.

To avoid misunderstanding: I also advocate a mixture of different types of tests. My point is that unit tests depending on change-prone internal APIs tend to need more frequent rewrites. That lowers their expected value, so it should affect how the mixture is allocated.

New comment by ashearer in "Unit Testing Is Overrated"

ashearer — Thu, 09 Jul 2020 13:56:16 +0000

If a particular test never finds a bug in its lifetime (and isn't used as documentation either), you might as well not have written it. The time would have been better spent on something else, like a new feature or a different test.

Of course, you don't know ahead of time exactly which tests will catch bugs. But given finite time, if one category of test has a higher chance of catching bugs per time spent writing it, you should spend more time writing that kind of test.

Getting back to unit tests: if they frequently need to be rewritten as part of refactoring before they ever catch a bug, the expected value of that kind of test becomes a fraction of what it would be otherwise. It tips the scales in favor of a higher-level test that would catch the same bugs without needing rewrites.
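The expected-value argument can be put into rough numbers. The sketch below is a toy model, and every input (`p_catch`, `bug_value`, the hour counts) is made up rather than measured. Value per hour is the chance a test catches a bug, times what that catch is worth, divided by the cost of writing the test plus every rewrite it needs along the way.

```python
def value_per_hour(p_catch: float, bug_value: float,
                   write_hours: float, rewrites: int, rewrite_hours: float) -> float:
    """Expected value per hour invested in a test (toy model, hypothetical inputs)."""
    total_hours = write_hours + rewrites * rewrite_hours
    return p_catch * bug_value / total_hours

# Hypothetical numbers: both tests catch the same bug with the same probability,
# but the unit test is coupled to an internal API that gets refactored three times.
unit = value_per_hour(p_catch=0.2, bug_value=10.0,
                      write_hours=1.0, rewrites=3, rewrite_hours=1.0)
higher_level = value_per_hour(p_catch=0.2, bug_value=10.0,
                              write_hours=1.0, rewrites=0, rewrite_hours=1.0)
```

With these inputs the unit test returns a quarter of the value per hour. The rewrite cost sits in the denominator, so each extra refactor dilutes it further.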

New comment by ashearer in "Apple announces it will switch to its own processors for future Macs"

ashearer — Mon, 22 Jun 2020 19:00:52 +0000

The same slide also mentioned supporting JIT translation (for x86 web browsers and Java), so Rosetta doesn't run only at installation time.

New comment by ashearer in "Don’t try to sanitize input – escape output"

ashearer — Thu, 27 Feb 2020 17:51:53 +0000

Yes, I completely agree in the above case. The JSON input has a well-defined format and input validation should reject it outright.

The issue is that when developers hear they should "reject bad input" in order to avoid vulnerabilities, they often interpret it as a call to reject any user input that isn't already known to be good. User inputs are often free text, like the name field. So they wind up forbidding any input they hadn't specifically imagined, which doesn't align with any particular recipient's actual data requirements. Legitimate input gets rejected in edge cases, while the protection against vulnerabilities is only illusory.
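A minimal sketch of the distinction, using a hypothetical signup payload. The structured part has a known format and is rejected outright when it doesn't parse. The free-text name is checked only against its actual domain, assumed here to be non-empty and at most 100 characters. There is no allowlist of characters someone happened to imagine.

```python
import json

def parse_signup(body: str) -> dict:
    """Validate a hypothetical signup payload: strict on structure, permissive on free text."""
    data = json.loads(body)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict) or not isinstance(data.get("name"), str):
        raise ValueError("expected an object with a string 'name'")
    name = data["name"].strip()
    # Domain check only: no character blacklist. Escaping happens later, per output format.
    if not 1 <= len(name) <= 100:
        raise ValueError("name must be 1-100 characters")
    return {"name": name}
```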

New comment by ashearer in "Don’t try to sanitize input – escape output"

ashearer — Thu, 27 Feb 2020 16:33:44 +0000

Even the Joel article makes what's arguably a mistake: he says that input from users is "unsafe" and must be escaped on output, while strings from elsewhere shouldn't be. That may avoid security exploits, but it still results in incorrect output when a predefined value really does need to be escaped.

The issue isn't whether a value originated from the user. It's the units/data type, as you said, such as plain text vs. HTML.

New comment by ashearer in "Don’t try to sanitize input – escape output"

ashearer — Thu, 27 Feb 2020 16:19:15 +0000

The difference is where it's done. "Sanitizing the input" implies that it happens when the value is read, so that all uses of the value are stuck with a single result. "Escaping the output", in your example, would happen in the database or its driver, for parameterized queries. HTML output of the same value in the same request would be escaped differently within a function that builds HTML output.
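Here is a sketch of that split, assuming SQLite via Python's `sqlite3` and a hypothetical `render_profile` helper. The value is stored exactly as received, and the database driver handles the SQL side through a `?` placeholder. The HTML builder escapes the value with `html.escape` at the moment it is placed into markup.

```python
import html
import sqlite3

def render_profile(name: str) -> str:
    # HTML escaping happens here, where the value meets HTML.
    return "<p>Welcome, " + html.escape(name) + "</p>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

raw = "O'Brien & <Sons>"  # read once, left untouched

# SQL: parameterized query; the driver keeps the value separate from the SQL text.
conn.execute("INSERT INTO users (name) VALUES (?)", (raw,))
stored = conn.execute("SELECT name FROM users").fetchone()[0]

page = render_profile(stored)
```

The database round-trips the original string unchanged. Only the HTML output contains entities.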

New comment by ashearer in "Don’t try to sanitize input – escape output"

ashearer — Thu, 27 Feb 2020 16:06:26 +0000

This sounds good in theory, but I'll give a counterexample.

Requirement: Name input box.

Implementation: We'll sanitize the input by rejecting any characters likely to be dangerous if mishandled, like single quotes, or anything else we don't immediately imagine to be useful. If a character turns out to be needed later, that's no problem. We'll just change the list.

Security audit: Passes

Later customer complaint: I can't sign up! — J. O'Brien

Dev team: Sorry, too bad. We'd have to re-audit everything and possibly modify code to allow your last name, because there might be code somewhere that relies on the original sanitization for security. That was the point of sanitizing on input, after all. If you want to sign up, it would be easiest for us if you would just change your name.

New comment by ashearer in "Don’t try to sanitize input – escape output"

ashearer — Thu, 27 Feb 2020 15:41:19 +0000

Yes. It's the word "sanitize" itself that misleads people. It creates the mindset that input from users is dirty and must be made clean, and "clean" is "safe" to use in any context.

(I've seen the line of thought taken one step further. It starts from the realization that it's impractical to make strings universally safe for every context: even if you HTML-entity-encode a string twice, what if a recipient decodes it three times? From there it concludes that security is hard and we can only approach it asymptotically, so, shrug, XSS-like bugs are normal and unavoidable given finite time and budget.)

If the mindset is more like converting units, it becomes clearer. You can't concatenate HTML with a general Unicode string without converting the string to HTML first, any more than you can add inches and centimeters directly. "Cleaning" the centimeters would make no sense.
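One way to encode the units idea is to give HTML its own type, so plain text can only join it through an explicit conversion. The `Html` class below is a hypothetical minimal version of that pattern, built on the standard `html.escape`. Template libraries offer fuller variants, such as MarkupSafe's `Markup`.

```python
import html

class Html:
    """A string already in HTML 'units'. Plain text must be converted before joining."""

    def __init__(self, markup: str):
        self.markup = markup

    @classmethod
    def from_text(cls, text: str) -> "Html":
        # The unit conversion: plain Unicode text -> HTML.
        return cls(html.escape(text))

    def __add__(self, other: "Html") -> "Html":
        if not isinstance(other, Html):
            # Like adding inches to centimeters: refuse instead of guessing.
            raise TypeError("convert text with Html.from_text() before concatenating")
        return Html(self.markup + other.markup)

greeting = Html("<b>Hello</b>, ") + Html.from_text("Tom & Jerry <3")
```

Forgetting the conversion (`Html("<b>Hi</b>") + "Tom & Jerry"`) raises `TypeError` rather than silently producing broken markup.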

New comment by ashearer in "Clang Solves the Collatz Conjecture?"

ashearer — Tue, 05 Nov 2019 05:15:33 +0000

It looks like a general optimization for tail-recursive functions that assumes they terminate (because not terminating would be undefined behavior). The parameters to the recursive calls don't matter: Substitute other expressions or constants for `n / 2` and `3 * n + 1`, and the compiled result remains the same. So it's not Collatz-specific.

clang appears to correctly detect that `collatz` only directly defines a result for `1`, and any other input expands to yet another recursive call to `collatz` (the parameter is irrelevant). To avoid infinite recursion, `collatz` must eventually be called with the value 1, so that's what clang concludes.
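The reasoning can be modeled in Python. This reproduces the logic, not clang's output, since Python has no undefined behavior for a compiler to exploit. Whatever arguments the recursive calls receive, a call can only return through the `n == 1` branch, so every call that terminates returns 1. The `variant` function swaps in unrelated argument expressions, chosen here only so that it still terminates for positive inputs.

```python
def collatz(n: int) -> int:
    if n == 1:
        return 1
    if n % 2 == 0:
        return collatz(n // 2)
    return collatz(3 * n + 1)

def variant(n: int) -> int:
    # Same shape as collatz, different (arbitrary) recursive arguments.
    if n == 1:
        return 1
    if n % 2 == 0:
        return variant(n - 1)
    return variant(n - 2)
```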

New comment by ashearer in "The Risk of Dying Doing What We Love"

ashearer — Fri, 01 Nov 2019 15:34:06 +0000

Measuring "per participation decision" often makes more sense than "per participation hour" when deciding whether to do an activity. (It could be shortened to "per event", defining an "event" to be the result of one decision.)

In explaining the choice of per-hour, the author gave the example of choosing either an afternoon riding a mountain bike or an afternoon flying a sailplane. The example works because they involve about the same number of hours. But it's also the same number of decisions, so per-event works just as well there.

Per-event fixes distortions for quick activities, where durations are meaningless because they're dominated by setup time that isn't counted (or alternatively, the risk varies by multiple orders of magnitude depending on whether you count the overhead).

The chart shows summiting Everest as being 100x safer than base jumping. But if you're deciding which activity to do, it's more relevant to compare risk per-summit to risk per-jump-trip (say, 5 jumps?) or even risk per-jump, since you can calibrate the number of jumps on your trip based on your risk tolerance, but you can't do a fractional summit.

Using the author's numbers, jumping has a risk of 0.13% per jump, or 0.67% for a trip with 5 jumps. Everest has a risk of 6.5%. So in terms more relevant to decision-making, a decision to summit Everest comes with a 10x higher risk of death than a decision to go base jumping, instead of 100x lower as the chart might lead you to think.
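The per-trip figure can be checked by compounding the per-jump risk. This assumes, as a simplification, that jumps are independent. The inputs are the rounded numbers quoted above.

```python
per_jump = 0.0013      # 0.13% risk of death per BASE jump (rounded figure from above)
jumps_per_trip = 5
everest = 0.065        # 6.5% risk per Everest summit

# Probability of surviving all jumps is (1 - p)^n; the trip risk is the complement.
trip = 1 - (1 - per_jump) ** jumps_per_trip
ratio = everest / trip

print(f"trip risk: {trip:.2%}, Everest/trip ratio: {ratio:.1f}x")
# trip risk: 0.65%, Everest/trip ratio: 10.0x
```

Compounding gives about 0.65%, close to the 0.67% quoted above. Either way, the ratio comes out at about 10x.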

New comment by ashearer in "Bullet charts – Avoid extraneous details in corporate presentations"

ashearer — Fri, 27 Sep 2019 17:33:05 +0000

Immediately below that, the final Excel output is unreadable: it shows the left axis scale ending at 450 and right axis scale ending at 400, with both values corresponding to the same grid line.

Below that, it says you can draw these charts in JavaScript with a element, as if browsers natively supported it.

New comment by ashearer in "Are triggers really that slow in Postgres?"

ashearer — Wed, 23 May 2018 14:13:47 +0000

One approach that works very well is to keep stored functions in separate .sql files in a directory (I use "fixtures") and execute them all on each deployment. This should happen after running migrations, so that table and column dependencies are guaranteed to be present. The .sql files use CREATE OR REPLACE FUNCTION so that their execution is idempotent.

This keeps the stored functions version-controlled along with the source code, and avoids any need to hunt through migration files to find the latest definition. Adding a stored function or modifying its function body just works.

The less common operations of deleting a function or modifying its argument list do require an explicit line in a migration file, but those situations are rare (and potentially backward-incompatible, requiring extra caution regardless).

One subtlety is that a migration that adds a new table with a trigger should define an empty stub function as the trigger. This avoids duplicating code. The real function body will be loaded from the fixture immediately afterwards.
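Here is a minimal sketch of the deployment step in Python. The `fixtures` directory comes from the comment. `deploy`, `run_migrations`, and `execute` are hypothetical hooks; with psycopg2, `execute` could be a cursor's `execute` method. Applying files in sorted filename order is an assumption. The example fixture shows the `CREATE OR REPLACE FUNCTION` form that makes re-running safe.

```python
from pathlib import Path
from typing import Callable

# Example contents of a hypothetical fixtures/touch_updated_at.sql file.
EXAMPLE_FIXTURE = """
CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
"""

def deploy(fixtures_dir: Path,
           run_migrations: Callable[[], None],
           execute: Callable[[str], None]) -> list[str]:
    """Run migrations first, then (re)load every stored-function fixture."""
    run_migrations()  # tables and columns the functions depend on now exist
    applied = []
    for path in sorted(fixtures_dir.glob("*.sql")):
        execute(path.read_text())  # idempotent thanks to CREATE OR REPLACE
        applied.append(path.name)
    return applied
```

A migration that adds a table using this trigger would first define `touch_updated_at()` as a stub that just returns `NEW`. The fixture then replaces the body right after migrations finish.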

New comment by ashearer in "Leap second 2017 status"

ashearer — Tue, 03 Jan 2017 02:40:29 +0000

Then it would become impossible to store future timestamps in a database and have a stable answer to the question, "on what day will this timestamp occur?" Timestamps around midnight could flip from one day to another unpredictably depending on what version of the leap second database the formatter has. And that wouldn't just affect the leap second day, it would affect every day of every year. That would cause problems for many fields where calendar dates matter (legal and accounting, to name a couple). I expect we'd wind up avoiding the problem by saving dates internally as formatted UTC strings, and be back where we started.
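Here is a toy model of the problem. It assumes timestamps are stored as a count of elapsed SI seconds (leap seconds included) since a hypothetical 2017-01-01 epoch. A formatter converts that count to UTC using whatever leap-second table it has. The table contents and dates are invented for illustration. A real leap second would display as 23:59:60; this sketch folds it into 23:59:59.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(2017, 1, 1, tzinfo=timezone.utc)

def to_utc(elapsed: float, leap_table: list[float]) -> datetime:
    """Convert elapsed seconds since EPOCH to UTC, given the elapsed-second
    positions at which leap seconds were inserted (hypothetical table)."""
    inserted = sum(1 for t in leap_table if t <= elapsed)
    return EPOCH + timedelta(seconds=elapsed - inserted)

# A future event just after midnight, stored when the table listed no leap seconds.
event = (datetime(2030, 7, 1, 0, 0, 0, 500000, tzinfo=timezone.utc) - EPOCH).total_seconds()

# Hypothetically, a leap second is later scheduled at the end of 2029-12-31.
leap_at = (datetime(2030, 1, 1, tzinfo=timezone.utc) - EPOCH).total_seconds()

before = to_utc(event, [])         # formatter with the old table
after = to_utc(event, [leap_at])   # same stored count, newer table
```

The stored number never changes, but `before` falls on 2030-07-01 and `after` on 2030-06-30.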