Hacker News: shadowmint

New comment by shadowmint in "An understanding of AI’s limitations is starting to sink in"

shadowmint — Mon, 15 Jun 2020 12:59:08 +0000

If the business wanted to track the rate of failures, build predictive models of when things fail, or detect anomalous behaviour, that's what they would have set out with as the goal. Perhaps some ML model might have helped, but more likely it would have been too unreliable, and any number of standard predictive models with well-known characteristics would have been used instead.
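For contrast with ML as a magic wand, here is a minimal sketch of the kind of standard model meant here: a rolling z-score check that flags days whose failure count sits far outside the recent history. The function name `flag_anomalies`, the 7-day window, the 3-sigma threshold and the sample counts are all illustrative assumptions, not anything the comment specifies.

```python
from statistics import mean, stdev

def flag_anomalies(daily_failures, window=7, threshold=3.0):
    """Return indices of days whose failure count is more than `threshold`
    standard deviations from the mean of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_failures)):
        history = daily_failures[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all counts as anomalous.
            anomalous = daily_failures[i] != mu
        else:
            anomalous = abs(daily_failures[i] - mu) / sigma > threshold
        if anomalous:
            flagged.append(i)
    return flagged

if __name__ == "__main__":
    counts = [2, 3, 2, 4, 3, 2, 3, 3, 2, 15, 3]
    print(flag_anomalies(counts))  # [9]: the day with 15 failures
```

The appeal of something this simple is exactly the "well-known characteristics" point: it is obvious what it flags and why.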

That's not what they wanted.

What people are being sold is AI/ML as a magic bullet that will do something useful regardless of the situation. It lets business people avoid making decisions about what they actually want: because AI/ML can be anything, they just sign up for it and expect to get 20 things they didn't know they wanted handed to them on a plate.

Turns out, it's not enough to just collect a bunch of data and wave your magic wand at it. It wasn't with web analytics 10 years ago, and it still isn't.

What you actually need is someone who has a bunch of tricks up their sleeve, has done this before, and can suggest the Business Insights the business might need before anything gets built; then people who actually decide what to do, and actions taken to investigate and solve those problems.

I mean, to some degree you're right; perhaps ML models could be useful for tracking hardware failures, but that's not what the parent post is talking about. It was talking about just collecting the data and expecting predictive failure models to jump out magically.

That doesn't happen; it needs a person to have the insight that the data could be used for such a thing, and that needs to happen before you go and randomly collect all the wrong frigging metrics.

...but hiring experts is expensive, and making decisions is hard. So ML/AI is sold like snake-oil to managers who want to avoid both of those things. :)

New comment by shadowmint in "Angular v8.0"

shadowmint — Wed, 29 May 2019 13:41:30 +0000

Why does having a different opinion to you mean someone has no idea what they’re talking about, or has never used a thing “seriously”?

I used to love Angular; then I got a job on a “| async” dumpster fire and spent a year watching a team of smart C# developers wallow in a mire of disaster so bad that changing a text field on a form became a two-week regression. The code was so full of amazing functional statements that no one, not even the original authors, could touch it without breaking something in the process.

so.

Your mileage may vary. I no longer particularly like Angular, personally, because I find it a chore to herd inexperienced FactoryInjectorConstructorFactoryPattern Angular developers into not screwing things up.

...but a talented team can do well with it, and I’ve seen people screw up React projects too.

It really is more about good practice and experience than framework; your personal preference is probably, like mine, basically irrelevant.

New comment by shadowmint in "Negotiations Failed: How Oracle Killed Java EE"

shadowmint — Sat, 04 May 2019 16:32:19 +0000

I spent 4 years as a professional Python developer.

We certainly shipped (using Django), and it was certainly slow; it remains a painfully slow, very successful enterprise app.

I’m not arguing that the slowness is a deal breaker, but it is slow, and it does, routinely, break the SLAs it’s supposed to meet.

So... unusably slow? No.

...but slow? yes, it really is.

IMO. Your mileage may vary. /shrug

New comment by shadowmint in "Negotiations Failed: How Oracle Killed Java EE"

shadowmint — Sat, 04 May 2019 09:44:40 +0000

Lovely theory, but in practice it works more like this:

1) Write everything in Python because it's easy and quick to do so.

2) It's slow as.

3) Abandon the software and write it in something else, or live on with slow-ass software and blame Python for being slow and rubbish forevermore.

Rewriting Python in C is a hideously painful process, and it has proven very unsuccessful in practice.

Writing new code in C/C++/whatever and exposing a Python API is where successful projects like NumPy and TensorFlow live.
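A minimal illustration of that split using only the standard library: the same sum computed by an interpreted Python loop and by the built-in `sum()`, whose loop runs in CPython's C code. This is only a stand-in for the NumPy/TensorFlow pattern, not how those projects are actually built, and no particular speedup is claimed since timings vary by machine and interpreter.

```python
import timeit

def py_sum(n):
    """Pure-Python loop: every addition goes through the interpreter."""
    total = 0
    for i in range(n):
        total += i
    return total

def c_sum(n):
    """Built-in sum(): the loop itself runs inside CPython's C implementation."""
    return sum(range(n))

if __name__ == "__main__":
    n = 1_000_000
    assert py_sum(n) == c_sum(n)  # same answer either way
    t_py = timeit.timeit(lambda: py_sum(n), number=5)
    t_c = timeit.timeit(lambda: c_sum(n), number=5)
    print(f"pure Python loop: {t_py:.3f}s, C-backed sum(): {t_c:.3f}s")
```

The Python-facing call looks the same either way; the difference is where the hot loop lives.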

Python is very good at what it is, but no one is ever going to go and rewrite your Python code in C to make it faster; it's just going to be slow forever.

New comment by shadowmint in "Asynchronous Programming in Rust"

shadowmint — Sun, 21 Apr 2019 06:08:22 +0000

Sure, but it sucks to be left with code you have to rewrite because you donated your time to the cause, finding problems and smoothing the path for other people in the future.

Maybe some people are into that for fun, but for the majority of people, the message should be:

Stick with stable, folks.

New comment by shadowmint in "Ask HN: How do I improve our data infrastructure?"

shadowmint — Sat, 20 Apr 2019 13:34:08 +0000

> sql to write etl in will drastically reduce the amount of work needed to write etl.

My experience with writing an ETL in SQL is that it is almost never quick, easy, correct or easy to test, and it is almost always denormalized or unconstrained (dimensional keys which aren't 'real' foreign keys, just numbers, so you can parallelize the data inserts and updates without constraint errors).

So... your mileage may vary with that.

It's most certainly not true that writing any kind of ETL that uses SQL saves time in all cases.
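A small sketch of what "unconstrained" dimensional keys mean in practice, using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`dim_customer`, `fact_sales`, `customer_key`) are hypothetical. Because nothing enforces the relationship, an orphan check like this becomes one more thing the ETL has to write and test itself.

```python
import sqlite3

def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
        -- Unconstrained dimensional key: just a number, no REFERENCES clause.
        CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, customer_key INTEGER, amount REAL);
        INSERT INTO dim_customer VALUES (1, 'alice'), (2, 'bob');
        -- Sale 3 points at customer 99, which was never loaded; nothing stops it.
        INSERT INTO fact_sales VALUES (1, 1, 10.0), (2, 2, 5.5), (3, 99, 7.25);
        """
    )
    return conn

def orphaned_fact_rows(conn):
    """Return (sale_id, customer_key) for fact rows with no matching dimension row."""
    return conn.execute(
        """
        SELECT f.sale_id, f.customer_key
        FROM fact_sales AS f
        LEFT JOIN dim_customer AS d ON d.customer_key = f.customer_key
        WHERE d.customer_key IS NULL
        ORDER BY f.sale_id
        """
    ).fetchall()

if __name__ == "__main__":
    print(orphaned_fact_rows(build_demo()))  # [(3, 99)]
```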

New comment by shadowmint in "Ask HN: How do I improve our data infrastructure?"

shadowmint — Sat, 20 Apr 2019 13:21:46 +0000

It sounds like you already have an idea of what you want to do, but I think you should pause and think more deeply about what you have, vs. what you want.

What I would want in your situation is:

    - All the data in one place.
    - An easy way to explore the data. 
    - A single source of truth for transformed data.
    - Metadata to explain the data model (i.e. documentation).

What you're proposing does some of those things, but it also:

    - Adds yet another maintain-forever technology to your stack.
    - Adds yet another pipeline (or set of pipelines) that does the same thing.
    - Moves from an architecture that is clustered for scale (i.e. Spark) to one that only scales vertically (Postgres).
    - Potentially introduces *yet more* sources of truth for some data.

> I was thinking that in a first iteration, data scientists would explore their denormalized, aggregated data and create their own feature with code.

^ Moving data into Postgres doesn't somehow make this trivial; it just lets people use a different SQL dialect. For anyone competent to be writing code, the Spark API is not meaningfully more complicated than the Postgres API.

I appreciate the naive attractiveness of having a traditional "data warehouse" in a SQL database, but there are actually reasons why people are moving away from that model:

    - it doesn't scale
    - SQL is a terrible language to write transformations in (it's a *query* language, not an ETL pipeline)
    - with many denormalised tables, it's only vaguely better than S3 Parquet blobs
    - you have to invent data for schema changes (new table schema, old data in the table), i.e. migrations are hard
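To make the migrations point concrete, a sketch with Python's `sqlite3` where a column is added to a table that already holds rows. The old rows have to get *some* value, either NULL or a default someone picked, and that is the "invented data". The `orders` table and its columns are hypothetical.

```python
import sqlite3

def migrate_demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.99), (2, 20.0)])
    # New schema: every order now has a currency. Existing rows never recorded one,
    # so the migration has to invent a value for them (here the default 'USD').
    conn.execute("ALTER TABLE orders ADD COLUMN currency TEXT NOT NULL DEFAULT 'USD'")
    conn.execute("INSERT INTO orders VALUES (3, 15.0, 'EUR')")
    return conn.execute("SELECT order_id, currency FROM orders ORDER BY order_id").fetchall()

if __name__ == "__main__":
    print(migrate_demo())  # [(1, 'USD'), (2, 'USD'), (3, 'EUR')]
```

Nothing in the table distinguishes a real 'USD' from a guessed one, which is why this tends to bite later.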

More tangibly, I know people who have done exactly what you're talking about, and regretted it. Unless you can very clearly demonstrate that what you're making is meaningfully better, it won't be adopted by the other team members and you'll have to either live forever in your silo, or eventually abandon it and go back to the old system. :/

So... I don't recommend it.

The points you're making are all valid, and for a small scale like this, if you were doing it from scratch it would be a pretty compelling option... but migrating entirely will be prohibitively expensive, and migrating partially will be a disaster.

Could you perhaps find a better way to orchestrate your Spark tasks, e.g. with Airflow or ADF or AWS Glue or whatever?

Personally, I think Databricks offers a very attractive way to allow data exploration without a significant architecture change.

The architecture you're using isn't fundamentally bad; it just needs strong, across-the-board data management... but that's something very difficult to drive from the bottom up.

New comment by shadowmint in "Please be more careful when interpreting the Stack Overflow Developer Survey"

shadowmint — Sat, 13 Apr 2019 15:17:48 +0000

That seems fair, but they have a whole methodology section.

If you want to argue with it, surely the onus is on you to do it concretely?

> Because of your methodology, we must assume a biased sample.

^ I find this quote problematic.

Why must we assume that? If you want to do distribution comparisons and point out that their survey results are skewed by X compared to some other survey Y... ok.

...but that’s not what’s happening, right? It’s just a flat-out arbitrary assumption.

I don’t like arbitrary assumptions when I’m doing maths.

It’s easy to say something is wrong, but if you can’t quantify how it’s wrong, I’m struggling to see why I should accept the assumption being raised here.

The JS survey was very similar; it was arbitrarily asserted that it went to more React developers... but no one actually proved that. They just... assumed it.
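The "skewed by X compared to survey Y" route can be made concrete. One standard technique is post-stratification: weight respondents so that some attribute matches a reference distribution, then see how far the estimate moves. A sketch with hypothetical groups `A`/`B` and made-up counts; it is only as good as the assumed reference shares, and it only corrects for the attributes you weight on.

```python
def raw_share(respondents):
    """Unweighted share of respondents answering True."""
    return sum(1 for _, answer in respondents if answer) / len(respondents)

def post_stratified_share(respondents, reference_shares):
    """Share answering True after reweighting each group to its reference share.

    respondents: list of (group, answer) pairs.
    reference_shares: assumed population share per group (should sum to 1).
    Each respondent gets weight reference_share / sample_share for their group.
    """
    n = len(respondents)
    counts = {}
    for group, _ in respondents:
        counts[group] = counts.get(group, 0) + 1
    total_weight = yes_weight = 0.0
    for group, answer in respondents:
        weight = reference_shares[group] / (counts[group] / n)
        total_weight += weight
        if answer:
            yes_weight += weight
    return yes_weight / total_weight

if __name__ == "__main__":
    # Hypothetical sample: group A is over-represented (80 of 100 respondents).
    sample = ([("A", True)] * 60 + [("A", False)] * 20
              + [("B", True)] * 5 + [("B", False)] * 15)
    reference = {"A": 0.5, "B": 0.5}  # assumed true population split
    print(raw_share(sample), post_stratified_share(sample, reference))
```

With these toy numbers the estimate moves from 0.65 unweighted to 0.50 reweighted, so the claimed skew gets a size instead of just being asserted.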

New comment by shadowmint in "Please be more careful when interpreting the Stack Overflow Developer Survey"

shadowmint — Sat, 13 Apr 2019 14:51:16 +0000

> you cannot generalize from a non-random sample

So, honest question:

If any survey of any size can be ignored on the basis that the sample is not random, then how is any survey meaningful?

Isn’t this a self-defeating argument?

You can’t prove the sample is random; all you can do is show differences between samples and suggest it’s not consistent... but how do we go away and prove that some other survey we’re comparing it to is from a random sample?

i.e. isn’t this just a convenient excuse to deny that a survey is meaningful?

Statistically, how do you mathematically quantify the effect of selection bias?

...because, it seems to me, unless you can actually do that, you’re just doing some armchair hand-waving because you don’t like the results you’re seeing.

This has come up several times (e.g. the JS survey about React vs Angular), and no one has ever given me a meaningful, mathematical response.

It’s always just... “it must be sample bias”, regardless of the 90,000 people they surveyed.

I don’t accept that you can survey 90,000 developers and then offer no generalisation at all from the results, unless someone quantitatively proves there is an overwhelming sample bias and specifically quantifies its degree.
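For scale, a sketch of the standard margin-of-error formula for a proportion, z * sqrt(p * (1 - p) / n), assuming simple random sampling and the worst case p = 0.5. With 90,000 responses the pure sampling noise is about ±0.33 percentage points at 95% confidence, so a large gap between two surveys of that size is unlikely to be sampling noise alone. Note the formula covers random error only; selection bias does not shrink as n grows, which is why it has to be measured rather than assumed either way.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a ~95% confidence interval for a proportion.

    Assumes simple random sampling; p=0.5 is the worst case. This measures
    random sampling error only and says nothing about selection bias.
    """
    return z * math.sqrt(p * (1 - p) / n)

if __name__ == "__main__":
    for n in (18_000, 90_000):
        print(f"n={n}: +/- {margin_of_error(n):.2%}")
```

Run as-is, this prints roughly ±0.73% for 18,000 responses and ±0.33% for 90,000.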

Am I missing something here? Everyone seems thoroughly convinced that this is perfectly normal.

(I’m not proud, I’ll take your down votes, but please answer and explain what I’m missing)

New comment by shadowmint in "On Learning Rust and Go: Migrating Away from Python"

shadowmint — Sun, 24 Mar 2019 12:47:31 +0000

Oh please, go read the source code for TensorFlow and then come back and we can have a real conversation.

New comment by shadowmint in "On Learning Rust and Go: Migrating Away from Python"

shadowmint — Sun, 24 Mar 2019 12:41:50 +0000

No, the trivial frontend is built in Python; the real code is usually C++ or C.

Ignore that reality if you want to, but it is a fact.

Big, complicated Python projects are seldom pure Python; they are usually a friendly Python frontend to a serious application written in something else.

It seems in no way remarkable that someone wanting to build a serious backend type piece of functionality would pick another language that was, just for example, multithreaded.

New comment by shadowmint in "Data Science Teams Need Generalists, Not Specialists"

shadowmint — Fri, 22 Mar 2019 14:35:03 +0000

> “one person sources the data, another models it, a third implements it, a fourth measures it”

Call it whatever you like, it is what it is.

New comment by shadowmint in "Python Developer Survey 2018 Results"

shadowmint — Thu, 07 Feb 2019 15:11:00 +0000

Oh come on. 18,000 responses.

This is the JS survey results all over again: no. Unless you can statistically prove the results are biased, you don’t get to ignore them because you don’t like them.

Finding data points with no methodology that contradict the survey results does not invalidate the survey results.

That’s. not. how statistics work.

A great deal of effort was put into this survey, and the stats you’re looking at are more likely biased than the ones in this survey.

The stats and the methodology here are clearly documented; if you want to argue with them, be specific and provide concrete statistical proof for your assertions.

Specifically, why do the stats you have prove anything, and what confidence do you have that they are representative?

New comment by shadowmint in "Asynchronous Programming in Rust book"

shadowmint — Tue, 22 Jan 2019 13:21:30 +0000

I know, but I still struggle to get a handle on the state of it, really.

What is going on with futures 0.3? Why is everyone still using 0.1?

How does that relate to these issues?

It superficially appears that the whole async story is still at the concept stage...

New comment by shadowmint in "Asynchronous Programming in Rust book"

shadowmint — Tue, 22 Jan 2019 13:09:54 +0000

Is it just me, or is the fact that the most important part:

https://rust-lang.github.io/async-book/getting_started/state...

is missing somewhat ironic?

Feels very much like the state of async matches the state of the guide. :P

What is the state of async? Is it close? Is it still changing with the futures 0.3-beta not finalized?

Are we six months away? A year?

New comment by shadowmint in "Ask HN: Go-to web stack today?"

shadowmint — Sat, 05 Jan 2019 15:19:55 +0000

> but I am not going to elaborate on this one because lots of people already pointed it out...

If there’s any meaningful reason to use JWT, it would probably be helpful to articulate it for people.

(I would myself, but I consider JWT to be actively harmful to scaling and security in most implementations (specifically, global server-side refresh token stores, which act as a single point of failure), poorly understood, and generally inferior to cookies in almost every respect... though necessary in some limited circumstances. But if you have any actual, non-hand-wavy reason why it’s useful for a general, single-domain site, I’d be interested to hear it.)
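For the single-domain case, a cookie-based session can be as small as this sketch: an opaque random session id in a cookie flagged `HttpOnly`, `Secure` and `SameSite`, with the user data kept server-side. It uses Python's standard `http.cookies` and `secrets` modules (the `samesite` attribute needs Python 3.8+); the in-memory `sessions` dict and the function names are hypothetical stand-ins for a real session store and framework.

```python
import secrets
from http.cookies import SimpleCookie

sessions = {}  # hypothetical server-side session store: opaque id -> user data

def start_session(user_id: str) -> str:
    """Create a session and return the Set-Cookie header value for it."""
    session_id = secrets.token_urlsafe(32)  # opaque, unguessable, carries no claims
    sessions[session_id] = {"user_id": user_id}
    cookie = SimpleCookie()
    cookie["session"] = session_id
    cookie["session"]["httponly"] = True   # not readable from page JavaScript
    cookie["session"]["secure"] = True     # only sent over HTTPS
    cookie["session"]["samesite"] = "Lax"  # limits cross-site sending
    cookie["session"]["path"] = "/"
    return cookie["session"].OutputString()

def current_user(cookie_header: str):
    """Look up the user for an incoming Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session")
    if morsel is None:
        return None
    data = sessions.get(morsel.value)
    return data["user_id"] if data else None
```

Logging someone out is just deleting their entry from the store; there is no token floating around that stays valid until it expires.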

New comment by shadowmint in "Ask HN: Go-to web stack today?"

shadowmint — Sat, 05 Jan 2019 15:12:04 +0000

This.

JWT is a pain in the ass for a lot of reasons people don’t appear to understand until they actually try to use it, and most of its proponents appear never to have actually used it seriously and had to deal with issues like, oh wow, Redis is now the bottleneck for my ‘stateless’ authentication.

Unless you need it and can articulate why, with no magic hand-waving... just. use. cookies.

...and ffs, don’t just put your JWT in a cookie, that’s stupid... and if you don’t understand why, you shouldn’t be using JWT.
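To illustrate the "Redis is now the bottleneck" point: a hand-rolled HS256 token check using only the standard library, where the signature check is local but revocation needs a shared lookup on every request. The secret, the `revoked_jtis` set (standing in for Redis) and the function names are hypothetical; the sketch omits expiry and other claim checks, and real code should use a vetted JWT library rather than this.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"demo-secret"   # hypothetical signing key
revoked_jtis = set()      # stand-in for the shared Redis revocation store

def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")

def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def _sign(signing_input: bytes) -> str:
    return _b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())

def make_token(claims: dict) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}".encode("ascii")
    return f"{header}.{payload}.{_sign(signing_input)}"

def verify_token(token: str) -> Optional[dict]:
    parts = token.split(".")
    if len(parts) != 3:
        return None
    header, payload, sig = parts
    # Step 1: signature check. Purely local, no shared state: the "stateless" part.
    if not hmac.compare_digest(sig, _sign(f"{header}.{payload}".encode("ascii"))):
        return None
    claims = json.loads(_b64url_decode(payload))
    # Step 2: revocation check. This hits shared state on every single request,
    # so authentication now depends on that store being available and fast.
    if claims.get("jti") in revoked_jtis:
        return None
    return claims
```

Once step 2 exists, every request needs the shared store anyway, which is roughly the dependency a plain server-side session already has.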

New comment by shadowmint in "Clojure is Capable"

shadowmint — Fri, 28 Dec 2018 14:06:24 +0000

Obviously, but that's not the point I was making.

...but how does that (what you wrote) == "think of programming in Clojure like playing a difficult video game based on trial and error with emulator save states"?

New comment by shadowmint in "Clojure is Capable"

shadowmint — Fri, 28 Dec 2018 06:47:57 +0000

> Interactive REPL based workflow (think of programming in Clojure like playing a difficult video game based on trial and error with emulator save states vs without).

What?

I'm not sure I understand what that's even trying to say?

Is it trying to say, think of programming as playing a difficult computer game, and using a REPL like having save states?

...because it sounds like you are saying programming in Clojure is like playing a difficult video game based on trial and error, which is a really rubbish endorsement.

> To touch on the video game analogy - imagine playing one of the old Mega Man games, Dark Souls, or Battletoads. Now, imagine that instead of getting feedback (a death) that results in a hard restart at the beginning, you instead just get taken back one action / event that caused an error (a single failed function call in REPL). That's like loading a save state, and if you like easy mode, you'll love the Clojure REPL.

mmm... but if your analogy is 'this is like doing something really hard, but you can step back to a save state so it's not so bad', then you're still basically saying writing Clojure is really hard.

I don't think that's really a fair thing to say about Clojure, it's just a terrible analogy.

Expressiveness is what Clojure excels at; if you want an analogy, the REPL is more like a Minecraft sandbox that you just keep building in, rather than planning out your structure on paper before you start playing.

New comment by shadowmint in "Microsoft is Dead (2007)"

shadowmint — Sun, 02 Dec 2018 03:08:46 +0000

It's easy to laugh now, but in 2007 this was spot on.

It was 10 years before Microsoft managed to drag itself up and do something useful with itself.

It's really remarkable that Microsoft managed such a big turn around after such epic failures (like mobile); others (like IBM...) are still struggling to do it.

Just goes to show, competitive pressure is a good thing. :)