Hacker News: digitalzombie

New comment by digitalzombie in "The Data Science of MathOverflow"

digitalzombie — Tue, 12 Feb 2019 10:30:08 +0000

> Statistics is "the practice or science of collecting and analyzing numerical data in large quantities"

That's not true.

Statistics can analyze data in small quantities; you can read up on it under nonparametric statistics. At least half of nonparametric statistics deals with small data (mostly through the use of ranking). With Bayesian stats you can just assign a prior distribution.
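
As a rough illustration of the ranking idea, here is a minimal sketch of an exact two-sided rank-sum test on two tiny samples. The function name and the numbers are made up for the example, and it assumes there are no tied values.

```python
from itertools import combinations


def exact_rank_sum_test(a, b):
    """Two-sided exact rank-sum test; assumes no tied values."""
    pooled = sorted(a + b)
    rank = {value: i + 1 for i, value in enumerate(pooled)}  # ranks 1..n
    observed = sum(rank[x] for x in a)
    n, k = len(pooled), len(a)
    expected = k * (n + 1) / 2  # mean rank sum under the null hypothesis
    # Enumerate every way of assigning k of the n ranks to the first group.
    sums = [sum(c) for c in combinations(range(1, n + 1), k)]
    extreme = sum(
        1 for s in sums if abs(s - expected) >= abs(observed - expected)
    )
    return observed, extreme / len(sums)


# Hypothetical measurements, four observations per group.
treated = [1.1, 2.3, 0.9, 1.7]
control = [3.4, 2.9, 4.1, 3.8]
w, p = exact_rank_sum_test(treated, control)
print(w, p)
```

With only eight observations there are just 70 possible rank assignments, so the p-value is computed exactly instead of relying on a large-sample approximation.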

I think the best way to describe statistics is that it uses data to make inferences about a population: you take a subset of the data via sampling and infer a statistic (mean, median, whatever) about the population at large.

More often than not I'm just surprised that Data Science is so big when it seems like statistics does every freaking thing with data: how to sample correctly, how to design an experiment to collect the right data to answer a hypothesis, making sure the data aren't biased, etc. It deals with the data from inception to end, whether you collect the data yourself or it's given to you already. On top of that it deals with problematic data, such as missing data, imputation, etc.

New comment by digitalzombie in "Google Edge TPU Devices"

digitalzombie — Mon, 11 Feb 2019 11:06:56 +0000

Seriously, Intel is already struggling after buying Nervana.

I went to their shindig and they were working their butts off to wow the developers who were invited. When I asked for hard numbers they were very mum and very evasive.

The timeline for the Nervana chip has always seemed to sit on some mystical horizon that never solidifies into a real date, just over yonder.

Google is going to pull this crap? They've got better software expertise than Intel, though, so they may be able to do it. But after that fiasco with Angular 1 to 2 I wouldn't trust Google with any early version number.

New comment by digitalzombie in "Apple Lays Off 200 Employees from Autonomous Car Unit"

digitalzombie — Thu, 24 Jan 2019 09:20:03 +0000

Their power cords keep dying on my 2013 MacBook.

I had a temp iPhone and it took over a lot of my texts by redirecting them through Apple's servers first. When I got my Android back I received no text messages from my friends who have iPhones.

It also seems like they're making cheaper stuff so they can increase their margin via tech support and peripherals.

I agree they should invest more in software so they don't have to resort to these crappy tactics.

Also, if they want to increase their margin they should make a cheaper iPhone and region-lock it to China and India only. Their current strategy for these markets is stupid.

New comment by digitalzombie in "Apple Lays Off 200 Employees from Autonomous Car Unit"

digitalzombie — Thu, 24 Jan 2019 09:11:13 +0000

You can harvest data and anonymize it. My friend at Google is supposedly doing this via research there. I'm not going to say Google cares about our privacy as much as Apple, but they are researching this, probably for medical uses and for when the opportunity arises. To be perfectly honest I think Apple's stance on privacy is all marketing: there are cases where they refused to unlock phones for the FBI, but there are other cases where they ship recordings of your talks with Siri to a third-party company.

New comment by digitalzombie in "Oracle allegedly underpaid $400M in wages to underrepresented employees"

digitalzombie — Thu, 24 Jan 2019 02:21:15 +0000

Programmers should have a union.

Especially the ones that are working in the gaming industry.

New comment by digitalzombie in "Alexandria Ocasio-Cortez Is Absolutely Right About Racist Algorithms"

digitalzombie — Wed, 23 Jan 2019 21:50:16 +0000

It's just statistics.

How you collect your data can introduce bias, and statistics has a whole range of topics on how to collect data without introducing bias and on systematic techniques for sampling data: stratifying the data if a certain group is underrepresented, random sampling, etc. Survey analysis goes into hardcore detail on how to sample a population to do accurate inference, and it's an interesting statistics subfield if anybody is interested.
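
To make the stratifying idea concrete, here is a minimal sketch of a stratified sample with equal allocation per group. The function name, the group labels, and the population are all made up for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, per_stratum, seed=0):
    """Draw up to `per_stratum` records at random from each group."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)
    sample = []
    for group in sorted(strata):
        members = strata[group]
        # Never ask for more records than the stratum actually holds.
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample


# Hypothetical population: group "b" is only 5% of the records.
population = [("a", i) for i in range(95)] + [("b", i) for i in range(5)]
sample = stratified_sample(population, key=lambda r: r[0], per_stratum=5)
counts = {g: sum(1 for r in sample if r[0] == g) for g in ("a", "b")}
print(counts)
```

A simple random sample of 10 from this population could easily contain nobody from group "b"; stratifying guarantees the small group shows up. Equal allocation deliberately oversamples it, so population-level estimates would need to be reweighted afterwards.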

ML tends to be more "here's the data already, do something with it."

Statistics encompasses everything about the data, including how to sample, how to collect the data, and how to design the experiment to collect the correct data to answer your hypothesis. Whereas ML is usually "here's the data, go figure out what you can get out of it."

New comment by digitalzombie in "Front End Development Topics to Learn in 2019"

digitalzombie — Sun, 20 Jan 2019 05:20:22 +0000

> anyone have a good strategy?

I left the industry for data science. Went back to school and fell in love with statistics. Going to try to get a biostat job. I'm trying to do side-project web apps and not give a damn about the flavor flav of the month.

I don't know if that's a good strategy... it's basically peacing out. I also moved most of my tech stack to plain old stuff like a simple MVC framework and no frontend rendering.

New comment by digitalzombie in "More Data Is Not Better and Machine Learning Is a Grind"

digitalzombie — Wed, 16 Jan 2019 07:30:31 +0000

More data is better.

You can reduce it via PCA, one of the many techniques in multivariate statistics.
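
As a minimal sketch of what that reduction looks like, the snippet below finds the first principal component with power iteration on the covariance matrix, using only the standard library. The function name and the data are made up; in practice you would reach for a linear algebra library rather than hand-rolled loops.

```python
import math


def first_principal_component(rows, iterations=200):
    """Leading PCA direction via power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize to unit length.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the direction: one number per row.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Made-up data: the second column is roughly twice the first.
data = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.1]]
direction, scores = first_principal_component(data)
print(direction)
print(scores)
```

Here two correlated columns collapse into a single score per row, which is the sense in which more columns can be reduced rather than thrown away.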

You can do ANOVA to select your predictors.

In general you can use a subset of it using the tools that statistics provides.

Complaining about messy data... welcome to the real world. As for complaining about non-reproducible models, choose reproducible ones. I've mostly done statistical models and forest-based algorithms, and they're all reproducible.

All I see in this post is complaints and no real solutions. The solution that's given is what? Have less data?

> The results were consistent with asymptotic theory (central limit theorem) that predicts that more data has diminishing returns

CLT talks about sampling from the population infinitely. It doesn't say anything about diminishing returns. I don't get how you go from sampling infinitely to diminishing returns.

New comment by digitalzombie in "Elixir v1.8 released"

digitalzombie — Tue, 15 Jan 2019 05:50:34 +0000

Elixir has better syntax because it's familiar.

I did play around with Erlang a few times. I really love the comma and period. They really nest and group the function patterns together.

For Elixir you just have to put them near each other. The functions are enclosed in a do/end block, and you put the functions with similar patterns next to each other. I believe you can move those pattern functions around.

Other than that I can't recall much else. Prolog syntax is foreign, but I'll admit I can see myself learning Erlang pretty fast since the language isn't "big" compared to Java or Scala. Likewise, Elixir is a small language too, with an added macro feature (I don't recall Erlang having this). Elixir also has better tooling baked in (mix/hex, doc, lint, etc.).

Also, the Erlang community sucks. This is my personal experience, but I vividly recall attending a meetup. The group had a discussion about Erlang's adoption and how to get better adoption. I told them Ruby got popular because of RoR; otherwise people would have chosen Python and called it a day. Erlang should really have a killer framework, like a web framework. Everybody thought it was ridiculous.

And you know what? Elixir came along and so did Phoenix. Phoenix may not be the only thing that got Elixir popular; Elixir had tooling from the get-go. Jose and those people are from the industry, so they knew what was needed. They came from Ruby.

New comment by digitalzombie in "Will Headless Intel Woo AMD’s Lisa Su?"

digitalzombie — Mon, 14 Jan 2019 19:18:03 +0000

No please.

I do not want a monopoly in x86.

Intel is very greedy with their CPU prices. Their consumer CPUs never gave the option of ECC memory and have gimped certain things. AMD was more flexible. Having two companies competing will be great for consumers.

New comment by digitalzombie in "Ask HN: What was the Internet like before corporations got their hands on it?"

digitalzombie — Sun, 13 Jan 2019 17:38:26 +0000

Uh... pretty close-knit. There were no search engines. There were only directories dedicated to particular hobbies/interests.

I was pretty big into anime fanfic so anipike was the web directory I went to. The fan sites on xoom, geocities, tripod, and angelfire were linked via web rings. A web ring is like a circle of websites with a similar interest, and an individual anime website would have a web ring to help you find other similar websites within the ring. https://en.wikipedia.org/wiki/Webring

The ads weren't bad: they were banners that blinked or slid across like the marquee HTML tag. The majority of web ads felt like print ads, less scientific than today. Today's ads have metrics and funnels and try to get more eyeballs on them; they're very gamey about getting your attention, which I believe leads to addiction and clickbait stuff.

There was less user involvement, and the social aspect seemed to be around hobbies. Nowadays you can try to catch up with your friends via Facebook, Snapchat, etc. It seems like everybody is trying to get approval from online friends by having that awesome picture while traveling and doing stuff.

There was a famous newsgroup I followed for my fanfiction too (rec.arts.anime.creative).

New comment by digitalzombie in "“The Book of Why” by Pearl and Mackenzie"

digitalzombie — Fri, 11 Jan 2019 19:32:25 +0000

I tried reading Pearl once. I couldn't get over his tone.

Andrew Gelman summarizes his take on it pretty nicely.

Coming from a statistics background, causal inference is a growing thing now and several government-sponsored research programs have been pushing for it.

Causal inference from a statistics point of view is based on missing data, basically Rubin stuff. It's pretty dang interesting to me. I'm sure there are many ways of looking at the same thing. With linear regression, you can look at it as more of an optimization problem with a cost function, or you can look at it in statistics using maximum likelihood estimation. Both have their pros and cons; with MLE you get a confidence interval. In my biased opinion I feel that statistics is all about data and it's a great domain for causal inference.
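
The two views of linear regression can be sketched side by side. The snippet below fits a line by least squares, then checks that the same coefficients score the highest Gaussian log-likelihood among a few nearby candidates. It assumes normally distributed errors with a fixed sigma; the function names and the data are made up for illustration.

```python
import math


def ols_fit(xs, ys):
    """Closed-form least squares fit; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def gaussian_log_likelihood(xs, ys, intercept, slope, sigma):
    """Log-likelihood of the data under y = intercept + slope*x + N(0, sigma^2)."""
    n = len(xs)
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return -n / 2 * math.log(2 * math.pi * sigma**2) - sse / (2 * sigma**2)


# Made-up data lying close to a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(xs, ys)
best = gaussian_log_likelihood(xs, ys, b0, b1, sigma=1.0)

# Nudge the coefficients in every direction; the likelihood should only drop.
nudged = [
    gaussian_log_likelihood(xs, ys, b0 + d0, b1 + d1, sigma=1.0)
    for d0 in (-0.1, 0.0, 0.1)
    for d1 in (-0.1, 0.0, 0.1)
    if (d0, d1) != (0.0, 0.0)
]
print(b0, b1, all(best > ll for ll in nudged))
```

For a fixed sigma the log-likelihood is a constant minus the squared-error cost divided by 2*sigma^2, so minimizing the cost and maximizing the likelihood pick the same line. The likelihood view is what adds the distributional assumptions behind interval estimates.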

There's no need to put a field down to make yours look better. But if it's constructive criticism (pro/con, contrast) I think it makes both fields better. Pearl's attitude is off-putting when you try to read his stuff. We're all human and have varying degrees of ego; if you're going to try to convince us that do-calculus and your ways are better, be objective about it or word things better. If you don't want to convince people then just be blunt as hell.

New comment by digitalzombie in "The richest families in Florence in 1427 are still the richest (2016)"

digitalzombie — Thu, 10 Jan 2019 16:07:54 +0000

> averages Hide too much.

I had to read the article because of your comment.

You're wrong.

When competent researchers talk about salary/income they always use the median because the average doesn't account for skew from outliers like Bill Gates.

If you do a ctrl+f for mean or average you won't find it. But if you do ctrl+f and search for median you'll find it.
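
A quick sketch of why the median is the usual choice for income, using made-up numbers where the last entry stands in for a billionaire outlier:

```python
from statistics import mean, median

# Hypothetical annual incomes; the last value is an extreme outlier.
incomes = [32_000, 41_000, 45_000, 52_000, 58_000, 1_000_000_000]

average = mean(incomes)
middle = median(incomes)

# The mean is dragged far above what anyone but the outlier earns,
# while the median stays among the typical values.
print(average)
print(middle)
```

One extreme value pulls the mean into the hundreds of millions, while the median stays at 48,500, between the third and fourth incomes.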

New comment by digitalzombie in "IBM releases Elm-powered app"

digitalzombie — Thu, 10 Jan 2019 05:55:25 +0000

Considering they sold off most of their hardware divisions, such as hard drives to Hitachi and ThinkPad to Lenovo, and went on to focus on patents, this is a change in their image. I had the view that they were risk averse, so their using Elm is interesting.

New comment by digitalzombie in "WeWork Gets a Visit from Financial Reality"

digitalzombie — Tue, 08 Jan 2019 20:16:47 +0000

WeHarvest, what is their product? Soylent Green?

New comment by digitalzombie in "Would you still pick Elixir in 2019?"

digitalzombie — Mon, 07 Jan 2019 01:30:35 +0000

I dislike coding in JavaScript for large projects. The language was originally for small stuff.

NodeJS brought it to the backend, and the language itself wasn't meant for it. Since then ECMA5 and stuff have tried to fix these shortcomings. But you can't expect me to love JavaScript's weak typing versus Elixir's or Python's strong typing (strong, not static, as in it doesn't implicitly type-convert stuff like JavaScript does). It's a nightmare, and the concurrency model in NodeJS is in my opinion subpar compared to Elixir's.
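
A minimal sketch of the strong (but dynamic) typing point, on the Python side. The `add` helper is made up for the example.

```python
def add(a, b):
    return a + b


# Python refuses to mix str and int implicitly and raises TypeError.
# In JavaScript, "1" + 1 quietly evaluates to the string "11".
try:
    result = add("1", 1)
except TypeError as exc:
    result = f"TypeError: {exc}"
print(result)

# The conversion has to be spelled out explicitly.
print(add(int("1"), 1))
```

Nothing here is statically checked; the error only shows up at runtime. The difference is that the mismatch fails loudly instead of being coerced into a value.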

New comment by digitalzombie in "Spain will soon overtake Japan in life expectancy rankings"

digitalzombie — Sun, 30 Dec 2018 12:07:10 +0000

I don't know if you're joking or not but:

Medicaid only helps the poorest. You need to be below a certain income to get it. I am a graduate student who is currently on it.

Medicare-for-all helps everybody regardless of income. Everybody is entitled to it, whether or not you're poor.

New comment by digitalzombie in "Ask HN: Tool for self-hosting your own Facebook profile after downloading it?"

digitalzombie — Fri, 28 Dec 2018 14:02:07 +0000

Mastodon is just Twitter. I'm on it... unless I'm missing something.

It doesn't completely replace it at all and it's missing a lot of the functionality that FB has: chat rooms, event invitations/planning, birthday reminders, photo albums, groups, etc.

New comment by digitalzombie in "What Kagglers Are Using for Text Classification"

digitalzombie — Thu, 27 Dec 2018 23:52:40 +0000

> Nobody cares how long it takes to train a model.

That's a reckless generalization. I care.

My thesis would take forever if I didn't do any optimization. Also my data is 20 rows with ~6000 predictors.

There are models out there that can take months! I worked on one that took months. We had to tweak it and optimize it to see if we could get it to an acceptable training time.

New comment by digitalzombie in "Machine learning has become alchemy (2017) [video]"

digitalzombie — Wed, 26 Dec 2018 06:42:54 +0000

Most machine learning algorithms that aren't statistically based don't give a CI. From a statistical standpoint, they don't give a sense of how good your prediction is. You can get a general sense with just CV.

Also, your parameters are not inferable like in a statistical algorithm. This is where I see people saying Deep Learning isn't interpretable, and there is research into this area. If you compare a time series stat forecasting algorithm with deep learning, you at least get a CI with the stat algorithm.

Randomly dropping nodes is pretty magic in my mind.

While I don't know much about SVM, I know it's mathematically proven, so there should be a way to interpret a fitted SVM model.

I sure as hell wouldn't use ML in clinical trials for drugs. That's why biostat is a thing.