Hacker News: tfederman

New comment by tfederman in "Ask HN: Is archive.org a good place for structured data?"

tfederman — Fri, 10 Oct 2025 15:23:20 +0000

It's not a big data set that lends itself primarily to analysis; it's more like content. For example, a list of all US Presidents with a lot of metadata or text content fields about them collected/combined from different sources, cleaned, corrected, annotated, etc. (Pretend Wikipedia has only a subset of these fields and considers broadening them out of scope.)

As for GitHub, the data would still be under "my" account, and I'm thinking more of a platform that doesn't depend on one person. Maybe I would manage day-to-day version control in GitHub, but I'd want to promote occasional releases to somewhere more official and not reliant on my account.

Ask HN: Is archive.org a good place for structured data?

tfederman — Fri, 10 Oct 2025 15:08:01 +0000

Let's say I put a lot of effort into creating an authoritative data set on a notable topic in a format like CSV, JSON, whatever. And I want to share it and have it housed somewhere more official/permanent than I'll be able to support.

Could archive.org be thought of as a place for raw assembled/authoritative data, with a layer that could transform it into a static web site view to be hosted elsewhere with additional design and features but would not be the authoritative source of the underlying data?

Is there a better place for this sort of vision?
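To make the "layer" idea concrete, here is a minimal sketch of what I have in mind, assuming the authoritative data is a CSV file. The function name, the sample rows, and the page layout are all made up for illustration; nothing here is specific to archive.org.

```python
import csv
import html
import io


def render_static_page(csv_text: str, title: str) -> str:
    """Render CSV text as a standalone HTML page containing one table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    # Escape every cell so the raw data can never inject markup.
    head_cells = "".join(f"<th>{html.escape(c)}</th>" for c in header)
    body_rows = "\n".join(
        "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
        for row in body
    )
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><meta charset='utf-8'><title>{safe_title}</title></head>\n"
        f"<body><h1>{safe_title}</h1>\n"
        f"<table>\n<tr>{head_cells}</tr>\n{body_rows}\n</table>\n"
        "</body></html>\n"
    )


# Hypothetical sample standing in for the authoritative file.
SAMPLE_CSV = "name,term_start\nGeorge Washington,1789\nJohn Adams,1797\n"
page = render_static_page(SAMPLE_CSV, "US Presidents (sample)")
```

The point is that the site is purely derived: it can be regenerated from the hosted data at any time, so the data stays the source of truth.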

Comments URL: https://news.ycombinator.com/item?id=45539917

Points: 3

# Comments: 2

New comment by tfederman in "Newspapers Are Recommending AI-Hallucinated Novels"

tfederman — Tue, 20 May 2025 18:32:44 +0000

In the GPT-2 era I created CouldReads, a big data set of generated book titles/synopses trained on thousands of e-books. It was a fun project in the naivete of 2020 but it's less amusing now.

Ask HN: Any recommendations for help with (engineering) career presentation?

tfederman — Mon, 19 May 2025 20:14:13 +0000

I'm in a long career break to work on my own stuff because my financial situation allows it, but I've found one job I'd like to apply for. My LinkedIn profile is extremely neglected and I've never put any effort into that sort of presentation, self promotion, or online presence in general.

I'd like to fix that. I want to pay someone to fix up (or help fix up) my profile, touch up my resume, and help write a bio and generic cover letter about my specific strengths and situation.

There are a million people and services that do this, but are there any that focus on software engineering and really understand that specific market?

Any recommendations? I'm not looking for a low effort/AI solution.

(email in bio)

Comments URL: https://news.ycombinator.com/item?id=44034310

Points: 2

# Comments: 1

New comment by tfederman in "Crawlers impact the operations of the Wikimedia projects"

tfederman — Fri, 02 May 2025 14:22:06 +0000

A while back I wrote up a way to turn the big Wikipedia XML dump into a database. Not a generic table with articles but thousands of tables, one for each article "type". I'm not sure if this is still the best way to go about it.

https://feder001.com/exploring-wikipedia-as-a-database-part-...
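As a rough sketch of the general idea (not necessarily what the write-up does), the dump can be streamed page by page and each article bucketed by a "type". Here I assume the type is the name of the article's first infobox template, which is my simplification; the regex, the function names, and the tiny sample XML are all illustrative.

```python
import io
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Assumption: an article's "type" is the name of its first {{Infobox ...}} template.
INFOBOX_RE = re.compile(r"\{\{\s*Infobox\s+([^|\n}]+)", re.IGNORECASE)


def local_name(tag: str) -> str:
    """Strip the XML namespace that real dumps put on every tag."""
    return tag.rsplit("}", 1)[-1]


def group_titles_by_infobox(xml_source) -> dict:
    """Stream <page> elements and group article titles by infobox name."""
    groups = defaultdict(list)
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        match = INFOBOX_RE.search(text)
        if title and match:
            groups[match.group(1).strip().lower()].append(title)
        elem.clear()  # drop the page's children so memory stays bounded
    return dict(groups)


# Tiny made-up sample in the shape of a dump; a real dump is a multi-GB file.
SAMPLE = b"""<mediawiki>
  <page><title>Abraham Lincoln</title>
    <revision><text>{{Infobox officeholder
| name = Abraham Lincoln
}}</text></revision></page>
  <page><title>Chicago</title>
    <revision><text>{{Infobox settlement|name=Chicago}}</text></revision></page>
  <page><title>Some redirect</title>
    <revision><text>#REDIRECT [[Elsewhere]]</text></revision></page>
</mediawiki>"""

groups = group_titles_by_infobox(io.BytesIO(SAMPLE))
```

Each key in `groups` would then become its own table, with the infobox fields as columns.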

New comment by tfederman in "Show HN: Fountain of RSS – 30k active feeds and code for sourcing more"

tfederman — Thu, 17 Apr 2025 15:30:29 +0000

That looks like a crowdsourced project for turning arbitrary sites into RSS, which is very cool, but I don't see a way to get a large RSS data set out of it. And with about 5000 sources (I think) it's not as large as what I was hoping for, but it could be a good complementary source.

Show HN: Fountain of RSS – 30k active feeds and code for sourcing more

tfederman — Wed, 16 Apr 2025 16:30:23 +0000

Recently I was searching for a very large collection of RSS feeds and didn't have luck finding an existing one.

I decided to see if they could be sourced reliably at any volume from the meta tags of a stream of links, which are easy to find. Long story short, that did work very well. I used the Bluesky firehose as the stream of links.

The repo has a script that narrows the firehose down to just a "fountain" of RSS feeds and their metadata. There's also a .tsv with about 30,000 entries I got from running the script on and off over a few days.

https://github.com/tfederman/fountain-of-rss/
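For anyone curious what pulling feeds out of a page's tags can look like, here is a minimal sketch. It is not the script from the repo; it assumes the usual feed autodiscovery convention of `<link rel="alternate">` tags in the page head, and the class name, function name, and sample HTML are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used to advertise RSS and Atom feeds.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect feed URLs advertised via <link rel="alternate"> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {k: (v or "") for k, v in attrs}
        rel = a.get("rel", "").lower().split()
        is_feed = a.get("type", "").lower() in FEED_TYPES
        if "alternate" in rel and is_feed and a.get("href"):
            # Resolve relative hrefs against the URL of the page itself.
            self.feeds.append(urljoin(self.base_url, a["href"]))


def find_feeds(html_text: str, base_url: str) -> list:
    parser = FeedLinkParser(base_url)
    parser.feed(html_text)
    return parser.feeds


# Made-up page standing in for one link taken from the stream.
SAMPLE_HTML = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="Blog" href="/feed.xml">
<link rel="alternate" type="application/atom+xml" href="https://example.com/atom.xml" />
</head><body></body></html>"""

feeds = find_feeds(SAMPLE_HTML, "https://example.com/blog/")
```

In a real run the HTML would come from fetching each link seen in the stream, with deduplication on the resulting feed URLs.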

p.s. What's good in the RSS tool/reader scene in 2025?

Comments URL: https://news.ycombinator.com/item?id=43707363

Points: 2

# Comments: 3

New comment by tfederman in "Ask HN: What are you working on? (March 2025)"

tfederman — Sun, 30 Mar 2025 22:43:41 +0000

RSS reader through Bluesky custom feeds: https://github.com/tfederman/stroma-news

Bluesky API library spun off from the other project: https://github.com/tfederman/pysky

Haven't really started it yet, but a master list of RSS feeds and the code I used to source them: https://github.com/tfederman/huge-rss-list

And also a new project to fetch all links seen in the Bluesky firehose and gather metadata to build a database of sites and pages at a more granular level than the domain. For example, is account X posting video links from one YT channel or many?

New comment by tfederman in "Show HN: AI-Less Hacker News"

tfederman — Wed, 05 Apr 2023 20:10:37 +0000

Just for fun I wanted to do a simple server-side version of this where the submissions would be truly hidden on my account, so it would take effect on mobile too. It also avoids client-side artifacts like messed-up numbering.

https://github.com/tfederman/hacker-news-topic-hider

New comment by tfederman in "From hell to HTML: releasing a Python package to easily work with Wikimedia HTM"

tfederman — Wed, 05 Apr 2023 13:07:56 +0000

If anyone's interested in an approach to processing the data set quickly, I got something working and wrote it up when I was curious about turning the content into structured data for database tables.

https://feder001.com/exploring-wikipedia-as-a-database-part-...

New comment by tfederman in "“Just remove the duck” (2013)"

tfederman — Tue, 03 Mar 2015 19:31:55 +0000

That's maybe true, and also a good reason I wouldn't work for a large company again.

New comment by tfederman in "“Just remove the duck” (2013)"

tfederman — Tue, 03 Mar 2015 17:45:42 +0000

My company just doesn't have product or project managers and it's wonderful. Fewer meetings, no intermediaries, more agility. It could only work in practice, it could never work in theory.

New comment by tfederman in "Simplicity and Utility, or Why SOAP Lost"

tfederman — Tue, 09 Dec 2014 17:31:22 +0000

I feel like I spend a troubling amount of time as a software developer dealing with and/or avoiding solutions that are much more complicated than the problems they're trying to solve.

New comment by tfederman in "Why Clay Shirky just banned technology use in class"

tfederman — Sat, 27 Sep 2014 20:37:30 +0000

When I take notes because I need to retain something, the writing part is more valuable than reviewing it later. And writing on paper is important; typing isn't effective for that part.

When I take notes just to record details that I can look up later on demand it's faster to type and there's no downside. But that matches work more than it does school.

New comment by tfederman in "Amazon Takes a Big Step Toward Making Its Own Deliveries"

tfederman — Fri, 26 Sep 2014 19:53:10 +0000

At this point I already get my stuff from Amazon fast enough, and I don't even have a Prime account. I'd rather see Amazon improve working conditions for its employees (if what I've read about that is true) than chase this obsession with feeding my desire to get more material things as fast as possible.

Ironically their focus on customer satisfaction has started to make me feel dirty about buying from Amazon.

New comment by tfederman in "Why We Keep Playing the Lottery"

tfederman — Thu, 25 Sep 2014 17:03:34 +0000

When someone gets entertainment value from playing the lottery, is it really about the chance of winning multiplied by the amount of happiness the money would bring? Or does it have more to do with the excitement that comes with each drawing? (Similar to how I can enjoy watching a football season even if my team doesn't win the Super Bowl.)

I think it's more likely the latter, though I don't personally see the entertainment value of playing the lottery. I wouldn't lose respect for someone for having a hobby I don't share or understand, though.

New comment by tfederman in "Why We Keep Playing the Lottery"

tfederman — Thu, 25 Sep 2014 14:03:44 +0000

When the expected value is so definitively against you I guess it's implied that you're playing for variance. Maybe you also feel that you're playing for the entertainment value, which isn't subject to the same mathematical rationality.

I get that. When I play poker for 5 hours and come out even at the end of the night or slightly behind, I consider it worthwhile because I had fun for 5 hours.
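To put numbers on the expected value vs. variance point, here is a small worked sketch. The ticket price, prize, and odds are invented round figures, not any real lottery.

```python
import math
from fractions import Fraction


def ev_and_variance(outcomes):
    """outcomes: (net_payoff, probability) pairs whose probabilities sum to 1."""
    ev = sum(x * p for x, p in outcomes)
    var = sum(p * (x - ev) ** 2 for x, p in outcomes)
    return ev, var


# Hypothetical lottery: a $2 ticket and a 1-in-a-million shot at $1,000,000.
p_win = Fraction(1, 1_000_000)
ticket, prize = 2, 1_000_000
lottery = [(prize - ticket, p_win), (-ticket, 1 - p_win)]

ev, var = ev_and_variance(lottery)
std_dev = math.sqrt(var)
```

With these made-up numbers the expected value is exactly -1 (you lose a dollar per ticket on average) while the variance is 999,999, a standard deviation of about $1,000. That enormous spread relative to the stake is what you're paying for.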

New comment by tfederman in "Loyalty Nearly Killed My Beehive"

tfederman — Mon, 22 Sep 2014 15:19:53 +0000

When I saw the title my first instinct was, okay, what's the metaphor between a beehive and a development team or a startup? You could find some parallels, but I agree this article is better.

New comment by tfederman in "Kicked Off Facebook, and Wondering Why"

tfederman — Fri, 19 Sep 2014 20:35:09 +0000

Maybe it's hard to find Facebook essential (as opposed to convenient) if you lived during a time when you had to actually know your friends' phone numbers and enter them manually. As I did.

Anyone you're close to, you have a way to contact without Facebook. Anyone else shouldn't make you feel like you're missing anything important. Right?

New comment by tfederman in "Mystery Science Theater 3000"

tfederman — Thu, 18 Sep 2014 00:01:14 +0000

There's a good selection and it's weighted towards earlier seasons, which is good, but unfortunately there's no Manos or Santa Claus Conquers the Martians yet.