<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: bnewbold</title><link>https://news.ycombinator.com/user?id=bnewbold</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 10 Apr 2026 07:03:18 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=bnewbold" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by bnewbold in "Maine is about to become the first state to ban major new data centers"]]></title><description><![CDATA[
<p>Where are you getting the 200 megawatt number from?<p>The document you linked says that a large auto assembly plant consumes around 188,000 MWh annually (with regional variation). By my quick math that is less than 22 megawatts of baseline load (24/7/365).<p>There is a mention of natural gas and other fuels being used on-site; are you converting those to MWh equivalents? I'm not as familiar with that conversion, but a quick online calculator suggests it would still be under 75 megawatts for electrical and fuel-equivalent combined.</p>
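(The MWh-to-MW arithmetic above can be checked directly; a quick sketch, where the 188,000 MWh figure is from the linked document and everything else is unit conversion:

```python
# Back-of-envelope conversion from annual energy (MWh) to average power (MW).
HOURS_PER_YEAR = 24 * 365  # 8760

annual_mwh = 188_000          # figure from the linked document
baseline_mw = annual_mwh / HOURS_PER_YEAR
print(round(baseline_mw, 1))  # ~21.5 MW average baseline load
```
)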
]]></description><pubDate>Thu, 09 Apr 2026 21:51:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47710671</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=47710671</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47710671</guid></item><item><title><![CDATA[New comment by bnewbold in "I was right about ATProto key management"]]></title><description><![CDATA[
<p>Thanks for the response Nora.<p>Because of your blog post I went through the process of setting up a did:web account myself this afternoon, and it was painful. Eg, I found a bug in our Go SDK causing that "deactivated" error (<a href="https://github.com/bluesky-social/indigo/pull/1281" rel="nofollow">https://github.com/bluesky-social/indigo/pull/1281</a>). I kept notes and will try to get out a blog post and update to 'goat' soon.<p>We've also been making progress on the architecture and governance of the PLC system. I don't know if those will assuage all concerns with that system immediately, but I do think they are meaningful steps in reducing operational dependency on Bluesky PBC.</p>
]]></description><pubDate>Mon, 26 Jan 2026 01:09:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46760548</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=46760548</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46760548</guid></item><item><title><![CDATA[New comment by bnewbold in "I was right about ATProto key management"]]></title><description><![CDATA[
<p>fair enough, the did:web flows are not documented even for technical atproto developers, and there needs to be a self-serve way to heal identity/account problems elsewhere in the network (the "burn" problem).<p>I do think that did:plc provides more pragmatic freedom and control than did:web for most folks, though the calculus might be different for institutions or individuals with a long-term commitment to running their own network services. But did:web should be a functional alternative on principle.<p>I'm glad that the PDS was easy to get up and running, and that the author was able to find a supportive community on discord.</p>
]]></description><pubDate>Sun, 25 Jan 2026 22:37:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=46759280</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=46759280</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46759280</guid></item><item><title><![CDATA[New comment by bnewbold in "Where it's at://"]]></title><description><![CDATA[
<p>(I added a reply up-thread here: <a href="https://news.ycombinator.com/item?id=45469667">https://news.ycombinator.com/item?id=45469667</a>)</p>
]]></description><pubDate>Sat, 04 Oct 2025 01:27:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=45469672</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=45469672</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45469672</guid></item><item><title><![CDATA[New comment by bnewbold in "Where it's at://"]]></title><description><![CDATA[
<p>DNS poisoning is a concern in some situations, but not always.<p>The common case with at:// strings is to put the DID in the "authority" slot, not the handle. In that case, whether resolving a did:plc or did:web, there are regular HTTPS hostnames involved, and web PKI. So the DNS poisoning attacks are the same as for most web browsing.<p>If you start from a handle, you do use DNS. If resolution goes through the HTTPS well-known endpoint, then web PKI is there, but TXT records are supported and relatively common.<p>What we currently recommend is for folks to resolve handles server-side, not on end devices. Or if they do need to resolve locally, use DoH to a trusted provider. This isn't perfect (server hosting environments could still be vulnerable to poisoning), but it cuts the attack surface down a lot.<p>DNSSEC is the standard solution to this problem, but we feel that mandating it would be a big point of friction. We have also had pretty high-profile incidents in our production network caused by third parties' DNSSEC problems. For example, many active US Senators use NAME.senate.gov as a form of identity verification. The senate.gov DNS servers use DNSSEC, and this mostly all worked fine, until their DNSSEC configuration broke, which resulted in dozens of senators showing up in the Bluesky app as "invalid handle". That was a significant enough loss of trust in the DNSSEC ecosystem that we haven't pushed on it since. I think if we saw another application-layer protocol require it, and get successful adoption, we'd definitely reconsider.</p>
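(For concreteness, a rough sketch of the two handle-resolution paths mentioned above: a TXT record at _atproto.&lt;handle&gt;, or the HTTPS well-known endpoint. The DNS lookup itself is left abstract, since the Python stdlib has no TXT-record API; only the record parsing and URL construction are shown, and the handle names are illustrative:

```python
from typing import Optional

def did_from_txt_records(records: list) -> Optional[str]:
    """Pick the DID out of _atproto.<handle> TXT record values ("did=did:plc:...")."""
    for txt in records:
        if txt.startswith("did="):
            return txt[len("did="):]
    return None

def well_known_url(handle: str) -> str:
    """HTTPS fallback: the endpoint returns the bare DID as plain text, so web
    PKI protects this path even when plain DNS is suspect."""
    return f"https://{handle}/.well-known/atproto-did"

print(did_from_txt_records(["did=did:plc:abc123"]))  # did:plc:abc123
print(well_known_url("example.com"))  # https://example.com/.well-known/atproto-did
```
)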
]]></description><pubDate>Sat, 04 Oct 2025 01:26:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=45469667</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=45469667</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45469667</guid></item><item><title><![CDATA[New comment by bnewbold in "Bluesky: Updated Terms and Policies"]]></title><description><![CDATA[
<p>"A Full-Network Relay for $34/month"
<a href="https://whtwnd.com/bnewbold.net/3lo7a2a4qxg2l" rel="nofollow">https://whtwnd.com/bnewbold.net/3lo7a2a4qxg2l</a></p>
]]></description><pubDate>Thu, 14 Aug 2025 22:41:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=44906560</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=44906560</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44906560</guid></item><item><title><![CDATA[New comment by bnewbold in "Bluesky: Updated Terms and Policies"]]></title><description><![CDATA[
<p>This one has been popular recently: <a href="https://bsky.app/profile/spacecowboy17.bsky.social/feed/for-you" rel="nofollow">https://bsky.app/profile/spacecowboy17.bsky.social/feed/for-...</a><p>(there are several feeds named "For You"; IIUC this one started a couple weeks ago and is based on "likes by people you follow")</p>
]]></description><pubDate>Thu, 14 Aug 2025 20:04:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=44905012</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=44905012</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44905012</guid></item><item><title><![CDATA[New comment by bnewbold in "Bluesky: Updated Terms and Policies"]]></title><description><![CDATA[
<p>Yes, all of those social graph relationships are hinged off a permanent identifier (DID) and everything comes along when accounts migrate between PDS instances. Folks can use zeppelin.social from any PDS instance. The DID PLC directory is currently hosted by Bluesky, but the directory can be forked, and did:web identifiers can be used as an alternative (and several independence-minded folks in the network do so).<p>Migration between servers is so seamless that it causes confusion and doubt about whether the protocol even supports migration, because there is basically zero in-app visibility of which users are on which server.<p>Yes, the network continues to work on zeppelin.social if Bluesky servers are down.</p>
]]></description><pubDate>Thu, 14 Aug 2025 20:01:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=44904970</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=44904970</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44904970</guid></item><item><title><![CDATA[New comment by bnewbold in "Bluesky: Updated Terms and Policies"]]></title><description><![CDATA[
<p>If you want a service which indexes every post in the public network, including from folks you don't follow, that is just going to require resources. I think $200/month for a full-network index (as zeppelin does) is very reasonable and approachable for organized groups without external funding. Many Mastodon instances cost more than that, and provide a much smaller scope of indexing.<p>If you want a scaled-down setup for just a small community, which still interoperates with the full network but doesn't maintain a complete index, there are projects like AppViewLite, which can run on, eg, an old laptop at home: <a href="https://github.com/alnkesq/AppViewLite" rel="nofollow">https://github.com/alnkesq/AppViewLite</a><p>Personally, I don't think individualist self-hosting is a necessary or helpful goal for indexing the network. Most humans are not interested in spending the time or learning the skills to do this, even if it were as easy as setting up a self-hosted blog with RSS. I think small collectives (orgs, coops, communities, neighborhoods, companies, etc) exist and can fill this role.<p>Regardless, this shifts the discussion, which was about whether it is <i>possible</i> to decentralize each <i>component</i> of the network, not whether it is pragmatic for <i>individuals</i> to self-host the <i>whole thing</i>.</p>
]]></description><pubDate>Thu, 14 Aug 2025 19:56:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=44904916</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=44904916</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44904916</guid></item><item><title><![CDATA[New comment by bnewbold in "Bluesky: Updated Terms and Policies"]]></title><description><![CDATA[
<p>hundreds (thousands?) of users have signed up for Bluesky Social, then moved their accounts to independent hosts. folks can use <a href="https://zeppelin.social/" rel="nofollow">https://zeppelin.social/</a> as a totally free-standing bluesky posting experience that interoperates with the full network.<p>Bluesky Social still clearly dominates the ecosystem, but there is no single component of the system that does not have an open/alternative option for exit.<p>Do you disagree? Is there a specific centralized component you take issue with?</p>
]]></description><pubDate>Thu, 14 Aug 2025 17:57:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=44903542</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=44903542</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44903542</guid></item><item><title><![CDATA[New comment by bnewbold in "Data centers contain 90% crap data"]]></title><description><![CDATA[
<p>I agree with the general sentiment here, but don't like the examples. 200 photos per person per year isn't very much! That is all fine.<p>What really bloats things out is surveillance (video and online behavioral) and logging/tracking/tracing data. Some of this ends up cold, but a lot of it is also kept warm for analytics, which bloats CPU/RAM/network and is pretty resource intensive.<p>The cost is justified because the margins of big tech companies are so wildly large. I'd argue those profits are mostly because of network effects and rentier behavior, not the actual value in the data being stored. If there were more competitive pressure, these systems could be orders of magnitude more efficient without any significant difference in value/quality/outcome, or really even productivity.</p>
]]></description><pubDate>Mon, 07 Apr 2025 03:45:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=43607433</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=43607433</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43607433</guid></item><item><title><![CDATA[New comment by bnewbold in "SeaweedFS fast distributed storage system for blobs, objects, files and datalake"]]></title><description><![CDATA[
<p>SeaweedFS does the thing: I've used it to store billions of medium-sized XML documents, image thumbnails, PDF files, etc. It fills the gap between "databases" (broadly defined; maybe you can do few-tens-of-KByte docs, but that's stretching things) and "filesystems" (hard/inefficient in practice to push beyond tens or hundreds of millions of objects; yes, I know it is possible with tuning, etc, but SeaweedFS is better-suited).<p>The docs and operational tooling feel a bit janky at first, but they get the job done, and the whole project is surprisingly feature-rich. I've dealt with basic power outages, hardware-caused data corruption (cheap old SSDs), etc, and it was possible to recover.<p>In some ways the surprising thing is that there is such a gap in open source S3 API blob stores. Minio is very simple and great, but is one-file-per-object on disk (great for maybe 90% of use-cases, but not billions of thumbnails). Ceph et al are quite complex. There are a bunch of almost-sorta-kinda solutions like base64-encoded bytes in HBase/postgresql/etc, or chunking (like MongoDB's GridFS), but really you just want to concatenate the bytes like a .tar file and index in with range requests.<p>The Wayback Machine's WARC files plus CDX (index files with offsets/ranges) come pretty close.</p>
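(The "concatenate the bytes and index in with range requests" idea can be sketched in a few lines; this is a toy illustration of the WARC+CDX pattern, not how SeaweedFS itself is implemented, and the blob names are made up:

```python
import io

def write_pack(blobs: dict) -> tuple:
    """Append blobs back-to-back into one pack; side index maps key -> (offset, length)."""
    buf = io.BytesIO()
    index = {}
    for key, data in blobs.items():
        index[key] = (buf.tell(), len(data))
        buf.write(data)
    return buf.getvalue(), index

def read_blob(pack: bytes, index: dict, key: str) -> bytes:
    # One seek plus one bounded read; against a remote store this would
    # be an HTTP Range request into the pack file.
    offset, length = index[key]
    return pack[offset:offset + length]

pack, idx = write_pack({"a.xml": b"<doc/>", "b.txt": b"hello"})
print(read_blob(pack, idx, "a.xml"))  # b'<doc/>'
```
)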
]]></description><pubDate>Fri, 02 Feb 2024 23:39:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=39236027</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=39236027</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39236027</guid></item><item><title><![CDATA[Why Not RDF in the at Protocol?]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.pfrazee.com/blog/why-not-rdf">https://www.pfrazee.com/blog/why-not-rdf</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=39051363">https://news.ycombinator.com/item?id=39051363</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 19 Jan 2024 03:48:01 +0000</pubDate><link>https://www.pfrazee.com/blog/why-not-rdf</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=39051363</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39051363</guid></item><item><title><![CDATA[New comment by bnewbold in "Where do journals go to die?"]]></title><description><![CDATA[
<p>This is exactly the problem that the Internet Archive created their Scholar project to mitigate (<a href="https://scholar.archive.org/about" rel="nofollow noreferrer">https://scholar.archive.org/about</a>). The <a href="https://fatcat.wiki" rel="nofollow noreferrer">https://fatcat.wiki</a> component acts as a dashboard to track preservation of scholarly publications across multiple efforts. There are a bunch of projects in this area, including LOCKSS ("lots of copies keep stuff safe", including some fun/novel uses of cryptography), SciELO and similar regional platforms and archives (primarily outside the US/EU), PubMed Central, etc. Zenodo (CERN) and figshare end up being an accessible option for some small journals. There are definitely gaps that content falls through and gets lost.<p>A few folks have mentioned shadow libraries like Sci-Hub. These efforts can play an archival role, but tend to focus on access, which means there is not as much attention on content which is freely available today but could disappear in the future.<p>A common dynamic here is that clout and funding flow to globally prestigious publications, and there is a bias against marginal publications. For sure there are many content farms and scammy publications, but a lot of gems and valuable small publications get bundled in and dismissed.<p>Some more references and reading for folks interested in this subject: <a href="https://guide.fatcat.wiki/bibliography.html" rel="nofollow noreferrer">https://guide.fatcat.wiki/bibliography.html</a><p>(I helped create Scholar, but don't currently work at the Internet Archive)</p>
]]></description><pubDate>Wed, 13 Dec 2023 05:27:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=38623059</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=38623059</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38623059</guid></item><item><title><![CDATA[New comment by bnewbold in "Dot Dotat.at"]]></title><description><![CDATA[
<p>not as satisfying as a friend's "<name> at at dot dot see see"</p>
]]></description><pubDate>Fri, 06 Oct 2023 01:36:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=37786306</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=37786306</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37786306</guid></item><item><title><![CDATA[New comment by bnewbold in "Trafilatura: Python tool to gather text on the Web"]]></title><description><![CDATA[
<p>This tool is so great for robustly dealing with content in old and poorly formatted HTML. There are a lot of similar tools for extracting "the main text" from free-form HTML, but this was the most reliable in my experience, especially when dealing with web archives containing hand-written HTML back to the 1990s, working with non-English languages, etc.</p>
]]></description><pubDate>Mon, 14 Aug 2023 22:47:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=37127737</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=37127737</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37127737</guid></item><item><title><![CDATA[New comment by bnewbold in "I downloaded all 1.6M posts on Bluesky"]]></title><description><![CDATA[
<p>all public account content is in a "repo", commits to the repo are signed, and the identity resolution mechanism gives anybody the current/active signing key.<p>the most direct analogy is to signed git commits. this is an intentional design decision compared to signing individual messages/posts/etc. A "proof" for a single record in the repo is the commit, the record, and the chain of merkle tree nodes connecting the two.</p>
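(A toy illustration of the proof shape described above: a record, the chain of merkle tree nodes, and the commit root. Real atproto repos use DAG-CBOR, CIDs, and a merkle search tree; this flattens all of that to a bare hash chain just to show the verification logic:

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_proof(record: bytes, path: list, root: bytes) -> bool:
    """Walk from the record's hash up through sibling nodes; the proof is
    valid iff we arrive at the (signed) commit root."""
    node = h(record)
    for sibling in path:
        node = h(node + sibling)
    return node == root

record = b'{"text": "hello"}'
sibling = h(b"other-branch")
root = h(h(record) + sibling)  # in a real repo, the commit signs this root
print(verify_proof(record, [sibling], root))  # True
```
)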
]]></description><pubDate>Sun, 07 May 2023 19:29:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=35854761</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=35854761</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35854761</guid></item><item><title><![CDATA[New comment by bnewbold in "I downloaded all 1.6M posts on Bluesky"]]></title><description><![CDATA[
<p>some quick thoughts/notes (I am on the bluesky team, but this isn't an official policy statement):<p>- content on bluesky <i>is</i> public, but we have not set expectations/comms around that well yet, and this dump may be a surprise to some existing accounts. where exactly bluesky falls on the spectrum from "Congressional Record (immutable)" to "public web" to "public IRC or discord room" to "private signal group" is still being worked out, but probably closest to "public web"<p>- the protocol supports both "deletions" (retaining history) and "purge" (aka "rebase") to remove all non-current content. this isn't exposed via UI yet and accounts have not had the chance to purge old deletions<p>- the federation protocol and unified firehose should make it possible for third parties to maintain a live mirror of the entire corpus. importantly, it will be easy (or at least "easier") to respect intents w/r/t deletions when done this way, compared to dumps<p>- obviously neither "deletion" nor "purge" can perfectly remove content from 3rd party dumps and infra, or from hostile parties. but it <i>does</i> signal user intent clearly, and we expect as a norm that third parties will respect that intent. ADS-B, robots.txt, and CC licensing are related to these norms, though all unique. right-to-be-forgotten, archiving, re-use licensing, use in ML training, commercial/non-profit reuse, search indexing, etc, are all on our radar<p>- blobs/images are not included in this corpus<p>- this specific corpus does not (I assume) include our important "label" moderation metadata. at least for our (Bluesky) core moderation decisions, that information will be public<p>- private/group content is not yet part of the protocol. eg, no built-in mechanism for DMs or follower-only posts. we will probably do those eventually, but it will be basically a whole separate protocol, not a bolt-on to existing stuff; wildly different privacy/security concerns with non-public content<p>- there are some other cool projects, like <a href="https://bsky.jazco.dev/" rel="nofollow">https://bsky.jazco.dev/</a>, working with the full social graph, pulled via public API</p>
]]></description><pubDate>Sat, 06 May 2023 22:26:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=35845887</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=35845887</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35845887</guid></item><item><title><![CDATA[New comment by bnewbold in "How to set your domain as your handle"]]></title><description><![CDATA[
<p>dropped domains will be a bit of a UX problem.<p>the human root of identity is the handle (memorable/recognizable), but the real identity root for the account is a DID. the DID spec is pretty open/wild, so we mostly use did:plc ("placeholder"), a self-authenticating DID method.<p>within the protocol and applications, everything under the hood works via DID references, not handle references. a domain handle works by pointing at a DID, but does not control the actual DID ("DID document"). so any old follows/followers/hrefs will still be attached to the "old account", and it would be possible for the old account to recover and set up a new handle.<p>but the superficial bits (anchor text), and human identity for new lookups, are attached to the handle, and could get pointed at a new DID, or just not resolve. that would be messy.</p>
]]></description><pubDate>Sat, 29 Apr 2023 05:58:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=35750429</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=35750429</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35750429</guid></item><item><title><![CDATA[New comment by bnewbold in "How to set your domain as your handle"]]></title><description><![CDATA[
<p>You can point your web browser to <a href="https://staging.bsky.app" rel="nofollow">https://staging.bsky.app</a><p>The team is too small to develop and maintain even an electron desktop app or something like that wrapping the existing react-native-web SPA. But this leaves space for fancy third-party apps, or integration into multi-protocol desktop apps, as needed.</p>
]]></description><pubDate>Sat, 29 Apr 2023 05:53:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=35750408</link><dc:creator>bnewbold</dc:creator><comments>https://news.ycombinator.com/item?id=35750408</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35750408</guid></item></channel></rss>