<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: robustcollector</title><link>https://news.ycombinator.com/user?id=robustcollector</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 17 Apr 2026 17:56:21 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=robustcollector" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by robustcollector in "Inside the "3 billion people" national public data breach"]]></title><description><![CDATA[
<p>Elsewhere in this thread I posted a detailed commentary on what the torrent contains.</p>
]]></description><pubDate>Fri, 16 Aug 2024 11:32:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=41265363</link><dc:creator>robustcollector</dc:creator><comments>https://news.ycombinator.com/item?id=41265363</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41265363</guid></item><item><title><![CDATA[New comment by robustcollector in "Inside the "3 billion people" national public data breach"]]></title><description><![CDATA[
<p>Perhaps HN readers would appreciate a detailed account of what the NPD torrents contain.<p>The torrent delivers two files:<p><pre><code>  NPD202401.7z  33,456,912,010 bytes (32GB)
  NPD202402.7z  20,548,499,322 bytes (20GB)
</code></pre>
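The sizes in parentheses appear to be binary gigabytes (GiB, 2^30 bytes) rounded up to a whole number. For example, 33,456,912,010 bytes is about 31.2 GiB, listed as 32GB, but about 33.5 GB in decimal units. That rounding rule is an assumption, not something the listing states. A quick Python check of that reading (gib_rounded_up is a hypothetical helper name):<p><pre><code>import math

GIB = 2 ** 30  # bytes in one gibibyte (binary "GB")

# Archive sizes in bytes, copied from the listing above.
ARCHIVES = {
    "NPD202401.7z": 33_456_912_010,
    "NPD202402.7z": 20_548_499_322,
}


def gib_rounded_up(n_bytes):
    """Return the size in GiB (units of 2**30 bytes), rounded up.

    Assumption: this is how the parenthetical sizes were derived.
    """
    return math.ceil(n_bytes / GIB)


for name, size in ARCHIVES.items():
    print(f"{name}: {size / 1e9:.1f} GB (decimal), "
          f"{size / GIB:.2f} GiB, rounded up to {gib_rounded_up(size)}")
</code></pre>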
Uncompressing NPD202401.7z results in:<p><pre><code>  ssn.txt 176,806,109,779 bytes (165GB)
  wc -l ssn.txt ==>> 1,698,302,005 lines
</code></pre>
Uncompressing NPD202402.7z results in:<p><pre><code>  ssn2.txt 120,722,361,611 bytes (113GB)
  wc -l ssn2.txt ==>> 997,379,508 lines
</code></pre>
This is a total of 1,698,302,005 + 997,379,508 = 2,695,681,513 lines.<p>Each line is a comma-separated record with these fields:<p>ID,firstname,lastname,middlename,name_suff,dob,address,city,county_name,st,zip,phone1,aka1fullname,aka2fullname,aka3fullname,StartDat,alt1DOB,alt2DOB,alt3DOB,ssn<p>Records generally have ID, firstname, lastname, middlename, address, city, county_name, st, zip, and ssn. Most records have no values for name_suff (name suffix), phone1, aka1fullname, aka2fullname, aka3fullname, StartDat, alt1DOB, alt2DOB, and alt3DOB.<p>There are no email addresses at all; the "@" character does not appear anywhere in the files. Phone numbers are very rare.<p>I don't know what the ID number at the start of each line represents. I presume it is an internal index used by the organization that compiled the data. The SSN is the last field on each line.<p>As far as I can tell, the files contain U.S. addresses only. There is nothing from Mexico, Canada, or other countries.<p>Many of the lines (records) concern the same person at different addresses. I looked up 7 people I know personally, and all of them had entries: between 3 and 20 lines (records) each, about 10 on average. A person's records usually differed only in the address field. Assuming about 10 records per person, the roughly 2.7 billion lines represent about 2,695,681,513 / 10 ≈ 269,568,151 distinct persons in the U.S.<p>The U.S. population is about 337M, of whom about 78% are 18 or older. In other words, 337,000,000 × 0.78 = 262,860,000 Americans are adults. This is within about 3% of my estimate of 269,568,151 distinct individuals in the NPD data files.<p>For all 7 people I checked, the names were spelled correctly, although the middle name was sometimes just an initial. 
I searched for each person by several fields (address, last name, birth date), so I believe I would have caught slightly misspelled names.<p>The addresses appeared correct, but there was no way to tell which one was current or in what order the person had lived at each address. There is a StartDat field, but it was almost never filled in. The last-listed entry was not always the current address. In a couple of cases, the current address, where the person has lived for several years, was missing entirely.<p>The birth dates were correct in a couple of cases, were truncated in three cases (for example, instead of 19800704, meaning July 4, 1980, the record showed 19800700, meaning July 1980 with no day), and were wrong by a wide margin for one person.<p>All 7 people I checked had SSNs. The SSN was correct for the 1 person I could verify; I don't know whether the other 6 are correct. The SSNs were consistent within each person's records: none of the 7 people had more than one SSN.</p>
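<p>The population arithmetic above, and one way to read the truncated birth dates, can be reproduced in a few lines of Python. This is a minimal sketch: the 10-records-per-person figure comes from only 7 people, the population numbers are approximate, and treating a 00 day as "day unknown" is an interpretation of the data rather than a documented format (parse_dob is a hypothetical helper name).<p><pre><code># Line counts reported by wc -l for the two extracted files.
LINES_SSN = 1_698_302_005
LINES_SSN2 = 997_379_508

# Rough inputs to the estimate (assumptions, not measured from the data):
RECORDS_PER_PERSON = 10      # average seen across only 7 people
US_POPULATION = 337_000_000  # approximate U.S. population
ADULT_SHARE = 0.78           # approximate share aged 18 or older

total_lines = LINES_SSN + LINES_SSN2
est_persons = total_lines // RECORDS_PER_PERSON
est_adults = round(US_POPULATION * ADULT_SHARE)
ratio = est_persons / est_adults  # how close the estimate is to the adult count


def parse_dob(dob):
    """Split a YYYYMMDD string into (year, month, day).

    Assumption: a day of "00" means only the year and month are known,
    so the day is returned as None.
    """
    year, month, day = int(dob[0:4]), int(dob[4:6]), int(dob[6:8])
    return year, month, day or None


print(f"{total_lines:,} lines, about {est_persons:,} persons")
print(f"{est_adults:,} adults; estimate / adults = {ratio:.3f}")
print(parse_dob("19800704"), parse_dob("19800700"))
</code></pre>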
]]></description><pubDate>Fri, 16 Aug 2024 11:21:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=41265287</link><dc:creator>robustcollector</dc:creator><comments>https://news.ycombinator.com/item?id=41265287</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41265287</guid></item></channel></rss>