Wired has a fascinating story about coders who are backing up NASA climate data and keeping tabs on changes to U.S. government websites:
Like similar groups across the country—in more than 20 cities—they believe that the Trump administration might want to disappear this data down a memory hole. So these hackers, scientists, and students are collecting it to save outside government servers.
But now they’re going even further. Groups like DataRefuge and the Environmental Data and Governance Initiative, which organized the Berkeley hackathon to collect data from NASA’s earth sciences programs and the Department of Energy, are doing more than archiving. Diehard coders are building robust systems to monitor ongoing changes to government websites. And they’re keeping track of what’s already been removed—because yes, the pruning has already begun….
One coder who goes by Tek ran into a wall trying to download multi-satellite precipitation data from NASA’s Goddard Space Flight Center. Starting in August, access to Goddard Earth Science Data required a login. But with a bit of totally legal digging around the site (DataRefuge prohibits outright hacking), Tek found a buried link to the old FTP server. He clicked and started downloading. By the end of the day he had data for all of 2016 and some of 2015. It would take at least another 24 hours to finish.
The non-coders hit dead-ends too. Throughout the morning they racked up “404 Page not found” errors across NASA’s Earth Observing System website. And they more than once ran across databases that had already been emptied out, like the Global Change Data Center’s reports archive and one of NASA’s atmospheric CO2 datasets.
And this is where the real problem lies. They can’t be sure when this data disappeared (or if anyone backed it up first). Scientists who understand it better will have to go back and take a look. But meantime, DataRefuge and EDGI understand that they need to be monitoring those changes and deletions. That’s more work than a human could do.
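For a sense of what automated monitoring of changes and deletions could involve, here is a minimal sketch in Python. It is not the tooling DataRefuge or EDGI actually use; the function names (`fingerprint`, `fetch_fingerprint`, `compare_snapshots`) and the snapshot layout are assumptions made for illustration. The idea is to record a hash of each page on every run, then compare the latest snapshot against the previous one.

```python
import hashlib
import urllib.request


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest identifying this exact page content."""
    return hashlib.sha256(content).hexdigest()


def fetch_fingerprint(url: str, timeout: float = 30.0):
    """Fetch a URL and fingerprint its body.

    Returns None when the page cannot be retrieved (for example a 404
    or a network failure), which the comparison treats as "missing".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return fingerprint(response.read())
    except OSError:  # URLError, HTTPError, and timeouts all derive from OSError
        return None


def compare_snapshots(old: dict, new: dict) -> dict:
    """Compare two {url: fingerprint-or-None} snapshots (hypothetical layout)."""
    report = {"changed": [], "disappeared": [], "appeared": []}
    for url in sorted(set(old) | set(new)):
        before, after = old.get(url), new.get(url)
        if before == after:
            continue
        if before is not None and after is None:
            report["disappeared"].append(url)
        elif before is None and after is not None:
            report["appeared"].append(url)
        else:
            report["changed"].append(url)
    return report
```

In use, a scheduled job would build a new snapshot with `fetch_fingerprint` for each watched URL, call `compare_snapshots` against the stored one, and flag anything reported for a human to review. Hashing raw bytes is deliberately crude: pages with timestamps or other dynamic content would register as changed on every run, so a real system would need to normalize content first.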
Read the full story.