
Actually, these days, most scientists at the forefront of computational biology have mastered GitHub. They use Markdown for README files, and publish md5summed data sets and command lines so you can reproduce their work. From what I can tell, this started 3-5 years ago when the kids coming into grad school picked up what the open source hackers were doing.

To be clear: none of this is properly incentivized by the funders, so there's not much selection for this behavior.



What use is an MD5 of a dataset if its semantics should be clear from table and column names?


Fileset checksumming addresses a different concern from the semantic content of the data. It's merely a way to ensure that two people send the exact same input files to a routine. It's a critical control.
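
For what it's worth, here is a rough sketch of what that control could look like in Python. The function names (md5_of_file, fileset_manifest, diff_manifests) are made up for illustration and are not from any particular lab's tooling; only hashlib and pathlib from the standard library are used.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fileset_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_manifests(mine, theirs):
    """List files that are missing on one side or whose digests differ."""
    return sorted(
        name
        for name in mine.keys() | theirs.keys()
        if mine.get(name) != theirs.get(name)
    )
```

Two people would each run fileset_manifest on their copy of the inputs and compare the results with diff_manifests; an empty list means the bytes match. It says nothing about what the columns mean, which is the point: it is a check on identity of inputs, not on semantics. MD5 is fine for catching accidental corruption or a stale download, but it is not collision resistant, so something like SHA-256 would be the safer pick if tampering is a concern.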



