Actually, these days, most scientists at the forefront of computational biology have mastered GitHub. They write their README files in Markdown, and they publish MD5-checksummed data sets along with the exact command lines so you can reproduce their work. From what I can tell, this started 3-5 years ago, when the kids coming into grad school picked up what the open source hackers were doing.
To be clear: none of this is properly incentivized by the funders, so there's not much selection for this behavior.
Fileset checksumming addresses a different concern from the semantic content of the data. It's merely a way to ensure that two people send the exact same input files to a routine. It's a critical control.
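For anyone who hasn't done this, here is a minimal sketch of what a fileset check can look like in Python. It uses only the standard library's hashlib; the function names (file_md5, fileset_manifest, same_inputs) are made up for illustration and are not from any particular lab's tooling. It assumes the fileset is a directory tree and that comparing relative paths plus MD5 digests is good enough for the purpose.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of one file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fileset_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_md5(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def same_inputs(root_a, root_b):
    """True when both directories hold byte-identical files under the same names."""
    return fileset_manifest(root_a) == fileset_manifest(root_b)
```

Note that this only tells you the bytes match. It says nothing about whether the data is correct or meaningful, which is exactly the distinction above. MD5 is fine for catching accidental differences; if deliberate tampering is a concern, swapping in hashlib.sha256 is a one-line change.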