dat is a git for data: a way to install a dataset, get a specific version of that dataset, and publish it. The twist? There is no limit on the size of the dataset. Funded by the Sloan Foundation, dat is an open source tool that seeks to enable collaboration workflows on top of datasets.
In the talk below, Max Ogden, the creator of the dat project, presents “All About Dat Data,” where he discusses the project in more detail, and gives a live demonstration of its capabilities.
The high-level goal of the dat project is to make it easier to work with large scientific datasets in an automated way, which both saves time and makes results easier to reproduce.
The core dat tool is a streaming system for versioning and replicating datasets. It is built firmly on the Unix philosophy, with a design that encourages extreme modularity and allows many third-party applications to be built on top of it.
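To make "streaming versioning and replication" a little more concrete, here is a minimal Python sketch of the general idea. It is not dat's implementation or API: the `VersionedStore` class and the `replicate` function are hypothetical names invented for illustration. The sketch assumes an append-only change log in which every write gets a sequential change number, and a replica that only ever receives writes from a single source.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Tuple


@dataclass
class VersionedStore:
    """Toy store (hypothetical, not dat's API): every write is appended to a log."""
    log: List[Tuple[int, str, Any]] = field(default_factory=list)
    rows: Dict[str, Any] = field(default_factory=dict)

    def put(self, key: str, value: Any) -> int:
        # Each write gets the next sequential change number.
        change = len(self.log) + 1
        self.log.append((change, key, value))
        self.rows[key] = value
        return change

    def changes(self, since: int = 0) -> Iterator[Tuple[int, str, Any]]:
        # Yield changes after `since` one record at a time, so a consumer
        # can process them as a stream instead of loading them all at once.
        for entry in self.log[since:]:
            yield entry


def replicate(source: VersionedStore, target: VersionedStore) -> int:
    """Apply to `target` only the changes it has not seen; return the count applied.

    Assumes `target` is written to only through this function, so the length
    of its log can serve as the replication cursor.
    """
    applied = 0
    for _, key, value in source.changes(since=len(target.log)):
        target.put(key, value)
        applied += 1
    return applied
```

In this sketch, a second call to `replicate` after new writes on the source transfers only those new changes, which is the property that makes incremental syncing of large datasets practical.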
In addition to the core tool, the dat team is also developing tools for building and distributing streaming, cross-platform data pipelines based on Node.js and Docker.