A "Science Distribution Network": Hadoop/ownCloud synchronised across the Tasman

The arrival of retail cloud file storage and synchronization services, best exemplified by Dropbox, is changing researchers’ behaviour around their datasets, posing both an opportunity and a threat. On the one hand, it is increasing appreciation of data networking issues, such as transfer performance, among researchers who don’t typically engage with eResearch providers. On the other, it is moving data, and the intellectual property that goes with it, out of R&E-operated data stores and onto commercial platforms in foreign jurisdictions.

To seize the opportunity offered by this increased researcher awareness, AARNet have begun deploying a system that better suits the collaborative way in which research teams operate. It layers two open-source platforms: Hadoop, for distributed filesystem replication, and ownCloud, for presentation and synchronization of data. Replication would be tuned in two ways: a priori, by tagging datasets so that they replicate to nodes close to identified members of a team; and on demand, so that if a casual user starts to access a dataset, the whole dataset can be transferred in the background to reduce the latency of further access.
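
To make the two replication modes concrete, here is a minimal Python sketch; it is not AARNet's implementation and does not use any Hadoop or ownCloud API. It assumes a hypothetical mapping from a researcher's site to their nearest storage node (NEAREST_NODE, with made-up node names) and a hypothetical dataset name. A priori placement reads the team tags; on-demand placement queues one whole-dataset copy the first time a casual user's nearest node is asked for data it lacks. The actual transfer is reduced to updating a replica set.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a researcher's site to the closest storage node.
NEAREST_NODE = {
    "Melbourne": "aarnet-mel",
    "Sydney": "aarnet-syd",
    "Auckland": "nz-akl",
}


@dataclass
class Dataset:
    name: str
    team_sites: set = field(default_factory=set)  # a priori tags: where team members work
    replicas: set = field(default_factory=set)    # nodes holding a full copy


def place_a_priori(ds: Dataset) -> None:
    """Replicate the dataset to the node nearest each tagged team member."""
    for site in ds.team_sites:
        ds.replicas.add(NEAREST_NODE[site])


def on_access(ds: Dataset, user_site: str, transfer_queue: list) -> str:
    """Serve a read and return the node it was served from.

    If the user's nearest node has no copy, queue (once) a background
    transfer of the whole dataset to that node.
    """
    node = NEAREST_NODE[user_site]
    if node in ds.replicas:
        return node
    if (ds.name, node) not in transfer_queue:
        transfer_queue.append((ds.name, node))
    # Until the copy lands, serve from an existing replica
    # (here simply the first in sorted order).
    return sorted(ds.replicas)[0]


def run_transfers(datasets: dict, transfer_queue: list) -> None:
    """Drain the queue; a real system would copy the data in the background."""
    while transfer_queue:
        name, node = transfer_queue.pop(0)
        datasets[name].replicas.add(node)


if __name__ == "__main__":
    ds = Dataset("genomics-2013", team_sites={"Melbourne", "Sydney"})
    place_a_priori(ds)
    queue = []
    print(on_access(ds, "Auckland", queue))  # aarnet-mel (copy to nz-akl queued)
    run_transfers({ds.name: ds}, queue)
    print(on_access(ds, "Auckland", queue))  # nz-akl
```

The queue stands in for background transfer: reads keep being served from an existing replica until the copy completes, after which the casual user's nearest node serves them locally.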

The logical next step is to extend the synchronizing fabric across geographic and policy borders. New Zealand eScience Infrastructure (for storage) and REANNZ (for network capacity) are collaborating with AARNet to tie new Hadoop/ownCloud nodes in NZ into the existing AARNet system. Such a set-up promises to reduce latency significantly for trans-Tasman collaborations, allowing researchers to access data from their closest store, where it has been made available through project-specific replication. This mode of operation is not unlike that of content distribution networks (CDNs) such as Akamai. It also dovetails nicely with active research in NZ on software-defined networking (e.g., with OpenFlow), where replication could be scheduled in concert with the underlying network.
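
CDN-style request routing usually comes down to sending each read to the "closest" replica. Below is a minimal sketch, assuming closeness is judged by measured round-trip time (RTT); the function name, node names and RTT figures are hypothetical and illustrative, not measurements from the AARNet or REANNZ networks.

```python
from typing import Iterable, Mapping


def closest_replica(replicas: Iterable[str], rtt_ms: Mapping[str, float]) -> str:
    """Return the replica node with the lowest measured round-trip time.

    Nodes without an RTT measurement are treated as unreachable.
    """
    candidates = [node for node in replicas if node in rtt_ms]
    if not candidates:
        raise LookupError("no reachable replica")
    return min(candidates, key=lambda node: rtt_ms[node])


if __name__ == "__main__":
    # Illustrative RTTs as seen from an Auckland client (made-up values).
    rtt_from_auckland = {"nz-akl": 2.0, "aarnet-syd": 30.0, "aarnet-mel": 40.0}
    print(closest_replica({"aarnet-syd", "aarnet-mel"}, rtt_from_auckland))  # aarnet-syd
    print(closest_replica({"aarnet-syd", "nz-akl"}, rtt_from_auckland))      # nz-akl
```

Once project-specific replication has placed a copy in NZ, the same rule routes NZ readers to it automatically. Where the RTT figures come from (client probes, or a network controller) is left open here.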

Our approach opens up numerous possibilities for improving the way collaborating researchers share data.

eResearch NZ 2013


Submitted by Tim McNamara on