Data migration from NESSTAR to Dataverse and the progress so far!

INTRODUCTION

The New Zealand Social Science Data Service (NZSSDS) is administered and maintained by COMPASS and is built around an architecture based on the Australian Data Archive (ADA) and NESSTAR (a proprietary middleware). NZSSDS is a multi-functional entity. It provides:

  • Space for holding data sets and metadata related to social sciences surveys in New Zealand
  • Enhanced publications, adding value to journal articles and other publications around the surveys held
  • Resources to support research methods teaching, including SPSS guidebooks written around teaching subsets of some of the surveys held in the archive.

The reason to migrate from NESSTAR is that it is no longer a free data service: a paid subscription of $2500 has been introduced from this year, which is not feasible when better options such as Dataverse are available as open-source platforms.

AIM OF THE PROJECT

To move the NZSSDS data service from NESSTAR to an open-source architecture, Dataverse, developed by the Institute for Quantitative Social Science (IQSS) at Harvard University.

DATAVERSE NETWORK PROJECT

The Dataverse Network is an open-source application for publishing, referencing, extracting and analyzing research data.

The main goal of the Dataverse Network is to solve the problems of data sharing through building technologies that enable institutions to reduce the burden for researchers and data publishers, and incentivize them to share their data.

By installing the Dataverse Network software, an institution is able to host multiple individual virtual archives, called "dataverses", for scholars, research groups, or journals, providing a data publication framework that supports author recognition, persistent citation, data discovery and preservation.

Dataverses require no hardware or software costs, nor maintenance or backups by the data owner, but still enable all web visibility and credit to devolve to the data owner.

WHY USE DATAVERSE?

  • Format Conversion and Fixity
    The Universal Numerical Fingerprint (UNF) helps verify permanently that the data are fixed and unchanged from the data originally used by the author.
  • Restricted Access
    Confidential data can be stored and protected (good security policies are in place).
  • Data Discovery
    Helps researchers find and easily access small data sets from other researchers that would otherwise sit on local computers with the risk of being lost.
  • Easy to Use and Maintain
    Data owners can administer all the settings and manage studies through a web interface.
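
The fixity idea can be illustrated with a much simpler stand-in. The sketch below is not the real UNF algorithm. It only assumes that values are first normalized (numbers rounded to 7 significant digits, strings trimmed) and then hashed with SHA-256, so that the same data gives the same fingerprint regardless of the file format it came from. The function names and sample values are hypothetical.

```python
import hashlib


def normalize(value, digits=7):
    """Render a value in a canonical text form (simplified, not real UNF)."""
    if value is None:
        return "<missing>"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        # Exponential notation with a fixed number of significant digits,
        # so 3 and 3.0 (or 2.5 and 2.50) normalize identically.
        return format(float(value), f".{digits - 1}e")
    return str(value).strip()


def fingerprint(column):
    """Hash the normalized values of one column with SHA-256."""
    h = hashlib.sha256()
    for value in column:
        h.update(normalize(value).encode("utf-8"))
        h.update(b"\n")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return h.hexdigest()


# The same column as read from two formats, plus an edited copy.
ages_from_spss = [34, 51.0, 27, None]
ages_from_csv = [34.0, 51, 27.0000000001, None]
ages_edited = [34, 52, 27, None]

same = fingerprint(ages_from_spss) == fingerprint(ages_from_csv)
changed = fingerprint(ages_from_spss) != fingerprint(ages_edited)
print(same, changed)
```

A fingerprint stored alongside a citation lets a later reader recompute it and confirm that the data behind the citation has not changed.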

TWO WAYS TO IMPLEMENT IT!

By creating a DATAVERSE in a DVN:

One way is to create a dataverse on the Harvard University IQSS Dataverse website and upload all the information and data sets we currently have into it.

Advantages: No need to create our own Dataverse Network; no need to maintain and administer the storage and archiving of data; all data can be individually verified and cited using UNF; cheaper and faster; no staffing needed.

Disadvantages: No control over the hosting of the data, i.e. if the main Dataverse Network website goes down, our data on that website becomes unavailable as well.

By creating a Dataverse Network (DVN):

The other way is to create our own DVN. This involves extensive hardware and software requirements and is only practical if used as a university-wide application.

Advantages: Each department, faculty, professor and student can create, access, edit, store and share their research data with each other from their own individual dataverses. All existing DVNs, such as the Harvard University IQSS one, can access our DVN as well, which will increase the exposure of our research and the material related to it.

It will create a University of Auckland hub of research including all the data created by staff, professors and students.

Libraries of other universities, such as the University of Toronto, use a DVN to store their data and share it with other universities for research purposes.

Disadvantages: Hardware and software requirements need to be met (explained in the next section); staffing is needed to maintain and administer the DVN.

REQUIREMENTS TO CREATE DVN/DATAVERSE:

  • Software based: Linux-based OS, NetBeans 7.0.1, GlassFish 3.1.2, the latest version of the Dataverse software, a virtual server, etc.
  • Hardware based: Workstation which can support a Linux OS, the NetBeans IDE and Dataverse.
  • Knowledge of concepts: How to create, edit and use dataverses; virtual servers, NetBeans, GlassFish, Java and PostgreSQL; creating and installing a DVN; data migration.

TASKS NEEDED TO BE DONE

  • Learning about Dataverse/DVN and how to create, maintain and administer dataverses/DVNs. (Currently in progress)
  • Analysing and backing up the actual data to be migrated. (Currently in progress)
  • Installing the DVN if needed, which means learning Linux commands and becoming familiar with NetBeans, virtual servers and Java programming.
  • Creating the DVN on a virtual server if needed.
  • Uploading the actual data to Dataverse/DVN.
  • Linking the DVN/Dataverse to the main NZSSDS website.
  • Adding plug-ins like R for data analysis if needed.
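
For the backup task above, one possible check is to compare checksums of the exported files against the backup copy before anything is migrated. This is only a sketch under assumptions: the directory layout, file names and function names are hypothetical and say nothing about how NESSTAR actually exports data.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks to keep memory use low."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_or_missing(original, backup):
    """Return files that are missing from, or differ in, the backup."""
    src, dst = build_manifest(original), build_manifest(backup)
    return sorted(name for name, digest in src.items() if dst.get(name) != digest)


# Demo with throwaway directories standing in for the export and its backup.
with tempfile.TemporaryDirectory() as tmp:
    export, backup = Path(tmp, "export"), Path(tmp, "backup")
    export.mkdir()
    backup.mkdir()
    (export / "survey.sav").write_bytes(b"survey data")
    (export / "codebook.xml").write_bytes(b"<codeBook/>")
    (backup / "survey.sav").write_bytes(b"survey data")  # codebook.xml not copied
    problems = changed_or_missing(export, backup)
    print(problems)
```

An empty result would mean every exported file is present in the backup with identical content.
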
Submitted by Shubham Sharma on