eResearch Symposium 2010 Presentations and Recordings

In the event's opening speech, Dr George Slim of MoRST argued that, in future, all research will be eResearch, and that we need to start dealing with it now lest we face problems later. MoRST has been involved in the field over the last few years, both through 'easy wins' such as video conferencing and through more under-the-radar work such as data access management. He also discussed problems such as unpublished data languishing in archives and desk drawers rather than becoming part of the knowledge base of science.
Research data is increasingly important in its own right, not just as a means to deriving a publication. We have been dealing with the data deluge since the turn of the millennium, and the scale of the challenges continues to increase. This presentation will review how we got to where we are today, looking at the pivotal role of data and data management in the history of communication. It will then consider the present role of data in scholarly communication by examining a range of problems in the published literature. It will conclude by examining some of the initiatives being taken to start to fix the future of data, and the sorts of services and approaches that will be required.
The data challenges in the environmental sciences lie in discovering relevant data, dealing with extreme data heterogeneity, and converting data to information and knowledge. Addressing these challenges requires new approaches for managing, preserving, analyzing, and sharing data. In this talk, I introduce several environmental science challenges and relate those to current cyberinfrastructure challenges. Second, I introduce DataONE (Data Observation Network for Earth), which represents a new virtual organization that will enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it. DataONE encompasses a distributed framework and sustainable cyberinfrastructure that meets the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data. Third, I conclude by presenting several opportunities for international collaboration in the environmental sciences and cyberinfrastructure areas.
Most commentators agree that the only way forward for New Zealand is to forge a high-productivity knowledge-based economy. However, in the late twentieth and early twenty-first centuries, it is the large global cities that have driven innovation and the generation of knowledge. If New Zealand is to take the high productivity path, it must overcome its geographical isolation and low population density by learning to act like a city of four million people. In this talk, I will discuss the nature and magnitude of this challenge by looking quantitatively at innovation and the generation of knowledge around the world. I will discuss how eResearch will play an essential role in building scale and collaboration within New Zealand and in extending the Kiwi knowledge network around the world.
Cancer is Australia's largest disease burden, and arises from the accumulation of genetic damage. Typically cancers accumulate multiple mutations, and these will vary from one cancer type to another, from person to person, and may even vary between different tumour sites in the same person. This variation could mean that the best treatment for one patient has no effect for another, or that a treatment that worked in the past has no effect upon a relapse. The ultimate dream for cancer patients would be to determine exactly what mutations caused the disease and exactly what treatments would work best, a concept known as personalised medical genomics. Although conceptually simple, collecting, storing, and analysing the large-scale biological data generated as part of medical genomics studies represents a huge informatics challenge, eclipsed only by the challenge of integrating this data with existing biological resources and knowledge.
Biomedical problems are often found in complex environments consisting of vast numbers of interrelated components at multiple temporal and spatial scales. The exponentially expanding number of possible interactions renders the development of understanding of many phenomena impossible through conventional experimentation alone. Systems Biology offers an alternative approach using computational analysis of quantitative models in order to discern the causes and effects of emergent behaviour in complex systems. Adopting this strategy of enquiry, the limiting factor for analysis leading to insight generation becomes one of available computer processing cycles. At present, our science is artificially limited to the questions we can tractably pursue in the computer time available.
BeSTGRID, providing online access to distributed computing resources, represents a significant scientific opportunity. Here we have developed a prototype model analysis software tool and user interface for developers of quantitative biomedical models. The tool understands the model-exchange protocol CellML, a proven format for Systems and Synthetic Biology. Each model is marked up with metadata providing simulation instructions, and instances of the model, each parameterised with pre-determined parameter sets, are scheduled on BeSTGRID via a Java-based user interface built using the Grisu framework. These BeSTGRID jobs are then computed using the open-source CellML Simulator, before time-course results are compiled and returned to a user-accessible area.
To verify the applicability of this prototype, we replicated the computationally expensive portion of the analysis of a significant intracellular signalling system in cardiac myocytes, previously performed using more limited computational resources. Briefly, a model of a signal transduction pathway was analysed to determine the molecular species and reactions most significant to cells' production of a key signalling molecule (inositol 1,4,5-trisphosphate). Testable parameter sets were determined via the Morris method as before, and BeSTGRID jobs were scheduled to perform the ~600,000 model simulations required. As an indicator, the simulation computation was completed in less than 50 hours of real time on BeSTGRID, as opposed to 2 weeks using the high performance computing platform in the original study.
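The study above used the Morris method to choose testable parameter sets. As a hedged illustration (the model's actual parameters, step sizes and tooling are not specified here), a minimal Python/NumPy sketch of generating Morris one-at-a-time trajectories on a unit hypercube might look like this; each point of each trajectory would correspond to one parameterised simulation job:

```python
import numpy as np

def morris_trajectories(num_params, num_trajectories, num_levels=4, seed=0):
    """Generate Morris one-at-a-time trajectories on the unit hypercube.

    Each trajectory has num_params + 1 points; consecutive points differ
    in exactly one coordinate by a fixed step delta, so elementary effects
    can be estimated one parameter at a time.
    """
    rng = np.random.default_rng(seed)
    delta = num_levels / (2.0 * (num_levels - 1))
    # Start levels are restricted so that adding delta stays within [0, 1].
    start_levels = np.arange(0, num_levels // 2) / (num_levels - 1)
    trajectories = []
    for _ in range(num_trajectories):
        point = rng.choice(start_levels, size=num_params)
        order = rng.permutation(num_params)  # perturb parameters in random order
        traj = [point.copy()]
        for idx in order:
            point = point.copy()
            point[idx] += delta
            traj.append(point.copy())
        trajectories.append(np.array(traj))
    return trajectories
```

The total number of model runs is num_trajectories * (num_params + 1); in a grid setting like the one described, each point would be submitted as a separate parameterised job.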
The ~6.7-fold decrease in the time taken to generate results enables signalling research scientists to examine systems of greater complexity. Where it was previously possible to analyse one pathway, we can now examine the behaviour of a network of 6 or 7 interacting pathways. Moreover, since we have used a generic model format and analysis technology, these increases in research speed apply not just to signalling but to any biomedical research model that can be expressed in CellML, including tissue- and organ-level work.
The increase in computational power will now enable additional innovations in model simulation technology. Our future work includes developing the simulation software and associated metadata specifications to include more flexible specification of the simulations and post-processing. This will facilitate the examining of not just one cellular activity at a time, but life-cycles of biological entities such as cells, tissues, organs or complete organisms. This is essential to the development of meaningful biomedical insights into the functioning of whole systems of life.
The services described in this talk form part of the VBL (Virtual Beamline) program of VeRSI, developed in collaboration with the Australian Synchrotron, La Trobe University, Bragg at ANSTO, Monash University, ANU and CSIRO.
In this talk we will showcase (with a live demo) our remote experimentation environment for accessing high-value research assets at the Australian Synchrotron and La Trobe University, demonstrating remote control, collaboration, oversight, data transfer and metadata extraction technologies.
We will also show examples of metadata extraction using the MetaMan service across the data file formats of multiple instruments (the Australian Synchrotron's IR Beamline, MX1/2 Beamline and Powder Diffraction Beamline, and La Trobe University's XPS instrument), its integration within the protein crystallography discipline with MyTARDIS (developed by Monash University), and the distribution of data and metadata across MyTARDIS's federated nodes. MetaMan can also be used to rescale and send image information to our high-resolution multi-screen display.
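MetaMan's internals are not described here, so the following is only a hypothetical Python sketch of the general pattern such a multi-instrument service might use: a registry mapping file formats to extractor functions, so that a new beamline format can be supported by registering one function rather than changing the dispatch logic. The `.xdi` extension and the extractor body are invented for illustration:

```python
# Hypothetical sketch of a format-keyed metadata extractor registry,
# in the spirit of a service like MetaMan (actual internals unknown).
from pathlib import Path
from typing import Callable, Dict

EXTRACTORS: Dict[str, Callable[[Path], dict]] = {}

def register(extension: str):
    """Decorator: associate a file extension with an extractor function."""
    def wrap(fn):
        EXTRACTORS[extension.lower()] = fn
        return fn
    return wrap

@register(".xdi")  # invented example extension
def extract_xdi(path: Path) -> dict:
    # Placeholder: a real extractor would parse the instrument's file header.
    return {"instrument": "powder-diffraction"}

def extract_metadata(path: Path) -> dict:
    """Dispatch to the extractor registered for this file's format."""
    fn = EXTRACTORS.get(path.suffix.lower())
    if fn is None:
        raise ValueError(f"no extractor registered for {path.suffix!r}")
    record = fn(path)
    record["filename"] = path.name
    return record
```

The extracted record could then be handed to a repository such as MyTARDIS for cataloguing and federation.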
We will also demonstrate remote analysis of proprietary Bruker spectra taken on the Infrared Beamline at the Australian Synchrotron, using NoMachine NX remote access to execute Bruker's Opus software, and will execute ANU's Drishti software to visualise CT reconstructions from the Australian Synchrotron's Medical Imaging Beamline, processed by CSIRO's XLI software on the MASSIVE 0 test cluster.
All these tools, services and techniques have been integrated to work together within the VBL remote research environment.
A divide exists between researchers and support services, yet support services have the potential to positively transform and enhance the research project experience. This is particularly true in a research landscape shaped by the information era: we produce more data and information than we can manage or use, and we have new expectations about how we can and should work. In this environment:
1. Researchers work across discipline boundaries and manage complex multi-disciplinary and multi-institution projects including the complexity and quantity of data and information they generate;
2. Researchers collaborate, share ideas and information, create synergies for knowledge production and create new kinds of knowledge;
3. Researchers need administrative and management platforms of greater power and sophistication to support and facilitate new research enterprises, and the potentially paradigm-changing approaches they are inventing.
The Coordinated Services Initiative (CSI) is a new approach being piloted at Massey University. It aims to create a relationship between researchers and support services that strengthens research project development capability and capacity. CSI offers coordinated support that improves upon the existing silo-based provision of core services, facilitates open communication between research academics and support services, and creates scope for joint project collaboration at concept stage and beyond. The coordinated service team includes (as required) Research Management Services (RMS), Information Technology Services (ITS), the Library, the Graduate Research School (GRS), Ethics, People and Organisational Development (POD), the Centre for Academic Development and eLearning (CADeL), and Marketing.
The CSI approach is being piloted across two distinct projects.
(i) The Global Entrepreneurial Leadership (GEL) project, which delivers training and professional development, works with stakeholder groups to create research and development opportunities, and works alongside communities of professional practice.
(ii) The ‘Manawatu Our Region Our River’ project, which engages Massey University in community-based research linked to our region.
This presentation documents and shares the CSI story so far.
The Australian National Data Service (ANDS) is a centralised repository containing inter-linked records of publicly funded data, researchers and grants. Its purpose is to promote data reuse and collaboration by making general descriptions of that data available to other researchers.
The Monash e-Research Centre (MeRC) provides the "bridge", or glue, between researchers, the ITS Service Division, the Monash Library, the Faculty of IT and the DVC Research; it adds value by ensuring that the chosen ICT solution meets researchers' requirements. MeRC's key areas are Collaboration Services, High Performance Computing, Data Storage and Management, and Visualisation Services.
ANDS has funded the Monash University Data Capture and Metadata Store Program, and MeRC has identified eight areas that can benefit from the ANDS program of work:
* Climate and weather - Storage and sharing of data
* Ecosystem measurements - Storage and distribution of CO2 data
* Molecular biology - Storage and distribution of x-ray crystallography
* Multimedia collections and ARROW - Publishing multimedia collections
* History of Adoption - Automation & publication of data
* Interferome - Integration and analysis of data
* Microscopy - Storage and processing of data
* General Metadata Store Infrastructure
Each project can be classified under the following digital curation model:
* Data Capture, e.g. instrument, experiment, raw data or processed data
* Data Management, e.g. storage, retrieval, annotation, search
* Data Re-use, e.g. linking to experiments in other disciplines (e.g. via ANDS services)
The project management method chosen for these projects is PMBOK, and the software development method is Agile/Scrum. Seven of the eight projects focus on specific client interactions, while the eighth is a general case for which a generic solution, applicable to most disciplines, will be developed. The projects chosen are diverse in nature and offer examples of how research data can be digitally collected, recorded and catalogued, demonstrating how researchers can benefit from better organised and annotated data.
There are about 1.9 million species described on Earth, and several times this number of species names, including common names, misspellings, and multiple scientific names applied to the same species. The described species may represent only half of all species on Earth. No single person can be knowledgeable about more than a fraction of this number, so hundreds of experts are needed to quality-control nomenclature in global biodiversity, and thousands of experts are required to expand the biodiversity content into ecology, physiology, and other areas of biology. In turn, their knowledge builds on millions of publications over four centuries. The past decade has seen the emergence of open-access online biodiversity databases providing authoritative information on species taxonomy (e.g. Species 2000, World Register of Marine Species), information on introduced pest species (e.g. Global Invasive Species Database, Delivering Alien Invasive Species Information for Europe), and data on the geographic distribution of species (e.g. Global Biodiversity Information Facility, Ocean Biogeographic Information System).
Here, we provide examples of how these databases can now be used to conduct world-scale studies on biodiversity, with and without modelling techniques. We then propose that these databases must work more closely together to (a) facilitate data quality control, (b) provide a more comprehensive and integrated biodiversity resource that is of greater value to researchers, and (c) make the most efficient use of the limited pool of scientific expertise. This synergy between infrastructures may be achieved in parallel with the engagement of more experts and greater recognition of contributing individuals, institutions and funding agencies, and may result in more substantial and prestigious global databases that provide services from national to global scales.
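To illustrate the kind of nomenclature quality control described above (synonyms, misspellings, and multiple names applied to one species), here is a hedged Python sketch of reconciling reported names against an authoritative register. The register entries, synonym table and similarity cutoff are invented for illustration and bear no relation to any real database's matching rules:

```python
# Minimal sketch of species-name reconciliation: exact match, synonym
# lookup, then fuzzy matching to catch misspellings. All data invented.
import difflib
from typing import Optional

REGISTER = {
    # normalised key -> accepted scientific name
    "carcharodon carcharias": "Carcharodon carcharias",
    "thunnus thynnus": "Thunnus thynnus",
}
SYNONYMS = {
    # historical or alternative name -> accepted scientific name
    "carcharias lamia": "Carcharodon carcharias",
}

def reconcile(name: str, cutoff: float = 0.85) -> Optional[str]:
    """Map a reported name to an accepted name, tolerating misspellings."""
    key = " ".join(name.lower().split())  # normalise case and whitespace
    if key in REGISTER:
        return REGISTER[key]
    if key in SYNONYMS:
        return SYNONYMS[key]
    # Fuzzy match against register keys to catch misspellings.
    close = difflib.get_close_matches(key, REGISTER, n=1, cutoff=cutoff)
    return REGISTER[close[0]] if close else None
```

At world scale, a real register would also need expert review of borderline matches; fuzzy matching alone can wrongly merge genuinely distinct names, which is one reason the pool of taxonomic expertise matters.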