In the event's opening speech, Dr George Slim of MoRST talked about how, in the future, all research will be eResearch, and about the need to start dealing with it now lest we all face problems with it later. MoRST has been involved with the field over the last few years, both with 'easy wins' such as video conferencing and more under-the-radar work such as data access management. He also talked about problems such as unpublished data languishing in archives and desk drawers, rather than being part of the knowledge base of science.
eResearch Symposium 2010 Presentations and Recordings
Research data is increasingly becoming important in its own right, not just as a means of deriving a publication. We have been dealing with the data deluge since the turn of the millennium, and the scale of the challenges continues to increase. This presentation will review how we got to where we are today, looking at the pivotal role of data and data management in the history of communication. It will then move to consider the present role of data in scholarly communication by examining a range of problems in the published literature. It will conclude by examining some of the initiatives being taken to start to fix the future of data, and the sorts of services and approaches that will be required.
The data challenges in the environmental sciences lie in discovering relevant data, dealing with extreme data heterogeneity, and converting data to information and knowledge. Addressing these challenges requires new approaches for managing, preserving, analyzing, and sharing data. In this talk, I first introduce several environmental science challenges and relate those to current cyberinfrastructure challenges. Second, I introduce DataONE (Data Observation Network for Earth), a new virtual organization that will enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it. DataONE encompasses a distributed framework and sustainable cyberinfrastructure that meets the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data. Third, I conclude by presenting several opportunities for international collaboration in the environmental sciences and cyberinfrastructure areas.
Most commentators agree that the only way forward for New Zealand is to forge a high-productivity knowledge-based economy. However, in the late twentieth and early twenty-first centuries, it is the large global cities that have driven innovation and the generation of knowledge. If New Zealand is to take the high productivity path, it must overcome its geographical isolation and low population density by learning to act like a city of four million people. In this talk, I will discuss the nature and magnitude of this challenge by looking quantitatively at innovation and the generation of knowledge around the world. I will discuss how eResearch will play an essential role in building scale and collaboration within New Zealand and in extending the Kiwi knowledge network around the world.
Cancer is Australia's largest disease burden, and arises from the accumulation of genetic damage. Typically cancers accumulate multiple mutations, and these will vary from one cancer type to another, from person to person, and may even vary between different tumour sites in the same person. This variation could mean the best treatment for one patient might have no effect for another, or that a treatment that worked in the past might have no effect on a cancer relapse. The ultimate dream for cancer patients would be to determine exactly what mutations caused the disease, and exactly what treatments would work the best - a concept known as personalized medical genomics. Although conceptually simple, collecting, storing, and analysing the large-scale biological data generated as part of medical genomics studies represents a huge informatics challenge - eclipsed only by the challenge of integrating this data with existing biological resources and knowledge.
There are about 1.9 million species described on Earth, with several times this number of species names, including common names, misspellings, and multiple scientific names applied to the same species. This knowledge may represent only half of all species on Earth. No single person can be knowledgeable about more than a fraction of this number, so hundreds of experts are needed to quality control nomenclature in global biodiversity. Thousands of experts are required to expand the biodiversity content into ecology, physiology, and other areas of biology. In turn, their knowledge builds on millions of publications over four centuries. The past decade has seen the emergence of open-access online biodiversity databases providing authoritative information on species taxonomy (e.g. Species 2000, World Register of Marine Species), information on introduced pest species (e.g. Global Invasive Species Database, Delivering Alien Invasive Species Information for Europe), and data on the geographic distribution of species (e.g. Global Biodiversity Information Facility, Ocean Biogeographic Information System).
Here, we provide examples of how these databases can now be used to conduct world-scale studies on biodiversity with and without modelling techniques. We then propose that these databases must work more closely together to (a) facilitate data quality control, (b) provide a more comprehensive (complete) and integrated biodiversity resource that is of more value to researchers, and (c) make the most efficient use of the limited pool of scientific expertise. This synergy in infrastructures may be achieved in parallel with engagement of more experts and greater recognition of contributing individuals, institutions and funding agencies, and should result in more substantial and prestigious global databases that provide services from national to global scales.
This presentation will explain how social simulation was used to supplement the traditional statistical analysis in examining inter-ethnic partnership patterns over the period 1981–2006 using Census microdata. It will then discuss how the BeSTGRID cluster was used for the parallel processing of the simulation model, in order to use an evolutionary optimisation algorithm to search for optimal combinations of the partnering parameters.
The project was a Marsden-funded study that investigated changes in the social structure of New Zealand by examining patterns of inter-ethnic partnering in married and de facto relationships. The two main components of this study were a series of log-linear models examining the existing ethnic patterns, and a social simulation model of partnership formation that was populated with unit-level data from the New Zealand Census.
The simulation was written in Java and run on the Auckland cluster of the BeSTGRID computer network (www.bestgrid.org). The processing power of the cluster allowed the simulation to be run at a city level, with unit-level data that provided demographic information for all of the single eighteen- to thirty-year-olds listed in the census in the Auckland, Wellington and Canterbury regions.
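The parameter search described above can be illustrated with a minimal sketch of an evolutionary optimisation loop. This is written in Python for brevity and is not the study's Java code. The function `simulate_mismatch`, the three-parameter setup, and the target values are all hypothetical stand-ins for a real simulation run that would compare simulated partnering patterns with Census data.

```python
import random


def simulate_mismatch(params, target=(0.6, 0.3, 0.1)):
    """Hypothetical stand-in for one simulation run.

    Returns how far the simulated pattern is from the observed one
    (lower is better). The target values are invented for illustration.
    """
    return sum((p - t) ** 2 for p, t in zip(params, target))


def mutate(params, rng, scale=0.05):
    """Perturb each parameter with Gaussian noise, clamped to [0, 1]."""
    return tuple(min(1.0, max(0.0, p + rng.gauss(0.0, scale))) for p in params)


def evolve(pop_size=20, generations=60, n_params=3, seed=0):
    """Simple elitist evolutionary search over partnering parameters."""
    rng = random.Random(seed)
    population = [
        tuple(rng.random() for _ in range(n_params)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Each evaluation is independent of the others, which is why this
        # step could be spread across the nodes of a cluster.
        ranked = sorted(population, key=simulate_mismatch)
        parents = ranked[: pop_size // 4]  # keep the best quarter unchanged
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = min(population, key=simulate_mismatch)
    return best, simulate_mismatch(best)


best_params, best_score = evolve()
```

Because the best candidates are carried over unchanged each generation, the best score never gets worse. In a real setting each call to the fitness function would be a full simulation run, so evaluating a generation in parallel is where a cluster would help most.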
For twelve hours on 9 May 2010, a combination of six radio telescopes in Australia and New Zealand (including the first SKA dish in Western Australia and the AUT radio telescope in Warkworth) observed the core of the radio galaxy Centaurus A. A few weeks earlier, the same set of Australian and NZ radio telescopes successfully observed an active galactic nucleus with a supermassive black hole and relativistic jet structure (PKS 1934-638). Both Centaurus A and PKS 1934-638 are objects of great scientific interest. Following the installation of the KAREN connection at the AUT radio telescope, Warkworth data was transferred to Western Australia, where it was correlated, calibrated and imaged. The main objective of this activity was to virtually create a “skeleton” of the Australasian SKA to demonstrate the advantage of the 5500 km baseline and provide the first science from this Australasian SKA “prototype”. This objective was achieved on time and resulted in a significant science return. Warkworth is now available to the international radio astronomy community for VLBI (very long baseline interferometry) and its real-time eResearch version, eVLBI, as part of the Australian Long Baseline Array. Challenges and future plans for this exciting international eResearch effort are outlined.
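The advantage of a long baseline can be made concrete with the standard diffraction-limit approximation, in which angular resolution is roughly the observing wavelength divided by the baseline length. The sketch below applies it to the 5500 km baseline mentioned above. The observing frequency of 8.4 GHz is an assumption chosen only for illustration; the abstract does not state the frequency used.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def angular_resolution_mas(baseline_m, frequency_hz):
    """Approximate resolution, theta ~ wavelength / baseline, in milliarcseconds."""
    wavelength_m = SPEED_OF_LIGHT_M_S / frequency_hz
    theta_rad = wavelength_m / baseline_m
    # radians -> degrees -> arcseconds -> milliarcseconds
    return math.degrees(theta_rad) * 3600.0 * 1000.0


baseline_m = 5500e3            # 5500 km baseline, from the text
assumed_frequency_hz = 8.4e9   # ASSUMPTION: not stated in the abstract

resolution_mas = angular_resolution_mas(baseline_m, assumed_frequency_hz)
print(f"Approximate resolution: {resolution_mas:.2f} mas")
```

Under this assumed frequency the result is on the order of a milliarcsecond. Halving the baseline doubles the angle, which is why adding a distant station such as Warkworth sharpens the image.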
Deep engagement between biologists, clinicians and computational experts greatly increases the amount of biological understanding we can gain from high-content human data. In an example of this approach, we have performed a meta-analysis of breast cancer microarray data from around the world. Several novel analytic methods were applied to this data set, most of which would not be feasible without the use of collaboration tools, grid computing and high performance computing.
Drs Lance Miller (Wake Forest University, USA), Anita Muthukaruppan (University of Auckland) and Mik Black (University of Otago) assembled and annotated a collection of Affymetrix microarray datasets comprising breast tumours from 950 women (including NZ women) with extensive clinical details. This data set was large enough to allow studies of small subgroups of breast cancer patients that previously we could not explore with any degree of statistical power. Four novel analysis methods were developed in collaboration with clinicians and applied to this data set, to answer key questions about breast cancer:
1) What transcription factors in tumours are most relevant to the survival of breast cancer patients?
2) What gene networks are active in breast cancer patients?
3) What clinically significant genes are amplified in breast cancer?
4) What genes modulate transcription factor activity in breast cancer?
The use of these novel, computationally intensive methods to analyse a large clinical data set provides a good example of the power of generating a small collaborative eResearch community focused on a specific problem. With publicly available collections of clinical and molecular data continuing to grow rapidly, there are tremendous opportunities for biological discovery using approaches such as those outlined here, for which eResearch tools are essential.
The classical approach to drug discovery and development is to test large collections of chemical compounds for therapeutic activity in "wet labs", in solution, within biological assays that report on a disease-specific target. Once compounds active in the assays ("hits") have been identified, a medicinal chemistry programme is initiated that explores the chemical space around a molecule by making a large series of directly related compounds, known as analogues. It is the resulting relationship between chemical structure and biological activity that identifies the drug lead. This is known as hit-to-lead development. While this phase can be accomplished within an academic setting, engaging in hit discovery is much less accessible, limited by the absence of High Throughput Chemical Screening facilities in New Zealand and the cost of accessing Australian facilities.
When the 3-dimensional structure of a target is known at atomic resolution, it is possible to use this information to screen digital libraries of compounds by matching computed physico-chemical properties - this process is known as virtual screening. This virtual screening process is a digital equivalent to the High Throughput Chemical Screening approach, and with low setup costs represents a more readily accessed discovery platform capable of stimulating wet lab drug discovery within an academic setting by identifying small numbers of likely active compounds.
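The idea of screening a digital library by matching computed properties can be illustrated with a deliberately simplified sketch. Real virtual screening against a 3-dimensional target structure is far more involved. Here the compound names, property values, and the `TARGET_PROFILE` windows are all hypothetical, and a compound counts as a likely hit only if every property falls inside the window assumed for the target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    mol_weight: float
    log_p: float
    h_bond_donors: int


# ASSUMPTION: illustrative property windows for a made-up target.
TARGET_PROFILE = {
    "mol_weight": (250.0, 450.0),
    "log_p": (1.0, 4.0),
    "h_bond_donors": (0, 3),
}


def match_score(compound, profile=TARGET_PROFILE):
    """Count how many computed properties fall inside the target's window."""
    return sum(
        low <= getattr(compound, prop) <= high
        for prop, (low, high) in profile.items()
    )


def screen(library, min_score=3):
    """Return compounds matching at least min_score properties, sorted by name."""
    hits = [c for c in library if match_score(c) >= min_score]
    return sorted(hits, key=lambda c: c.name)


# Hypothetical four-compound digital library.
library = [
    Compound("cmpd-A", 320.4, 2.1, 1),
    Compound("cmpd-B", 610.7, 5.3, 4),
    Compound("cmpd-C", 275.3, 3.8, 2),
    Compound("cmpd-D", 410.0, 0.2, 1),
]

hits = screen(library)
print([c.name for c in hits])  # prints ['cmpd-A', 'cmpd-C']
```

Each compound is scored independently of the others, so a large library can be split into chunks and screened in parallel, then narrowed to a small set of candidates for wet lab testing.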
Within some of the drug discovery and development programmes at the Auckland Cancer Society Research Centre, virtual screening has proved useful for new "hit" discovery where the atomic structure of the target was known, with one screen taking approximately 7 weeks to complete on a desktop machine using a single CPU. In a collaboration between the Auckland Cancer Society Research Centre and the Centre for eResearch, we are building on this discovery success by using the high performance computing environment provided by BeSTGRID to develop a large-scale virtual screening environment based on the Grisu framework that will facilitate an increase in hit discovery performance in the University of Auckland environment.
Early results are showing significant improvements in time to discovery, with the current increases in scale of computing environments leading to a 7x speed-up in analysis run times. Moreover, it has facilitated a change in the use of virtual screening, to include concurrent focussed screening around specific chemical features of hit compounds. Our current plans further increase the scale of analysis possible, both in terms of the digital libraries of compounds and with respect to the number of projects enabled by both Grisu and the scientific technology.
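As a rough worked example of what such a speed-up could mean in practice, the sketch below applies the reported 7x figure to the roughly 7-week single-CPU desktop screen mentioned earlier. That the speed-up is measured against that desktop baseline is an assumption, and the screens-per-year figure is a simple extrapolation rather than a reported result.

```python
def estimated_run_time_weeks(baseline_weeks, speed_up):
    """Run time after applying a speed-up factor to a baseline run time."""
    if speed_up <= 0:
        raise ValueError("speed_up must be positive")
    return baseline_weeks / speed_up


baseline_weeks = 7.0  # single-CPU desktop screen, from the text
speed_up = 7.0        # reported speed-up; baseline is ASSUMED to be the desktop run

grid_weeks = estimated_run_time_weeks(baseline_weeks, speed_up)

# Simple extrapolation: sequential screens that fit in a 52-week year.
screens_per_year_desktop = 52 / baseline_weeks
screens_per_year_grid = 52 / grid_weeks

print(f"Estimated screen time: {grid_weeks:.1f} week(s)")
print(f"Screens per year: {screens_per_year_desktop:.1f} -> {screens_per_year_grid:.1f}")
```

Under these assumptions a screen that took about seven weeks would finish in about one, which is the kind of turnaround that makes concurrent focussed screens around individual hit compounds practical.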