The final frontier is here, and it's dressed up as a web framework for biomedical research named Galaxy. As we stand at the crossroads between 2012 and 2013, the world is more technologically oriented than ever, and biomedical research is no different. In genetics (something I am familiar with), a common experimental procedure may now contain no 'wet-lab' work at all. Data in publicly accessible databases adds up to petabytes (a big guess, but it's a lot), and there is still value in that data, with many new discoveries lurking in the 1s and 0s.
So how does a biomedical researcher in 2012 spend their valuable time? Less time with PCR (polymerase chain reaction) and agar plates, and more time sitting in front of a screen very similar to the one I am writing this blog post from. The data needs to be analysed and processed, usually from a terminal. Science needs to be reproducible, and bash scripts that nobody else can use are not really going to cut it. Galaxy provides a framework that makes command-line tools accessible through a web browser, behind an easy and understandable user interface. Today I will attempt to go over Galaxy tools in some detail.
A standard installation comes with many biomedical tools, but because research is so diverse, the standard set may not be enough. Galaxy lets you extend the tool list using its XML tool syntax. The parameters for a tool are specified in the Galaxy tool XML file, and your usual command line goes between the <command> tags.
<tool id="my_nobel_prize" name="my_nobel_prize" version="1">
  <command>./nobel_prize $input_nobel > $output_nobel</command>
  <inputs><param name="input_nobel" type="data" format="madeupdatatype"/></inputs>
  <outputs><data name="output_nobel" format="madeupdatatype"/></outputs>
</tool>
This imaginary tool runs the program nobel_prize, which takes one argument of type madeupdatatype and writes output_nobel, which the user can view and download from Galaxy. Galaxy tracks the format of each dataset and all relevant metadata, which makes the diverse array of biomedical data formats manageable. A Galaxy tool is as complicated as the command-line tool it wraps, so adding a tool to Galaxy is not always straightforward. Don't even get me started on interactive programs. Many tools require an external script, usually written in bash or Python, to pre-process and post-process the data. But what makes this different from just running the tool from the command line, apart from the obvious point that it's easier to fill out a web form than to use a terminal (with which I would disagree anyway!)?
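As an aside on the external scripts mentioned above, here is a minimal sketch of what such a wrapper might look like in Python. Everything in it is hypothetical: the cleaning rules in preprocess and postprocess, the function name run_wrapped, and the assumption that nobel_prize reads a file path as its only argument and prints its result to standard output. It is only meant to show the shape of the idea, not how any real Galaxy tool does it.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def preprocess(raw_text: str) -> str:
    """Hypothetical pre-processing: drop blank lines and '#' comment lines."""
    kept = [line for line in raw_text.splitlines()
            if line.strip() and not line.startswith("#")]
    return "\n".join(kept) + "\n" if kept else ""


def postprocess(tool_output: str) -> str:
    """Hypothetical post-processing: strip trailing whitespace on each line."""
    lines = [line.rstrip() for line in tool_output.splitlines()]
    return "\n".join(lines) + "\n" if lines else ""


def run_wrapped(command: list[str], input_path: str, output_path: str) -> None:
    """Clean the input, run the wrapped tool on it, then tidy its output."""
    cleaned = preprocess(Path(input_path).read_text())
    with tempfile.TemporaryDirectory() as tmp:
        cleaned_path = Path(tmp) / "cleaned_input"
        cleaned_path.write_text(cleaned)
        # The wrapped tool is assumed to take the input file as its last
        # argument and to print its result to standard output.
        result = subprocess.run([*command, str(cleaned_path)],
                                capture_output=True, text=True, check=True)
    Path(output_path).write_text(postprocess(result.stdout))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Galaxy would fill in the two paths, e.g. $input_nobel and $output_nobel.
    run_wrapped(["./nobel_prize"], sys.argv[1], sys.argv[2])
```

With a script like this (call it nobel_wrapper.py, another made-up name), the command line in the tool file would call the wrapper with $input_nobel and $output_nobel as arguments instead of calling nobel_prize directly.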
The differences are in workflows, data sharing and histories. In this post I will describe these briefly, and go over them in detail in future posts. Histories are analyses in Galaxy that show all input, intermediate and final datasets, as well as every step in the process and the settings used for each. Workflows let you plug tools together into a series of steps, which is very useful for repeat analyses and does not require any scripting. Datasets, workflows and histories can also be shared in Galaxy, so a workflow that someone found useful can be made available to others performing similar experiments.
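To make the idea of workflows and histories a little more concrete, here is a toy model in Python. It is not how Galaxy is implemented and uses none of Galaxy's code. The names History, HistoryEntry and run_workflow, and the two example steps, are all invented for illustration. A workflow here is just an ordered list of steps, and the history records every intermediate dataset together with the settings that produced it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class HistoryEntry:
    step: str                  # name of the step that produced the dataset
    settings: dict[str, Any]   # settings used for that step
    dataset: Any               # the dataset after the step ran


@dataclass
class History:
    entries: list[HistoryEntry] = field(default_factory=list)

    def record(self, step: str, settings: dict[str, Any], dataset: Any) -> None:
        self.entries.append(HistoryEntry(step, settings, dataset))


# A step is (name, function, settings); the function takes the dataset first.
Step = tuple[str, Callable[..., Any], dict[str, Any]]


def run_workflow(steps: list[Step], dataset: Any, history: History) -> Any:
    """Run each step in order, recording input, intermediates and final output."""
    history.record("input", {}, dataset)
    for name, func, settings in steps:
        dataset = func(dataset, **settings)
        history.record(name, settings, dataset)
    return dataset


# Two made-up steps operating on a list of sequence strings.
def filter_short(reads: list[str], min_length: int) -> list[str]:
    return [read for read in reads if len(read) >= min_length]


def uppercase(reads: list[str]) -> list[str]:
    return [read.upper() for read in reads]


workflow: list[Step] = [
    ("filter_short", filter_short, {"min_length": 4}),
    ("uppercase", uppercase, {}),
]
history = History()
final = run_workflow(workflow, ["acgt", "ac", "ttgaca"], history)
```

Because the workflow is plain data, the same list of steps can be rerun on a new dataset or handed to a colleague, which is roughly the appeal of sharing workflows and histories.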
If you would like to learn more about Galaxy follow this link.