GoFigure and The Digital Fish Project: Open tools and open data for an imaging-based approach to systems biology

Alexandre Gouaillard1*, Titus Brown, Marianne Bronner-Fraser, Scott E. Fraser, Sean G. Megason
1. Singapore Agency for Science, Technology and Research
Abstract

As part of the Center of Excellence in Genomic Science at Caltech, we have initiated the Digital Fish Project. Our goal is to use in toto imaging of developing transgenic zebrafish embryos on a genomic scale to acquire digital, quantitative, cell-based, molecular data suitable for modelling the biological circuits that turn an egg into an embryo. In toto imaging uses confocal/2-photon microscopy to capture the entire volume of organs, and eventually whole embryos, at cellular resolution every few minutes in living specimens throughout their development. The embryos are labelled so that nuclei are one color and cell membranes another, allowing cells to be segmented and tracked as they move and divide. A transgenic marker in a third color allows a variety of molecular data to be captured. In toto imaging generates 4D image sets (xyzt) that can contain 100,000 to 1,000,000 images per experiment. We are developing a software package called GoFigure to visualize, segment, and analyze these very large image sets. GoFigure uses a MySQL database back end to manage storage of images and segmented objects, and uses VTK and ITK for visualization and segmentation. We plan to use in toto imaging to digitize the complete expression and subcellular localization patterns of thousands of proteins throughout zebrafish embryogenesis. This genomic data, our zebrafish lines, and GoFigure will be distributed following the Open Data/Open Source model.
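To give a sense of how an xyzt acquisition reaches 100,000 to 1,000,000 images per experiment, the short Python sketch below counts 2D images as z-slices × time points × channels × mosaic tiles. All parameter values (100 z-slices, one z-stack every 2 minutes over 24 hours, 2 or 3 color channels, a 4-tile mosaic) are illustrative assumptions, not the project's actual acquisition settings, and the function name is hypothetical.

```python
def images_per_experiment(z_slices, hours, minutes_per_timepoint, channels, tiles=1):
    """Count the 2D images in a 4D (xyzt) acquisition.

    Hypothetical helper: assumes every time point captures a full z-stack
    for every color channel and every mosaic tile. All inputs are
    illustrative, not real acquisition settings.
    """
    timepoints = (hours * 60) // minutes_per_timepoint
    return z_slices * timepoints * channels * tiles


# Assumed settings: 100 z-slices, one stack every 2 minutes for 24 hours.
two_colors = images_per_experiment(100, 24, 2, channels=2)             # nuclei + membranes
three_colors = images_per_experiment(100, 24, 2, channels=3, tiles=4)  # + third-color marker, 4 tiles
print(two_colors, three_colors)  # 144000 864000
```

Under these assumptions, adding the third color and mosaicing several tiles moves an experiment toward the upper end of the quoted range, and halving the interval between time points doubles the count.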

Keywords

confocal imaging, systems biology, cell tracking

Source Code and Data

No source code files available for this publication.

Reviews

Danielle Pace

Monday 10 September 2007

Summary:
An image-based approach to systems biology allows both cell lineage trees and quantitative molecular data to be gathered over time and at cellular resolution.  This paper describes GoFigure, a software package for the image processing of four-dimensional in toto images with the overall goal of extracting these two types of data.  In particular, the development of zebrafish embryos will be studied at the cellular level in order to discover more about the genetic circuits underlying embryogenesis.  Although still in progress, once completed this project will be an excellent contribution to the open source community in systems biology, as both a very large collection of molecular data (the "Digital Fish") and GoFigure itself will be released as open source.

Hypothesis:
The overall hypothesis is that an image-based approach to systems biology yields accurate and reliable molecular and cell lineage data.  Although no hard evidence to support this is provided per se, the rationale behind their approach is extensively discussed and in my opinion makes a lot of sense.

Evidence:
1) The advantages of an image-based approach to systems biology over more traditional molecular approaches are clear and convincing.
2) The authors claim that "computers have a notoriously difficult time (spotting cells in microscopic images)" but do not reference previous (presumably unsuccessful) attempts by themselves or others to solve this problem.
3) The authors claim that "we cannot predict how when and where a protein will be expressed, we cannot predict how a protein will fold and function once it is translated, and we cannot predict the interaction between expressed proteins that allow them to form functional genetic circuits" without mentioning previous bioinformatics work on subcellular localization, protein folding and reverse engineering of genetic networks.  Although a thorough overview of systems biology is obviously not necessary in this paper, the wording of this part of the introduction could lead a newcomer to believe, wrongly, that these problems have never been attempted before.

Open Science:
Although the authors mention that the GoFigure software package and their data will be released in an open-source manner, no code, code documentation, images or data are provided with this submission.

Reproducibility:
I could not reproduce the authors' work because no source code was provided.

Use of Open Source Software:
The authors use VTK and ITK for visualization and segmentation, as well as CMake and CPack.  The authors also mention their plans to use a KWWidgets GUI and perhaps Qt in the future.

Open Source Contributions:
Once again, no source code is provided in this submission.

Code Quality:
Not applicable.

Applicability to other problems:
Once the authors develop a faster segmentation algorithm, it can presumably be used for segmentation in other large datasets.

Suggestions for future work:
Not sure if this will help, but work by Abolmaesumi and Sirouspour on segmentation in medical images (particularly segmentation of the prostate in ultrasound images) may apply here - their algorithm is fast because it requires no numerical optimization.  See "Segmentation of prostate contours from ultrasound images", Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing 2004, vol 3 pp 517-520, 2004; "Ultrasound image segmentation using an interacting multiple-model probabilistic data association filter", Proceedings of the SPIE, vol 5370 pp 484-493, 2004; and "An interacting multiple model probabilistic data association filter for cavity boundary extraction from ultrasound images", IEEE Transactions on Medical Imaging, vol 23(6) pp 772-782, 2004.

Requests for additional information from authors:
1) The authors do not mention how molecular data (such as that from YFPs) is extracted from the in toto images - this presumably involves segmentation as well as further quantification.  Has any of this work been done, or does it fall under future work?
2) Neither the segmentation algorithms nor the propagation algorithms used have been described in any detail - I am interested in what has been attempted so far.
3) How is the accuracy of the current segmentation algorithm quantified?  The paper says that it is "quite accurate", but I would like to know how this has been determined.
4) The paper also mentions that the current segmentation algorithm is slow - how much time does it take?
5) Although the advantages of incorporating user interaction into GoFigure are clear, I wonder how this works in practice.  If the number of cells across space and time is as large as it seems to be, how much time does the user have to spend looking for false negatives and false positives?
6) More references on both in toto imaging and imaging for systems biology are needed so that the newcomer can see how this work fits into the rest of the field and how it is unique.

Additional Comments:
1) The writing and English of this paper should be improved, particularly in Section 3.
2) Overall this is an excellent project - I'm sure that plenty of systems biologists will be eager to get their hands on your software and data once it is released.  Great work so far!

Score justification:

1)  (2 points) Does the paper follow the standards of open-science by including the relevant code, data, and parameters needed to replicate the work described in the paper? Can the work be duplicated? How well does the work (when appropriate) build upon (instead of duplicating) existing open-source efforts?

0 stars (because no code or data was provided in this submission)

2)  (1 point) Does the work address a (knowledge or software-based) need within the community? Where do you see it having the most impact?

1 star - an open source software package for image-based systems biology is needed so that the difficult image processing work involved is not duplicated by researchers and so that biologists not explicitly trained in computer science can use the package and the associated data in their own work.

3)  (1 point) How well is the work described in the paper? Is sufficient background material provided in the paper itself, or easily located using pointers in the paper?

1 star - the work is adequately described, and the background material is especially detailed.

4)  (1 point) Can you name one or two research projects that you think would benefit from the use of this work?

1 star - the data provided by this work could definitely be used by bioinformaticians interested in reverse engineering of genetic networks, both to extract the underlying genetic circuits in zebrafish embryogenesis and as a test dataset to compare various reverse engineering algorithms.

Richard Beare

Monday 10 September 2007

Summary:
This paper is an overview of the Digital Fish Project. There is discussion of plans for automated segmentation work and mention of a framework to support the project, but minimal details. Some data has been made available in response to another reviewer.

Open Science:
At present there is nothing computational to review, although the paper states that code will be made public.

Additional Comments:

The project background is really interesting - it is a great example of how computational data analysis tools, such as image analysis, are changing the way in which some life sciences research is carried out. I'd recommend that the paper concentrate on this aspect, with emphasis on the scale and complexity of the problem. If suitable datasets can be made available with this paper then perhaps we will see part of the solution developed by the ITK community.

There is minimal technical content in the paper in its current form, with discussion limited to high-level concepts ("tracks", "lineages", etc.) and a couple of viewer screenshots. It reads like a marketing piece in places.

I'd like to see the tools become cross platform.

Stephen Aylward

Thursday 13 September 2007

Summary:
This paper is an excellent introductory text on the motivations for the field of systems biology and the process of in toto imaging, as well as on the software architecture and GUI design of a system for processing 4D microscopy images.

Systems biology is concerned with imaging and quantifying the interactions between cell behavior and molecular circuits, particularly during embryogenesis.

In toto imaging in this text refers to 3D imaging (confocal microscopy or 2-photon fluorescence microscopy) over time to track tissue and organ development at the cellular and protein level.

The software and interface must support mosaicing, displaying, segmenting, and editing the segmentations from those massive 4D images.

The paper represents work in progress, so the software is not yet available.

Response from the author to a previous review shows that the data is available online at:
http://www.insight-journal.org/midas/view_collection.php?collectionid=37

(With Version 2 of this paper - authors now provide links to data, code, binaries, and more!!!)

Hypothesis:
Cellular segmentation in 4D images is a grand challenge in systems biology and the GoFigure software will help non-computer-scientists generate effective cell segmentations from massive datasets in a timely manner.

Evidence:
The initial implementation is somewhat limited but lessons were learned and will be addressed in the upcoming release.

Lessons learned include:

1) Hand editing tools are needed for when segmentations fail - and segmentations will fail.
The GoFigure software provides means for adding and deleting cell segmentations - it is surprising that they do not provide merging and splitting tools for when cell segmentations are inaccurate.

2) Large data can be effectively handled via tailored views - it is sufficient to provide 2D views that scroll over time or in 3D - there is no need to attempt complex graphical representations of 4D data.

3) The initial attempt at segmentation was done on 2D images, and these 2D segmentations were inaccurate and time consuming. Future versions will work in 3D and possibly in 4D, incorporating heuristics as well as higher-dimensional information.

Open Science:
The paper is work-in-progress. The paper has not been updated to mention the availability of data, and the software remains unavailable.

The licensing terms of the software have not been identified. Additionally, the software is limited to running on Windows at this time because of its use of MFC.

The authors are congratulated for making their data freely available.

Reproducibility:
Waiting for code release. Additionally, the paper is written at a higher level - more of an introductory text than a how-to text.

Use of Open Source Software:
Software uses ITK and VTK.


Open Source Contributions:
The license for the code is not given.

The data has been posted using a by-attribution license!

Applicability to other problems:
This software may be generally applicable when 4D data needs to be quantified. The use of ITK suggests that other segmentation methods could be incorporated into the software. Also, the MySQL back-end database for managing images and analysis results should be generally useful. This software could evolve into a very useful toolkit.

Suggestions for future work:
1) Release the source

2) Go cross-platform

3) Release more data

4) Ensure MySQL can handle the large images (some groups prefer PostgreSQL for large images).

5) Provide demonstrations and how-to text

6) Fix the spelling, grammar, and punctuation in the paper, particularly in the second half of the text.

David Holmes

Saturday 1 September 2007

Summary:
The authors describe the Digital Fish project which involves the acquisition of in toto images of zebrafish and the development of the GoFigure software tools.  In toto imaging is detailed along with the purpose of the project. The authors state that the project is an open source project with both the images and data available to other researchers.


Evidence:
The authors provide a clear description of the methods and an introduction to the GoFigure software.  While the authors state that the project is Open Source, I was unable to find any licensing information about either the software or the data.


Open Science:
The authors purport that the work is open source; however, I was unable to verify this.  The software


Reproducibility:
I was unable to evaluate the software or data.  There is no reference in the paper to how to obtain the software or data.  I was able to find the NA-MIC wiki page (http://wiki.na-mic.org/Wiki/index.php/NA-MIC_NCBC_Collaboration:3D+t_Cells_Lineage:GoFigure) for GoFigure.  The software is available, but only in binary form.  It sounds like the next version should be completely open.  I was unable to find the data.  There is a webpage on the CIT website, but it requires a username and password.


Use of Open Source Software:
The next version of the software should be written with the various software tools provided by NA-MIC.


Open Source Contributions:
If completed and released, this will provide an excellent open platform for analyzing confocal images.  The data will be very useful for developing new processing algorithms as well.


Code Quality:
Unable to evaluate. 


Applicability to other problems:
As stated above, it will hopefully be applicable to other confocal imaging projects.


Suggestions for future work:
None


Requests for additional information from authors:
No information is currently provided on obtaining either the software or data.  There is also no licensing information.  The text would benefit, at a minimum, from a reference to the NA-MIC wiki page.


Additional Comments:
There are several typos in the text, particularly near the end.  I suggest that authors edit the text one more time.

Gaetan Lehmann

Friday 14 September 2007

Summary:

The authors describe the Digital Fish Project, which involves the in vivo imaging of whole zebrafish embryos using a two-photon confocal microscope, and the GoFigure software, which is used to segment the cells and the nuclei, track them over time, visualize the acquired images, and so on.

Open Science:

The paper, the data and the software are licensed with an open source license.


Reproducibility:

GoFigure is currently usable only on Windows, an OS I'm not used to using. I haven't built the code.

Use of Open Source Software:

Yes. ITK, VTK and CMake are used. The authors are also planning to use KWWidgets.


Open Source Contributions:

Yes. The source code is fully available with subversion.


Code Quality:

The code seems well structured and documented (that is also Ohloh's assessment: http://www.ohloh.net/projects/8464?p=GoFigure).


Applicability to other problems:

In toto imaging is one of the most interesting ways to study embryogenesis. This software, as well as the biological methods used, will be very useful for studying embryogenesis in other species. We can also imagine using it to study cell differentiation in particular tissues.


Requests for additional information from authors:

What image format comes from your microscope? It is usually a bit difficult to read the proprietary formats of the biggest confocal manufacturers, and if a new reader has been developed, it may be another good thing to reuse from this project.

In confocal imaging the signal usually attenuates as depth increases. Do you see this phenomenon in your images? If so, have you tried to correct it, and how?

You say the images are noisy. How are you denoising them?

Finally, you say that "the amount of the expression marker in each cell can then be quantitated". However, from my own experience, quantitation is very difficult in confocal images, because it is highly dependent on several factors (gain and offset of the detector, transparency of the object and/or the medium, depth attenuation, photobleaching, ...). Can you say more about what you want to measure, and how you will do that?

Additional Comments:

A very promising project - I will follow it closely!