FICCS Comments on FuGE v1(Mar07) Specification
Officially submitted comments
The most important comments have been summarized on the PSI Wiki: http://psidev.info/index.php?q=wiki/FuGE This is a copy of our PSI Wiki post.
By Yu Qian and Richard Scheuermann on Mon, 2007-04-30 23:13
- It seems to us that FuGE is a data model designed to boost research in systems biology by specifying a consistent, standardized representation of data derived from heterogeneous sources and experiment types. FuGE attempts to strike a balance between interoperability and flexibility by proposing a fixed upper-level generic structure that can be extended to support different functional genomics methodologies by adding experimental domain-specific details at the lower levels. In evaluating the potential use of FuGE to support a multi-experiment integrated database for immunology research, we have noted three main drawbacks to the current FuGE model. Addressing these drawbacks would substantially improve the utility of FuGE for a broad cross-section of the biomedical research community.
First, effectively striking a balance between interoperability and flexibility by focusing on the upper-level generic structure is important and challenging, but in our view FuGE leaves a little too much work to the domain experts who develop FuGE extensions. This situation could be improved by implementing more high-level classes or by adding low-level classes under the current high-level classes. For example, capturing all of the major experiment components in a single class, Material, is potentially problematic since it mixes together components that play quite distinct and important roles in the experiment. Specifically, specimens, analytes, reagents and reporters are distinct components of the vast majority of biomedical experiments. Each of these components is required to describe the connection between the independent and dependent variables of the experiment, since the substance being measured in the experiment result is rarely the analyte itself but rather some output from the reporter component of the reagent. Their relationships in making this connection are distinct.
However, in the current FuGE model, when biologists want to describe reagents and analytes, which are common concepts in almost every experiment, they have to extend these concepts manually from the FuGE Material class. Since FuGE is designed to capture all common features of functional genomics experiments, encoding common concepts such as reagent and analyte as general FuGE classes is necessary and would be very useful for domain-specific developers. In addition, extending reagent and analyte is difficult in the current FuGE model because they could be related to both Material and Conceptual Molecule, while FuGE does not allow multiple inheritance. This makes predefined reagent classes all the more necessary in practice if FuGE is to be widely adopted by biologists.
Second, domain-specific developers may be confused by the fact that different classes (or sub-classes) can be used to extend the same concept. For example, a data filtering step can be extended from either FuGE::Action or FuGE::ProtocolApplication, and its result could be either FuGE::Data or FuGE::DataPartition. Although this provides flexibility and leaves the decision to domain experts, the lack of a criterion for choosing a branch could result in similar information being stored in different places, which would weaken the goal of using FuGE for data integration and interoperation among different communities.
Finally, domain-specific developers would also be helped if they knew how to merge existing FuGE extensions. Model merging is desirable in at least the following three cases: First, multiple developers are working on the same FuGE extension, and the sub-models developed individually need to be merged to form the final consensus extension. Second, developers may want to reuse and extend existing models, and people may want to share the models they have developed. Third, people may need a single model that can accommodate extensions from different communities. It would be helpful if the FuGE community could address how best to support these merging use cases.
We would like to work with the FuGE community to resolve the above issues, as they are closely related to the use cases in our database and to the extension that we are making. We are currently extending FuGE to accommodate both metadata and derived data in flow cytometry (FCM) experiments, which will result in a FuGE-based FCM data model called FuGEFlow. We would like to follow up with the FuGE community on FuGE refinement so that FuGEFlow and other FuGE-based extensions can be done right and bring practical benefit to all related communities. Please feel free to contact us.
Best regards,
Yu Qian and Richard Scheuermann
By Josef Spidlen, Olga Tchuvatkina and Peter Wilkinson on Mon, 2007-05-07 16:19
- We share concerns raised by Yu Qian and Richard Scheuermann. We feel that providing concepts for specimens, reagents, and analytes would be of great value to most communities using FuGE.
- Specimen: material playing the sample role within an experiment; for use in testing, examination, or study.
- Reagent: chemical used with the specimen to invoke some kind of response reaction; often used to detect/evaluate specimen properties.
- Analyte: the target that is being detected and reported by the Reagent.
- Investigation(Component) start and end attributes
- Explicit Title association with Description
- Explicit Keywords association with Description
Josef Spidlen, Olga Tchuvatkina, and Peter Wilkinson
Detailed development comments
Yu Qian (Max) - issue with merging existing FuGE extensions:
- An important issue for FuGE developers is how to merge existing FuGE extensions. Model merging is desirable in at least the following cases: First, multiple developers are working on the same FuGE extension, and the sub-models developed individually need to be merged to form the final extension. Second, developers may want to reuse existing models in their own models, and people may want to share the models they have developed. Third, people may need a model that can accommodate FuGE extensions from different communities. If FuGE can address how to support these merging cases, it will be great news for developers.
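As a rough illustration of the first merging case, the sketch below treats each extension as a plain mapping from class name to attributes and reports attributes that two sub-models declare differently. The representation and the names `Extension`, `merge_extensions`, `ext_a` and `ext_b` are assumptions made for this example; FuGE does not define such a merge procedure, and real UML models carry associations and multiplicities that this ignores.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified view of an extension:
# class name -> {attribute name: type name}.
Extension = Dict[str, Dict[str, str]]


def merge_extensions(a: Extension, b: Extension) -> Tuple[Extension, List[str]]:
    """Merge two extensions; return the merged model and a list of conflicts."""
    merged: Extension = {name: dict(attrs) for name, attrs in a.items()}
    conflicts: List[str] = []
    for cls, attrs in b.items():
        target = merged.setdefault(cls, {})
        for attr, type_name in attrs.items():
            if attr in target and target[attr] != type_name:
                # Same attribute declared with different types:
                # keep the first declaration and flag it for a human decision.
                conflicts.append(f"{cls}.{attr}: {target[attr]} vs {type_name}")
            else:
                target[attr] = type_name
    return merged, conflicts


# Two made-up sub-models that disagree on one attribute type.
ext_a = {"Reagent": {"catalogNumber": "String"}}
ext_b = {"Reagent": {"catalogNumber": "Integer", "lotNumber": "String"}}
merged, conflicts = merge_extensions(ext_a, ext_b)
```

Even in this toy form, the conflict list shows why guidance from FuGE would help: the merge itself is mechanical, but deciding which declaration wins is not.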
Josef Spidlen - add what others found useful:
CISBAN DPI represents a FuGE M3-based system. They have added components to the object model, some of which could be useful for everyone and could be incorporated into the FuGE OM itself, namely the integrated support for LSIDs and possibly a versioning system layered on top of the existing object model. Also, their need to pre-load workflows and protocols suggests potential performance issues with implementations of the FuGE model.
Olga Tchuvatkina
As far as I understand, CISBAN DPI's main addition to FuGE was a single class called Endurant.
The purpose of this class is well described in these presentation slides
Peter Wilkinson - entity colours in model
- It would make life easier if the abstract classes and associations were different in colour (and perhaps line thickness) in the model.
Peter Wilkinson - issues with Material / Sample Management:
The Material class, adaptable as it is to the technology you want to model, seems a little problematic. I imagine that I would want to query my sample table to find the parent of any aliquot, or the constituents of a new sample that was created from a pool. Since a material transformation is done via ProtocolApplication, the information about a material's parents is part of the protocol application and not of the material entity. If I were using the MDA approach for a system that I was building based on FuGE, perhaps I would even want one sample table (entity) that would be the input and output of a protocol application. Just as samples might be related by a parent relationship or a pool, this can apply to any material as well. Perhaps it might also be a good idea to have a GenericSample. See below (2 alternatives, but I prefer the one on the right). I think that the parent (by association) and pool (by composition) should be added to the Material class, and that a new GenericSample should be added to the Bio.Material package that includes AT LEAST an attribute for a LIMS ID, in addition to attributes for alternate accession numbers (without using Ontology).
I like Olga's point that these links could be added to the Describable. I will reformulate this text to propose that these paths be added to the Describable entity as 'Pedigree' associations (the language Olga used), that is, associations that are not created through a ProtocolApplication, namely the 2 associations 'Parent' and 'Pool'.
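To make the proposal concrete, here is a minimal sketch of direct 'Parent' and 'Pool' links on a material, so that pedigree can be queried without going through protocol applications. The classes below are hypothetical stand-ins, not FuGE classes, and the field names (`parent`, `pool`, `lims_id`, `alternate_accessions`) and sample values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Material:
    """Hypothetical stand-in for a material with direct pedigree links."""
    name: str
    # 'Parent' association, e.g. an aliquot pointing at its source.
    parent: Optional["Material"] = None
    # 'Pool' composition: the constituents of a pooled sample.
    pool: List["Material"] = field(default_factory=list)


@dataclass
class GenericSample(Material):
    """Hypothetical GenericSample with a LIMS ID and alternate accessions."""
    lims_id: str = ""
    alternate_accessions: List[str] = field(default_factory=list)


def ancestors(material: Material) -> List[str]:
    """Follow 'Parent' links without consulting any protocol application."""
    names: List[str] = []
    current = material.parent
    while current is not None:
        names.append(current.name)
        current = current.parent
    return names


# Made-up samples: one aliquot of a source sample, and a pool of two samples.
sample_a = GenericSample(name="sample-A", lims_id="LIMS-PLACEHOLDER-1")
sample_b = GenericSample(name="sample-B", lims_id="LIMS-PLACEHOLDER-2")
aliquot = GenericSample(name="aliquot-A1", parent=sample_a, lims_id="LIMS-PLACEHOLDER-3")
pooled = GenericSample(name="pool-AB", pool=[sample_a, sample_b], lims_id="LIMS-PLACEHOLDER-4")
```

With links like these, `ancestors(aliquot)` and `pooled.pool` answer the two queries mentioned above directly from the sample entity.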
Everything that is not flow cytometry specific in MIFlowCyt should be present in FuGE
Part 1: Investigation
My approach was to look at MIFlowCyt and find classes, attributes and associations that can be reused among different functional genomics experiments. I found two main categories of interest: Investigation and Material.
Please take a look at the Investigation object model draft created some time ago for the MIFlowCyt project concept: MIFlowCyt_ObjectModel_Investigation
Most of what we have in MIFlowCyt is already present in FuGE, except for these attributes/associations:
Project (Investigation class in FuGE): title, start date, end date
Key word (Description class in FuGE): link to MeSH term
Bibliographic Reference (Bibliographic Reference class in FuGE): link to PubMed
Ryan Brinkman: I don't know if FuGE should suggest a specific type of manuscript identifier, as there are several (many?) alternatives. This is a "should" in MIFlowCyt in any case, so perhaps not a critical term. This likely also applies to MeSH.
Olga Tchuvatkina: What if we call the association "PrimaryManuscriptURI" or something like this? Explicitly having this association, rather than the inherited URI link, may make it clearer to FuGE users that a URI is expected. The same may be true for the MeSH term: having a "PrimaryOntologyTerm" association implies that a keyword should have an ontology term associated with it (MeSH or not MeSH).
Below is an example of how FuGE can be updated to include this information:
Created class: Keyword extended from Description
Created attributes: Investigation.startDate, Investigation.endDate
Created associations: Investigation.Title, Investigation.Keyword, Keyword.MeSHTerm, BibliographicReference.PubMedReference
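The proposed additions can be sketched in code roughly as follows. This is an illustrative sketch only: the classes are simplified stand-ins for the FuGE classes of the same name, associations to ontology terms and PubMed are reduced to optional strings, and all values are placeholders rather than real identifiers.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Description:
    """Simplified stand-in for the FuGE Description class."""
    text: str


@dataclass
class Keyword(Description):
    """Proposed class: Keyword extended from Description."""
    # Keyword.MeSHTerm association, reduced to an optional string here.
    mesh_term: Optional[str] = None


@dataclass
class BibliographicReference:
    """Simplified stand-in for the FuGE BibliographicReference class."""
    title: str
    # BibliographicReference.PubMedReference, reduced to an optional string.
    pubmed_reference: Optional[str] = None


@dataclass
class Investigation:
    """Simplified stand-in for the FuGE Investigation class."""
    title: Description                                      # Investigation.Title
    start_date: Optional[date] = None                       # Investigation.startDate
    end_date: Optional[date] = None                         # Investigation.endDate
    keywords: List[Keyword] = field(default_factory=list)   # Investigation.Keyword
    references: List[BibliographicReference] = field(default_factory=list)


# Placeholder values only; none of these identifiers are real.
inv = Investigation(
    title=Description("Example flow cytometry study"),
    start_date=date(2007, 1, 1),
    end_date=date(2007, 4, 30),
    keywords=[Keyword("flow cytometry", mesh_term="MESH:PLACEHOLDER")],
    references=[BibliographicReference("Example paper", pubmed_reference="PMID:PLACEHOLDER")],
)
```

Keeping the MeSH and PubMed links optional reflects the point raised above that they may be recommended rather than required.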
Part 2: Materials
As you can see from the object model draft, there are many things that can be reused: MIFlowCyt_ObjectModel_Materials
All FuGE experiments deal with reagents and biological samples, and with details about their taxonomy and other sample characteristics. Please feel free to post here your take on how these complex objects can be described in FuGE. My take on it:
Created classes: Reagent, BiologicalSample, AnatomicalSample, CellLine
Created attributes: Reagent.catalogNumber, Reagent.lotNumber
Created associations: BiologicalSample.BiologicalSource, BiologicalSample.TaxonomicQualifier, BiologicalSample.GenotypeCharacteristic, BiologicalSample.PhenotypeCharacteristic, AnatomicalSample.Organ, AnatomicalSample.Tissue, Reagent.Manufacturer, Reagent.Components
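As a sketch of how these pieces might fit together, the code below mirrors the listed classes, attributes and associations. Several things here are assumptions not stated above: that all four classes extend Material, that AnatomicalSample and CellLine specialize BiologicalSample, and that associations can be reduced to strings or lists for illustration. All values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Material:
    """Simplified stand-in for the FuGE Material class."""
    name: str


@dataclass
class Reagent(Material):
    catalog_number: str = ""                                   # Reagent.catalogNumber
    lot_number: str = ""                                       # Reagent.lotNumber
    manufacturer: Optional[str] = None                         # Reagent.Manufacturer
    components: List[Material] = field(default_factory=list)  # Reagent.Components


@dataclass
class BiologicalSample(Material):
    biological_source: Optional[str] = None     # BiologicalSample.BiologicalSource
    taxonomic_qualifier: Optional[str] = None   # BiologicalSample.TaxonomicQualifier
    genotype_characteristics: List[str] = field(default_factory=list)
    phenotype_characteristics: List[str] = field(default_factory=list)


@dataclass
class AnatomicalSample(BiologicalSample):
    # Assumption: modeled as a kind of BiologicalSample.
    organ: Optional[str] = None    # AnatomicalSample.Organ
    tissue: Optional[str] = None   # AnatomicalSample.Tissue


@dataclass
class CellLine(BiologicalSample):
    # Assumption: modeled as a kind of BiologicalSample with no extra fields.
    pass


# Placeholder instances; catalog and lot numbers are not real.
antibody = Reagent(
    name="example antibody",
    catalog_number="CAT-PLACEHOLDER",
    lot_number="LOT-PLACEHOLDER",
    manufacturer="Example Vendor",
)
spleen = AnatomicalSample(name="spleen-1", taxonomic_qualifier="Mus musculus", organ="spleen")
```

Whether organ and tissue belong on a subclass or on BiologicalSample itself is a design choice this sketch does not settle.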
Part 3: Software
The only missing property I found is information about the software platform/operating system
Created attributes: Software.platform
I decided on an attribute, but the platform property can also be expressed as an association with a Description or an ontology term
Note: multiplicities and directionality on FuGE Software-Provider association look strange to me
MagicDraw project
Please download from here: FuGE-V1-Candidate_oltchuva_1_1.mdzip
