Discussion on MIFlowCyt from ISAC
The comments below are from the closed ISAC forum http://www.isac-net.org/component/option,com_mamboboard/Itemid,131/func,showcat/catid,8/ interspersed with suggested resolutions.
Michael Goldberg
MIFlowCyt Specification - Request for Comments - 2007/08/13 19:52 This thread discusses the Content article: MIFlowCyt Specification - Request for Comments
- There has been some concern here about the number of items that are required (shall) vs. recommended (should). A number of researchers I have discussed this with have independently raised this issue. If an item is marked as required (shall), it seems that this would disallow providing a blank value.
- Correct, allowing for “blank” values would not really make sense with the “shall” items. Is there anything that (i) you believe is currently required while you believe that it is not needed to interpret an experiment, or (ii) will likely make no sense for some experiments?
- Any single experiment may be part of a larger project whose documentation is maintained elsewhere, so requiring that information for each experiment may be onerous.
- I believe that this could be handled by referencing appropriate parts of the large experiments?
- Also, some concerns about describing the analysis (specifically, gating). I understand this is being addressed in the Gating-ML proposed spec. This MIFlowCyt specification requires the Data Analysis details which may not exist if the experiment was used only to acquire data and a separate package is to be used for analysis.
- True, in this case I believe that there is no need to describe the data acquisition itself as a MIFlowCyt-compliant experimental description?
- Would there be an issue with experiments (such as retrospective studies that analyze previously acquired data) being unable to use data acquired during the current era because of missing info required by MIFlowCyt?
- This is a good point; however, it should probably be addressed by (i) formats to store MIFlowCyt-compliant information and (ii) software to process MIFlowCyt-compliant information rather than by the MIFlowCyt specification itself. Being able to process data that lack some of the information seems like a valid use case.
Michael Goldberg (colleague)
Re:MIFlowCyt Specification - Request for Comments - 2007/08/14 12:30 These comments are from a colleague, who said that the spec is very relevant to the work in which he's involved.
- Section 2.1.1.3.1. Taxonomy: "The terms should come from an appropriate standard such as the NCBI taxonomy database" [emphasis added]. The data standard might want to have a way to specify the taxonomy used for the experiment. It could allow the experiment to define extensions to the taxonomy, and relate local terms of use back to equivalent standard terms. This could help investigators search for and analyze experiments that used different but equivalent terms. It could also be helpful for setting up experiments, and for preparing and acquiring samples, because the experiment's taxonomy can provide guidance on what terms to use.
- I believe that the wording we chose allows for an extension of the taxonomy or for a proprietary taxonomy to be used. However, this is good to keep in mind for future format specifications.
- The standard calls for a lot of information that might not change from sample to sample, or even from experiment to experiment. From an audit trail point of view, it would be nice if information that is not modified is accessed by reference.
- Agreed, MIFlowCyt only specifies the content, not the format of the information. Referencing instead of copying information is not prevented.
- All the information that has been copied will need to be compared in order to tell if it was actually changed. Minimizing unnecessary copying of information greatly reduces this burden and helps make systems more scalable. A simple way to access information by reference is through a URI (Uniform Resource Identifier) and a digital signature.
- Explicitly allowing referencing of other descriptions would seem to address this.
- The standard should use the term URI instead of URL. A URL is one kind of URI.
- URL is mentioned twice in MIFlowCyt; while it definitely makes sense to change it in 1.9, we may keep URL in 3.1 in the sentence “URL pointing to manufacturer web pages”, as URI would be confusing there.
- Section 2.3. Fluorescence Reagents Description: There seems to be a gap between this section and the previous sections that describe the Sample / Specimen Materials. What is missing is the expected (and possible) types of cells or other particles in the sample material, and their expected (and possible) characteristics that could be measured. Maybe it is because this is assumed common biological knowledge. However, it is key to getting from the goal of the experiment to the appropriate reagents. Also, it defines the options available for the gating strategy.
- Section 4. Data Analysis Details: Maybe Section 4.4 Data Transformation Details should go before Section 4.2 Compensation Details, or at least before Gating. Compensation is a form of Data Transformation.
- Good point, we will rearrange that.
- Data Transformation also includes scaling and quantitation. The basic purpose of scaling and quantitation is to convert data from the units measured by the Optical Detector to units of the Characteristic Being Measured. Usually the process reverses the process of Sample Treatment and Data Acquisition. In other words, it will go from a Voltage to a count of Photons (Fluorescence Intensity) to a count of Reporter (Fluorochrome) molecules to a count of Detector molecules to a count of Analytes. In each step along the way the analysis should display the data scaled in the appropriate units so the graphical representation is valid as well as useful.
Another use of Data Transformation is for data visualization. New parameters can be computed from multiple characteristics. They can be used as new axes for graphic representations of the data.
- Section 4.3. Gating Details: There really are two phases of gating. First is getting to the populations of interest. Then there is identifying subpopulations and generating detailed statistics about them.
- I believe that this hierarchy may be even more “nested”, i.e., subpopulations of subpopulations, and I believe that we support these.
- Section 4.3.3. Gate Statistics: There might also be statistics where the denominator gate is not a containing (superset) population.
- JS: This is a good point to have in mind, however, I do not think that we say anything preventing this?
- RB: I think we say you always have to give a denominator, so this may be confusing
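The denominator question in the Gate Statistics comment can be made concrete with a small sketch. It is only an illustration under stated assumptions: the `Gate` class, the helper names, and the event counts are all hypothetical and do not come from MIFlowCyt or Gating-ML. It shows that a statistic stays well defined when the denominator gate is named explicitly, whether or not that gate contains the numerator population.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """A gate reduced to the set of event indices it contains (hypothetical model)."""
    name: str
    event_ids: frozenset


def percent_of(numerator: Gate, denominator: Gate) -> float:
    """Event count of the numerator as a percentage of the denominator's count."""
    if not denominator.event_ids:
        raise ValueError(f"denominator gate {denominator.name!r} is empty")
    return 100.0 * len(numerator.event_ids) / len(denominator.event_ids)


def is_contained_in(numerator: Gate, denominator: Gate) -> bool:
    """True when the denominator is a containing (superset) population."""
    return numerator.event_ids <= denominator.event_ids


# Made-up event indices, for illustration only.
lymphocytes = Gate("lymphocytes", frozenset(range(100)))
cd4 = Gate("CD4+", frozenset(range(40)))
cd8 = Gate("CD8+", frozenset(range(40, 65)))

# Denominator is a containing population: a conventional "percent of parent".
cd4_of_lymphocytes = percent_of(cd4, lymphocytes)

# Denominator is a sibling population, not a superset: still computable,
# provided the description names the denominator gate explicitly.
cd4_relative_to_cd8 = percent_of(cd4, cd8)
```

In this sketch the requirement to always state a denominator and the wish to allow non-containing denominators do not conflict; the description only has to say which gate serves as the denominator.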
Jonni S. Moore
comments from computational colleague - 2007/08/14 11:24 I am sending comments from my colleague Wade Rogers, a computational biologist we have been working with on some of the issues of data standards and analysis. If anyone wants to speak to him directly, you may contact him at wade.rogers@ciradiscovery.com thanks, Jonni Moore
A standard to specify how information relating to flow cytometry experiments will be captured is a GOOD THING.
I like the general approach here, which is to first specify what information "shall" be included in a standard-conformant way, without specifying how (i.e. format).
- There are a couple of practical problems here. For example, in section 3.3 it is attempted to fully describe instrument configuration and settings. It is probably not possible to do so in a fundamentally meaningful way. For example, 3.3.3 specifies that details about excitation optics shall be provided, detailing all components along an excitation light path (unless light source power and polarization are specified at the intersection of light source and particles).
- What about beam profile at the intersection? This may be as/more important than polarization. Also, if I know all of the components along the light path, I still may not know the power/polarization/profile at the intersection. So, what is the intent here? Is it to be able to accurately reproduce the experiment, or simply to document instrument config? Do you actually monitor power, polarization and beam profile at the intersection on a frequent basis?
- Covered in 3.3.2.5.? The preferred way is to state details at the intersection point; however, it may not be feasible for researchers to do so. Detailing the light source characteristics at the laser output and describing the optical path at least enables some approximation of the light characteristics at the beam/sample intersection point. For example, the power is highly dependent on the number of optical surfaces (and their quality) that are involved in beam transmission.
- Similar comments e.g. on optical filters. To really know what a filter is doing (3.3.4.3) we'd need a calibration curve. Also, the angle of an interference filter relative to the beam significantly affects its spectral properties, but there's no mention of this. The practicalities are tough.
- Gating and Transformation seem as important as, or even more important than, capturing the instrument configuration. The level of detail is much lower here in the standard - looks like they ran out of gas. There's much more detail in Gate-ML and Transformation-ML. Don't know if this should be included - perhaps not. Since this is a Minimum Information standard, I'd opt for simplifying the instrument part rather than elaborating the gate and transformation part.
- This only seems so as Gating-ML and Transformation-ML specify formats, not only content of the information. We still want to capture the full information about gating and transformations and we expect Gating-ML and Transformation-ML being possible ways to provide these details.
- I think the experiment overview part is mostly ok, as are sample and reagent description. One thing that seems missing from experiment is n. If it is a cross-sectional study we need to know (presumably in section 1.3) how many instances there are in each experimental group. If a longitudinal study, then how many and what time points for each sample. This is needed to estimate statistical power. Then the can of worms is the statistical analysis methodology... You'd really need to capture all of this in some detail if the conclusions (1.7) are to be meaningful. Again, since this is MI I'd opt for a more minimalistic approach. Perhaps these details should be incorporated by reference (e.g. to a publication).
- This seems tricky to simplify in this way, and one or two numbers would probably not help given the many different kinds of experiments. However, these “#s” should be derivable from the further description of the experiment.
Robert C. Leif
Re:MIFlowCyt specification - 2007/06/25 14:35 The term, Parameter, is used multiple times in MIFlowCyt without being explicitly defined. Section 3.3.6. Optical Paths states "The parameter is understood as the type of measurement stored in list mode data files". This needs to be generalized. The definition in FCS 3.0 Section 2.2.9 is, “A parameter is the signal produced by one of the detectors of the cytometer. Forward scattering is typically one of the measurement parameters. A parameter value is a digital representation of a parameter.” This definition is too limited, since it does not include mathematically transformed measurements such as log fluorescence or calculated measurements such as electronic opacity (radio-frequency impedance divided by DC impedance) or other normalized measurements, such as fluorescence divided by DC impedance or low angle scatter. The situation becomes even more complex if parameter is reused for digital microscopy, which employs multiple types of calculated values, e.g. texture measurements, perimeter divided by area ratios, etc. If parameter is to describe the signal produced by the detector or a concatenation of the detector type, a sequential number and how the signal was processed, e.g. FL1-W, then a caveat should be introduced that, except for engineering studies, the parameter is inappropriate as an axis for a graph. The use of FL1 instead of the material measured is poor writing. The biologist or clinician is interested in what is measured, not how the measurement was made. For the software engineers, this is known as information hiding. The subject of interest should be used in a presentation, as opposed to an indirect reference to the subject. The definition of parameter that I suggest is closer to the parameter value from FCS 3.0. “A parameter is a short description of the instrumentation that produced an item stored in a list mode file.”
Please note that except for engineering studies, the use of a parameter is inappropriate for the axis of a graph in a scientific publication or a clinical report. A description of the property detected e.g. CD4 or Anti-CD4 fluorescence, low angle or orthogonal light scatter, cell volume, etc. is appropriate.
- We will fix parameter definition.
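The distinction drawn above, between detector parameters and calculated or transformed measurements, can be sketched as follows. This is a minimal illustration and not part of FCS or MIFlowCyt: the function names are hypothetical, and the mapping from the detector name FL1 to "Anti-CD4 fluorescence" is an assumed example of a label that an experiment description would have to supply.

```python
import math


def electronic_opacity(rf_impedance, dc_impedance):
    """Calculated measurement: radio-frequency impedance divided by DC impedance, per event."""
    if len(rf_impedance) != len(dc_impedance):
        raise ValueError("both channels must hold one value per event")
    return [rf / dc for rf, dc in zip(rf_impedance, dc_impedance)]


def log10_transform(values):
    """Mathematically transformed measurement, e.g. log fluorescence."""
    return [math.log10(v) for v in values]


# Hypothetical mapping from detector-oriented parameter names to the property
# measured. The pairing shown here is an assumption made for this example.
PROPERTY_MEASURED = {"FL1": "Anti-CD4 fluorescence"}


def axis_label(parameter_name):
    """Prefer the property measured; fall back to the detector name if unknown."""
    return PROPERTY_MEASURED.get(parameter_name, parameter_name)


opacity = electronic_opacity([2.0, 3.0], [4.0, 6.0])
log_fl1 = log10_transform([10.0, 1000.0])
label = axis_label("FL1")
```

Neither derived list corresponds to a single detector signal, which is why a definition tied to "the signal produced by one of the detectors" does not cover them.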
Janet Siebert
Re:MIFlowCyt specification - 2007/08/03 11:21 General comment: This may be an implementation detail, but there are complex relationships amongst various categories of information that are not explicitly recognized in the standard. For example, information about the experimental design/overview probably applies to many samples; a particular sample may be divided into multiple aliquots, each with its own treatment. Additionally, certain aliquots and treatments may be replicates of each other. Furthermore, the treatment of a particular aliquot probably applies to aliquots from multiple samples. Essentially, a handful of aliquot treatments may be used throughout the experiment. Instrument details have their own set of relationships. Somehow, these relationships must be accounted for. Otherwise, the standard is taking us down the slippery slope of highly denormalized and duplicated data.
- Not sure if this needs/can be in the standard?
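One way to picture the relationships described in the general comment is a small normalized model in which each treatment is described once and aliquots refer to it by identifier. This is a sketch under stated assumptions, not a proposed format: every class name, field name, and identifier below is hypothetical, and MIFlowCyt itself specifies content only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Treatment:
    treatment_id: str
    description: str


@dataclass(frozen=True)
class Aliquot:
    aliquot_id: str
    sample_id: str      # the sample this aliquot was taken from
    treatment_id: str   # reference to a shared Treatment, not a copy of it


# A handful of treatments, each described once (made-up values).
treatments = {
    "T1": Treatment("T1", "unstimulated control"),
    "T2": Treatment("T2", "stimulated"),
}

# Aliquots from several samples reuse the same treatments by reference.
aliquots = [
    Aliquot("A1", "S1", "T1"),
    Aliquot("A2", "S1", "T2"),
    Aliquot("A3", "S1", "T2"),
    Aliquot("A4", "S2", "T1"),
    Aliquot("A5", "S2", "T2"),
]


def replicate_groups(items):
    """Group aliquots sharing both sample and treatment; keep groups of 2 or more."""
    groups = {}
    for a in items:
        groups.setdefault((a.sample_id, a.treatment_id), []).append(a.aliquot_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


def samples_using(treatment_id, items):
    """Samples that have at least one aliquot with the given treatment."""
    return sorted({a.sample_id for a in items if a.treatment_id == treatment_id})


replicates = replicate_groups(aliquots)
t2_samples = samples_using("T2", aliquots)
```

Because each treatment description lives in one place, replicates and cross-sample reuse fall out of the references rather than from comparing duplicated text.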
Some specific comments:
- 1.6 Date. Date/time period of what? Sample collection? Actual runs of the flow cytometer? Gating? Data analysis of experimental results? Months can elapse between these different phases of the investigation.
- We will extend the description.
- 1.7. Conclusions. In complex experiments (e.g. 4 cohorts, 3 cell lineages, 6 cytokines, plus rich clinical readouts), data can be collected, but "a brief summary of the interpretation of the results" may not be available because (a) such analysis has not been performed; (b) data has been distributed to multiple groups to analyze the data in different ways; or (c) there simply is not a brief interpretation. "Conclusions" should be considered a nice-to-have as opposed to a must-have.
- We will fix that.
Peter Wilkinson
Re:MIFlowCyt specification - 2007/08/13 15:35 This is a suggestion by one of my colleagues Bastian Angermann that I also support.
- In section 4, there should be an additional sub-section for quantitative FCM, perhaps titled "4.5 Quantitated FCM", for experiments that have been designed for mapping fluorescent intensities to the number of molecules being measured. These types of experiments seem to be in the minority but I think they will become more popular, so MIFlowCyt should specify what the minimum information should be. We are working on the details of these types of experiments, as these are new to us, and can put forward a recommendation for what should be captured in September.
