deleting probes in expression set


Vanessa Vermeirssen ▴ 40

@vanessa-vermeirssen-2253

Last seen 9.6 years ago

Hi,

I have created eSets from RGLists for cDNA microarrays. In the end I would like to combine data from several different platforms. As a special case, I would like to combine 2 eSets with the same gene probes, but in a different order on the array (so 2 different array platforms).

The IDs of my probes are not unique, so I cannot use them as featureNames. Some have a duplicate (extension #2 after the name), and the control probes are not uniquely named, e.g. luciferase (10 x). Is there a way to delete the duplicates or integrate their information into the original (taking the average)? How do I delete the control probes? This would leave me with unique IDs, so I could use them as feature names, and then it is fairly easy to combine the two expression sets.

In another stage, when combining data from different platforms with different genes, I would like to extract just the information for a specific gene probe list. Is this possible?

I am new to Bioconductor, but learning a lot every day... I hope that somebody can help me.

Thanks so much already,
Vanessa Vermeirssen

--
Vanessa Vermeirssen, PhD
Tel: +32 (0)9 331 38 23, fax: +32 (0)9 3313809
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
vamei at psb.ugent.be
http://www.psb.ugent.be

• 1.5k views

ADD COMMENT • link updated 16.6 years ago by Martin Morgan 25k • written 16.6 years ago by Vanessa Vermeirssen ▴ 40


Martin Morgan 25k

@martin-morgan-1513

Last seen 1 day ago

United States

Hi Vanessa --

Sounds like you want to

1) subset using character, numeric, or logical vectors to select or reorder;
2) have some way to access features as 'groups', e.g., because of duplicate probe set names.

I'd encourage you to think carefully about part 2, as ExpressionSets are designed the way they are (unique featureNames) because this is what makes most biological and statistical sense for the type of data they are designed to represent.

Some details. I think your second question is easier:

> In another stage, when combining from different platforms with
> different genes, I would like to extract just the information for a
> specific gene probe list. Is this possible?

Do you want

    > library(Biobase)
    > data(sample.ExpressionSet)
    > sample.ExpressionSet
    ExpressionSet (storageMode: lockedEnvironment)
    assayData: 500 features, 26 samples
      element names: exprs, se.exprs
    phenoData
      sampleNames: A, B, ..., Z  (26 total)
      varLabels and varMetadata description:
        sex: Female/Male
        type: Case/Control
        score: Testing Score
    featureData
      featureNames: AFFX-MurIL2_at, AFFX-MurIL10_at, ..., 31739_at  (500 total)
      fvarLabels and fvarMetadata description: none
    experimentData: use 'experimentData(object)'
    Annotation: hgu95av2
    > sample.ExpressionSet[c("AFFX-MurIL2_at", "31739_at"),]
    ExpressionSet (storageMode: lockedEnvironment)
    assayData: 2 features, 26 samples
      element names: exprs, se.exprs
    phenoData
      sampleNames: A, B, ..., Z  (26 total)
      varLabels and varMetadata description:
        sex: Female/Male
        type: Case/Control
        score: Testing Score
    featureData
      featureNames: AFFX-MurIL2_at, 31739_at
      fvarLabels and fvarMetadata description: none
    experimentData: use 'experimentData(object)'
    Annotation: hgu95av2

i.e., provide the vector of featureNames as the first index when subsetting with '['?

The first question sounds more complicated. The following might get you going, but proceed with some thought!

> I have created eSets from RGLists for cDNA microarrays. I would like
> to combine in the end data from several different platforms. As a
> special case, I would like to combine 2 eSets with the same gene
> probes, but in a different order on the array (so 2 different array
> platforms).

'combine' *might* help (see ?combine and class?eSet or ?"eSet-class"). You could subset one of the sets using the indices (i.e., featureNames) of the other (this will reorder expression values to match the order in the subset), and then manipulate.

> The IDs of my probes are not unique, so I cannot use them as
> FeatureNames...some have a duplicate in there (extension #2 after
> its name) and the control probes are not uniquely named
> e.g. luciferase (10 x). Is there a way to delete the duplicates or
> integrate their information in the original (taking the average)?

I think first you want to clarify what you're doing here, and whether it has statistical & biological meaning. You can leave featureNames unspecified, and they will then be provided for you. You might then add a column to featureData to keep track of which probes map to which (non-unique) identifiers (though how are you going to interpret multiple expression values for the same identifier?). Subsetting by these features then becomes more awkward, e.g.,

    > obj <- sample.ExpressionSet
    > featureData(obj)[["my_ids"]] <- paste("id", seq(1, nrow(obj)))
    > qids <- c("id 10", "id 100")
    > idx <- featureData(obj)[["my_ids"]] %in% qids
    > obj[idx,]
    ExpressionSet (storageMode: lockedEnvironment)
    assayData: 2 features, 26 samples
      element names: exprs, se.exprs
    phenoData
      sampleNames: A, B, ..., Z  (26 total)
      varLabels and varMetadata description:
        sex: Female/Male
        type: Case/Control
        score: Testing Score
    featureData
      featureNames: AFFX-BioDn-5_at, 31339_at
      fvarLabels and fvarMetadata description:
        my_ids: NA
    experimentData: use 'experimentData(object)'
    Annotation: hgu95av2

A logical index built this way picks out every feature that shares an identifier, which is the starting point for averaging or otherwise summarizing those features.

> How to delete the control probes? This would enable me to end up
> with unique IDs, so I could use them as feature names and then it is
> fairly easy to combine the two expression sets.

This is subsetting again, probably most easily done using a logical index along the lines of the following, where ctrl_ids stands for a character vector of your control-probe identifiers (it is not defined in this session):

    > not_ctrls <- !(featureData(obj)[["my_ids"]] %in% ctrl_ids)
    > obj[not_ctrls,]
    ExpressionSet (storageMode: lockedEnvironment)
    assayData: 498 features, 26 samples
      element names: exprs, se.exprs
    phenoData
      sampleNames: A, B, ..., Z  (26 total)
      varLabels and varMetadata description:
        sex: Female/Male
        type: Case/Control
        score: Testing Score
    featureData
      featureNames: AFFX-MurIL2_at, AFFX-MurIL10_at, ..., 31739_at  (498 total)
      fvarLabels and fvarMetadata description:
        my_ids: NA
    experimentData: use 'experimentData(object)'
    Annotation: hgu95av2

You can use similar ideas with other R objects, including the RGList of limma, and with basic structures like a matrix or data frame.

Hope that helps,

Martin

--
Martin Morgan
Bioconductor / Computational Biology
http://bioconductor.org
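The subsetting ideas above also work on a plain matrix, so here is a minimal base R sketch of the whole sequence: drop the controls, average the duplicates, then reorder to match a second platform. Everything in it is an assumption made for illustration: the probe IDs, the expression values, the "#2" suffix pattern, and the names ctrl_ids, base_ids and other_order. Whether averaging duplicate spots is statistically sensible for your data is a separate question that the sketch does not answer.

```r
# Toy data: hypothetical probe IDs with one duplicate ("geneA#2") and
# repeated control spots ("luciferase"); the values are made up.
ids <- c("geneA", "geneB", "luciferase", "geneA#2", "luciferase", "geneC")
x <- matrix(c(1, 2, 9, 3, 9, 4,
              5, 6, 9, 7, 9, 8),
            ncol = 2,
            dimnames = list(NULL, c("array1", "array2")))

# 1. Drop control probes with a logical index.
ctrl_ids <- c("luciferase")          # assumed list of control names
keep <- !(ids %in% ctrl_ids)
x <- x[keep, , drop = FALSE]
ids <- ids[keep]

# 2. Strip a trailing "#<number>" so duplicates share one identifier.
base_ids <- sub("#[0-9]+$", "", ids)

# 3. Average rows that share an identifier: rowsum() returns per-group
#    sums (rows sorted by group name), divided here by the group sizes.
sums <- rowsum(x, group = base_ids)
avg <- sums / as.vector(table(base_ids)[rownames(sums)])

# 4. Row names are now unique, so rows can be reordered by name to
#    match the probe order of a second platform (hypothetical order).
other_order <- c("geneC", "geneA", "geneB")
avg_matched <- avg[other_order, , drop = FALSE]
print(avg_matched)
```

A matrix like avg_matched, with unique row names, could then be used as the assayData when building an ExpressionSet, at which point the row names become usable featureNames.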

ADD COMMENT • link 16.6 years ago Martin Morgan 25k
