selecting/filtering probesets from exprSet object prior to diff. exp. anal.
@mark-baumeister-4972
Last seen 9.6 years ago
Hi all,

I am new to this list and to Bioconductor, and I have a question about selecting or filtering probesets from an exprSet object prior to differential expression analysis.

I am currently learning to preprocess microarray data (raw CEL files from the Affymetrix HG-U133A array) and then to work with the normalized exprSet object to detect differential gene expression in ovarian tumor samples compared with normal samples. I am currently working with a set of ~33 tumor samples and ~7 normal samples.

Because my machine is 32-bit and cannot handle that much memory allocation, I use a program called RMAExpress for the preprocessing to produce the normalized exprSet object. I then use Bioconductor with that object (which I call "eset") for the differential gene expression analysis.

To start, I create a design matrix (which I name "design") for the linear modeling steps in the limma package:

        Normal Tumor
    T1       0     1
    T2       0     1
    T3       0     1
    T5       0     1
    T7       0     1
    N1       1     0
    T8       0     1
    T9       0     1
    T10      0     1
    T11      0     1
    N2       1     0
    T12      0     1
    T13      0     1
    T14      0     1
    T15      0     1
    N3       1     0

I then use the following code to produce a linear model, a contrast matrix, and a list of differentially expressed genes:

    fit <- lmFit(eset, design)
    cont.matrix <- makeContrasts(NormalvsTumor=Tumor-Normal, levels=design)
    fit2 <- contrasts.fit(fit, cont.matrix)
    fit2 <- eBayes(fit2)
    topTable(fit2, number=100, adjust="BH") # use BH method

My question is this: is there a way to select or exclude certain probesets that I do or don't want included in the linear model before I produce the list (topTable) of differentially expressed genes?

I have looked at the genefilter function but have not found specific examples of how to do what I want.

Thanks in advance,
-M

--
Mark Baumeister
http://sites.google.com/site/lfmmab/
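For readers following along, a design matrix like the one in the question does not have to be typed by hand. The following is a minimal sketch in base R, assuming the sample labels are available as a character vector and that labels starting with "N" are normals; the vector name `samples` and that naming rule are assumptions for illustration, not part of the original post.

```r
# Sample labels in the order shown in the question (assumed to be known)
samples <- c("T1", "T2", "T3", "T5", "T7", "N1", "T8", "T9",
             "T10", "T11", "N2", "T12", "T13", "T14", "T15", "N3")

# Derive the group from the label prefix: "N" = Normal, anything else = Tumor
group <- factor(ifelse(startsWith(samples, "N"), "Normal", "Tumor"),
                levels = c("Normal", "Tumor"))

# One indicator column per group, no intercept
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
rownames(design) <- samples

design
```

The column names "Normal" and "Tumor" matter because makeContrasts() refers to them when the contrast Tumor-Normal is defined.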
Preprocessing genefilter limma
@james-w-macdonald-5106
Last seen 9 hours ago
United States
Hi Mark,

On 11/23/2011 1:00 PM, Mark Baumeister wrote:
> My question is this,
> Is there a way to select or exclude certain probesets that I want or don't
> want to be included in the linear model before I produce the list
> (topTable) of differentially expressed genes?

There are ways to do this, but note that the eBayes() step in your code estimates a prior for the probeset variance using all probesets on the array. If you selectively remove some probesets (say, all the low-variance probesets), you will bias the prior, which may have unintended effects.

That said, both ExpressionSet and MArrayLM objects (the output of eBayes()) can be subset using the conventional square-bracket functions in R. For example, you could remove the first ten probesets from your fit2 object like this:

    fit2 <- fit2[-c(1:10),]

or you could create a TRUE/FALSE indicator based on some metric:

    ind <- fit2$p.value < 0.25
    fit2 <- fit2[ind,]

The same thing can be done to the ExpressionSet object as well.

Best,

Jim

--
James W. MacDonald, M.S.
Biostatistician
Douglas Lab
University of Michigan
Department of Human Genetics
5912 Buhl
1241 E. Catherine St.
Ann Arbor MI 48109-5618
734-615-7826
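The two subsetting approaches in this answer can be tried end to end on simulated data. This is only a sketch under stated assumptions: the expression matrix, the probe names `probe_1` ... `probe_200`, and the 3 normal / 7 tumor layout are made up for illustration, and lmFit() is given a plain matrix rather than an ExpressionSet. Because there is a single contrast, the sketch takes the first column of the p-value matrix explicitly.

```r
library(limma)

set.seed(1)
# Simulated log-expression values: 200 hypothetical probesets x 10 arrays
expr <- matrix(rnorm(200 * 10), nrow = 200,
               dimnames = list(paste0("probe_", 1:200), paste0("S", 1:10)))

group <- factor(rep(c("Normal", "Tumor"), times = c(3, 7)))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Fit on ALL probesets first, so eBayes() estimates its prior from the full array
fit <- lmFit(expr, design)
cont.matrix <- makeContrasts(NormalvsTumor = Tumor - Normal, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont.matrix))

# Option 1: drop the first ten probesets by position
fit2_drop <- fit2[-(1:10), ]

# Option 2: keep probesets by a logical indicator (single contrast, column 1)
ind <- fit2$p.value[, 1] < 0.25
fit2_keep <- fit2[ind, ]

topTable(fit2_keep, number = 5, adjust.method = "BH")
```

Note that topTable() computes the adjusted p-values over whatever rows remain in the object it is given, so subsetting after eBayes() changes the multiple-testing correction even though the moderated statistics themselves are unchanged.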
Thanks a lot, James, for your help. That seems pretty straightforward.

> That said, both ExpressionSets and MArrayLM objects (the output from
> eBayes()) can be subset using the conventional square-bracket functions in R.

If I know the probe IDs for the probes I want to select or exclude from the MArrayLM object (i.e. fit2) before producing the topTable() list, can I also use the probe IDs somehow to select or exclude from the MArrayLM object?

Mark
Hi Mark,

On 11/23/2011 2:28 PM, Mark Baumeister wrote:
> If I know the probe ID's for the probes I want to select or exclude
> from the MArrayLM object (i.e. fit2) before producing the topTable() list,
> can I also use probe ID's somehow to select or exclude from the
> MArrayLM object?

Sure. Note that you can extract the probeset IDs from the ExpressionSet object using the featureNames() extractor, and then use either which() or %in% to create an index for subsetting. Say you have a character vector called 'probes' containing the probeset IDs of interest:

    ind <- featureNames(eset) %in% probes
    fit2[!ind,]

Here fit2[!ind,] excludes those probesets; fit2[ind,] would keep only them.

Best,

Jim
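The probe ID approach can be checked on a small simulated ExpressionSet. As before, this is a sketch rather than the poster's actual data: the matrix, the sample layout, and the IDs in `probes` are hypothetical. It relies on the rows of fit2 being in the same order as the features of eset, which holds when fit2 is derived from lmFit(eset, design) without reordering.

```r
library(Biobase)
library(limma)

set.seed(2)
# Simulated data: 50 hypothetical probesets x 8 arrays
expr <- matrix(rnorm(50 * 8), nrow = 50,
               dimnames = list(paste0("probe_", 1:50), paste0("S", 1:8)))
eset <- ExpressionSet(assayData = expr)

group <- factor(rep(c("Normal", "Tumor"), each = 4))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

fit <- lmFit(eset, design)
cont.matrix <- makeContrasts(NormalvsTumor = Tumor - Normal, levels = design)
fit2 <- eBayes(contrasts.fit(fit, cont.matrix))

# Hypothetical probeset IDs to select or exclude
probes <- c("probe_3", "probe_17", "probe_42")
ind <- featureNames(eset) %in% probes

fit2_excl <- fit2[!ind, ]   # everything except the listed probesets
fit2_only <- fit2[ind, ]    # only the listed probesets
eset_excl <- eset[!ind, ]   # the same index works on the ExpressionSet

topTable(fit2_excl, number = 5, adjust.method = "BH")
```

Using %in% returns a logical vector in the order of featureNames(eset), so the selected rows keep the array's order rather than the order of `probes`.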