Coercing matrix into expression set, for normalization of only subsets of miRNAs (affy miRNA3.0)

James W. MacDonald (@james-w-macdonald-5106), United States

Hi Dana,

On Friday, October 25, 2013 5:14:36 PM, Dana Most wrote:
> Hi Jim,
>
> Thank you for your response.
>
> I am trying to separate the mature and premature microRNAs because they have different hybridization characteristics. I thought it would be more 'statistically correct' to normalize them separately since these two populations are different. If I don't intend to compare the premature and mature microRNAs to each other, are there any disadvantages of separating them? Except for the obvious reason: there is no function available for easy use and I am hardly a programmer.

What do you mean by 'different hybridization characteristics'? Hybridization to the array, or something biological?

If you mean biologically different, that is not relevant to the normalization. The only thing that is relevant to the normalization is that the relative amount of transcript doesn't change for most miRNAs, and that the distribution of the data is similar between arrays. Any biological consideration is moot. If it weren't, quantile normalization wouldn't be reasonable for mRNA arrays either.

If you mean that the hybridization to the array is different, then in what way? All the probes are just 25-mers. I guess you could argue that the hairpin transcripts, being longer, will hybridize differently, but that still doesn't matter. The only assumptions being made are distributional, so if one transcript tends to bind better or worse than another, it doesn't matter. Again, if it did, quantile normalization wouldn't be useful for mRNA arrays either.

> So there's no easy way to coerce a data matrix (or a subset of the data) into an ExpressionFeatureSet structure to satisfy the rma function?
>
> Maybe I can convert the matrix into an ExpressionSet using Biobase. Do you know if there's a way to convert the ExpressionSet into an ExpressionFeatureSet?

No, you can't go back.
What you have to understand is that each miRNA has a certain number of probes that measure the expression of that miRNA. When you run rma(), you are summarizing the expression of all the probes that measure a given miRNA in order to come up with a single value for each miRNA. An ExpressionSet isn't designed to handle the data in an ExpressionFeatureSet. In addition, the summarization requires that you know _which_ of the probes belong to a given probeset (where the probeset gives an idea of the expression for a single miRNA). If you mess with the data in an ExpressionFeatureSet, you also have to change the pd.info package to reflect those changes.

Long story short, I think you are worrying about something that is not worth worrying about. You can certainly try to normalize these probes differently, but in the end I doubt it will make a difference, and it almost surely won't have been worth the time spent.

Best,

Jim

> Thank you so much for your help!
> Dana
>
> On Thu, Oct 24, 2013 at 9:07 AM, James W. MacDonald <jmacdon at uw.edu> wrote:
>
> > Hi Dana,
> >
> > On Wednesday, October 23, 2013 12:04:31 AM, Dana Most wrote:
> > > Dear All,
> > >
> > > How can I transform/coerce a gene expression matrix into an expression set?
> > >
> > > I'm using affy miRNA 3.0 data and I would like to normalize only a subset of the samples (I have 4 groups of samples and would like to choose 2) and only a subset of microRNAs (I have mature and premature microRNAs and they should not be normalized together).
> > >
> > > It should look something like this:
> > >
> > > affyExpressionFS <- read.celfiles(celFiles, pkgname="pd.mirna.3.0")
> > > data = exprs(affyExpressionFS)
> > > data = data[1:1000,1:20]
> >
> > Why do you think the first 1000 rows are useful here? Is this just supposed to be an example?
> >
> > > exprsData = coerce data into expression set
> > > rma(exprsData)
> >
> > You can't run rma on an ExpressionSet, as an ExpressionSet is intended to contain summarized data. Instead you need to use an ExpressionFeatureSet object (which is what you are getting your matrix of data out of).
> >
> > That said, you will have to do some serious coding if you want to accomplish this. Right now there is no easy way (that I know of; Benilton might correct me here) to subset to a particular set of probes. You can check out the oligo source from subversion and make whatever changes you want. There is even a 'subset' argument that is for future use that you could implement if you want.
> >
> > But this leads me to your original rationale for wanting to do this, where you state that mature and precursor miRNAs should not be normalized together. I am not sure why you would think this, and I am pretty sure you are wrong.
> >
> > You could argue that the hairpin miRNAs are fundamentally different from the mature miRNAs (which I suppose they are), but that has nothing to do with normalization. For the normalization to be reasonable, you have to fulfill two criteria. First, most probes should not be differentially expressed between samples, and second, the underlying distributions of the data should not be completely dissimilar.
> >
> > This has nothing to do with what the probes are supposed to measure, nor whether or not the probes are even measuring anything at all. So I don't see any real reason to separate the hairpin from mature miRNAs prior to normalizing.
> >
> > Best,
> >
> > Jim
> >
> > > Also, I would like to use the arrayQualityMetrics package on exprsData:
> > >
> > > arrayQualityMetrics(expressionset = exprsData, outdir = "exprsData", force = FALSE, do.logtransform = FALSE)
> > >
> > > Thank you,
> > > Dana
> > >
> > > _______________________________________________
> > > Bioconductor mailing list
> > > Bioconductor at r-project.org
> > > https://stat.ethz.ch/mailman/listinfo/bioconductor
> > > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
> >
> > --
> > James W. MacDonald, M.S.
> > Biostatistician
> > University of Washington
> > Environmental and Occupational Health Sciences
> > 4225 Roosevelt Way NE, # 100
> > Seattle WA 98105-6099
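Jim's description of rma() (normalize the probe-level data, then summarize all the probes for a given miRNA into one value, which requires knowing which probes belong to which probeset) can be illustrated with a minimal base-R sketch. This is not oligo's implementation: it skips RMA's background-correction step, works on toy values assumed to be already on the log2 scale (rma() itself normalizes before log-transforming), breaks ties in the quantile step more crudely than the real code does, and uses a hypothetical `probeset` vector as a stand-in for the probe-to-probeset mapping that oligo takes from the pd.mirna.3.0 annotation package.

```r
# Quantile normalization: give every column (array) the same distribution,
# namely the mean of the sorted columns. Ties are broken by order here.
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  reference <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

# Median polish summarization: for each probeset, fit
#   value = overall + probe effect + array effect
# and report overall + array effect as that probeset's expression per array.
summarize_medpolish <- function(x, probeset) {
  sets <- split(seq_len(nrow(x)), probeset)
  t(vapply(sets, function(idx) {
    fit <- stats::medpolish(x[idx, , drop = FALSE], trace.iter = FALSE)
    fit$overall + fit$col
  }, numeric(ncol(x))))
}

# Toy data: 3 hypothetical probesets with 4 probes each, on 4 arrays.
set.seed(1)
n_arrays <- 4
probeset <- rep(c("miR-A", "miR-B", "miR-C"), times = c(4, 4, 4))  # hypothetical mapping
affinity <- rnorm(length(probeset), sd = 1)  # fixed per-probe binding differences
raw <- sapply(seq_len(n_arrays), function(j)
  8 + affinity + rnorm(length(probeset), sd = 0.2) + j * 0.5)  # per-array offset
colnames(raw) <- paste0("array", seq_len(n_arrays))

norm <- quantile_normalize(raw)               # probe level, all arrays share one distribution
expr <- summarize_medpolish(norm, probeset)   # one row per probeset, one column per array
```

Because the per-array offset (`j * 0.5`) is the same for every probe on an array, quantile normalization removes it, while each probe's fixed `affinity` is absorbed by the probe effect in median polish. That mirrors the distributional point above: probes binding better or worse than others do not affect the normalization, but the summarization step cannot run without the probe-to-probeset mapping.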

Tags: miRNA, Normalization, GO, Biobase, affy, convert, oligo • 1.8k views

Written 10.5 years ago by James W. MacDonald; last updated by Pekka Kohonen.

Pekka Kohonen (@pekka-kohonen-5862), Sweden

Hello Dana,

RMA is essentially a collection of normalization and summarization steps that can, at least in theory, be implemented separately and run on matrices instead of ExpressionSet objects. ExpressionSet objects can then be constructed later. So if you feel that (for whatever reason) you should normalize pre-miRNA and mature-miRNA probes separately, you can do that. But you would need to separate the probes for each in the .cel file (by inspecting the annotation package), then run quantile normalization on each subset, and then run the median polish method to summarize probes to probesets.

RMA (and quantile normalization) makes certain assumptions, for instance that transcripts do not, on average, change much. Since there are so few miRNAs, if one subset changes too much it might indeed make sense to normalize without it (although this is unlikely).

I think it would be best if you normalized the entire data set first with the default RMA protocol. Then you can extract the pre-miR and mature-miR probesets from the data and check whether the samples are unbalanced in either subset (i.e. the densities no longer overlap) using limma plotDensity(eSet, oneDevice=T, main=""). If they are unbalanced, you could re-run limma's normalizeQuantiles() on the mature-miRNA subset using the probeset intensities instead of the probe signals. This function accepts either matrices or ExpressionSet objects; ExpressionSet objects can easily be subset by feature, or you can use the matrix.

But normalization protocols are well developed in general. I don't think it makes a lot of sense to mess with them unless your sample is somehow peculiar.

Also, in general, you can never compare the signal from one probe or probeset to that of another probe or probeset, except in terms of relative change between samples. Every probe has different hybridization characteristics, so one intensity, even if exactly identical to another, does not signify the same absolute mRNA level as that other intensity.
Your arguments would need to be based on distributional assumptions.

BR,
Pekka
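Pekka's recommended order (normalize the whole data set first, then check each probeset subset and renormalize only a subset whose samples are unbalanced) can be sketched in base R as below. It is an illustrative sketch under several assumptions, not Pekka's code: the `type` labels are hypothetical (on a real miRNA 3.0 array they would come from the annotation), the median-spread check is a numeric stand-in for eyeballing density plots, the 0.5 (log2) threshold is arbitrary, and `quantile_normalize()` is a simplified version of what limma's `normalizeQuantiles()` does.

```r
# Simplified quantile normalization (ties broken by order; limma's
# normalizeQuantiles() handles ties more carefully).
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")
  reference <- rowMeans(apply(x, 2, sort))
  out <- apply(ranks, 2, function(r) reference[r])
  dimnames(out) <- dimnames(x)
  out
}

# Crude numeric stand-in for "do the per-sample densities still overlap?":
# the distance between the highest and lowest per-array median.
median_spread <- function(x) diff(range(apply(x, 2, median)))

# Re-run quantile normalization only within subsets whose arrays look
# unbalanced; balanced subsets are returned untouched.
renormalize_subsets <- function(expr, type, max_spread = 0.5) {
  for (lvl in unique(type)) {
    rows <- type == lvl
    sub <- expr[rows, , drop = FALSE]
    if (median_spread(sub) > max_spread) {
      expr[rows, ] <- quantile_normalize(sub)
    }
  }
  expr
}

# Toy probeset-level log2 values, assumed already normalized as a whole.
type <- rep(c("mature", "precursor"), times = c(50, 30))  # hypothetical labels
expr <- matrix(8 + sin(seq_len(80)), nrow = 80, ncol = 4,
               dimnames = list(NULL, paste0("array", 1:4)))
# Make array 4 unbalanced within the mature subset only.
expr[type == "mature", 4] <- expr[type == "mature", 4] + 2

fixed <- renormalize_subsets(expr, type)
median_spread(fixed[type == "mature", ])  # 0: the mature arrays now share one distribution
```

The renormalization happens at the probeset level and within one subset only, so the precursor probesets keep the values the whole-array normalization gave them. On real data you would still look at the density plots rather than trust a single threshold.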

Written 10.5 years ago by Pekka Kohonen.
