baySeq plotPosteriors or plotMA question

0

Entering edit mode

Smith, Hilary A ▴ 40

@smith-hilary-a-4937

Last seen 9.6 years ago

Hello, I'm a graduate student and just starting to analyze my first set of RNA-Seq data (so learning R/Bioconductor). In using the baySeq program, I am slightly confused about how to color-code the significant values in the plotPosteriors (and plotMA) functions. The vignette includes the example below to color-code differentially expressed genes as red. > plotPosteriors(CDPost.NBML, group = 2, samplesA = 1:5, samplesB = 6:10, col = c(rep("red", 100), rep("black", 900))) However, I believe the use of 100 for the red (and 900 for the black / not differentially expressed) is specific to their simulated data set. Is this correct, and if so, how can I make the function more general to plot the differentially expressed genes for any data set? Thank you, Hilary

• 1.0k views

ADD COMMENT • link 12.5 years ago • updated 12.4 years ago Smith, Hilary A ▴ 40

0

Entering edit mode

Vincent J. Carey, Jr. 6.7k

@vincent-j-carey-jr-4

Last seen 6 weeks ago

United States

On Wed, Nov 2, 2011 at 1:17 PM, Smith, Hilary A <hilary.smith@gatech.edu>wrote: > Hello, > I'm a graduate student and just starting to analyze my first set of > RNA-Seq data (so learning R/Bioconductor). In using the baySeq program, I > am slightly confused about how to color-code the significant values in the > plotPosteriors (and plotMA) functions. > > The vignette includes the example below to color-code differentially > expressed genes as red. > > plotPosteriors(CDPost.NBML, group = 2, samplesA = 1:5, samplesB = 6:10, > col = c(rep("red", 100), rep("black", 900))) > > However, I believe the use of 100 for the red (and 900 for the black / not > differentially expressed) is specific to their simulated data set. Is this > correct, and if so, how can I make the function more general to plot the > differentially expressed genes for any data set? > > Your belief is correct. The data in the vignette were set up so that the first 100 rows were differentially abundant between two groups of libraries. The objective of making the function "more general" as you pose is somewhat complicated. You could add a threshold parameter and have plot render the entities differently depending on the relationship of the posterior likelihood to the threshold. That is not really making it more general, but it is adding features and complexity. You might want to consider how to embellish the plot post hoc with information concerning decisionmaking about differential abundance. To get a sense for issues involved in choosing or conveying the effect of choosing a threshold, just plot plot(CDPost.NBML@posteriors[,2]) and consider the sensitivity and specificity of any threshold given what we know about the simulation. > Thank you, > Hilary > > _______________________________________________ > Bioconductor mailing list > Bioconductor@r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: > http://news.gmane.org/gmane.science.biology.informatics.conductor > [[alternative HTML version deleted]]

ADD COMMENT • link 12.5 years ago Vincent J. Carey, Jr. 6.7k

0

Entering edit mode

Smith, Hilary A ▴ 40

@smith-hilary-a-4937

Last seen 9.6 years ago

Hello, I'm working on a couple analyses (currently pairwise) for 3'-DGE. Using baySeq, edgeR, and DESeq are yielding different answers; specifically DESeq and baySeq find different subsets of the genes found by edgeR. In trying to isolate the discrepancy, I've been trying to make items like normalization procedures similar to see if that improves congruency, or if the differences merely stem from how the pairwise tests are run and use of bayesian vs. exact-type statistics. I saw that baySeq's function "getLibsizes" can use the edgeR implementation of TMM, but when I try to do this I get an error message about a quantile argument not being used. This error appears whether or not I specify a quantile, and I'm further confused because the edgeR program itself does not require specifying quantiles for its TMM-based calcNormFactors. EdgeR seems to run fine so I think the problem is in the implementation of baySeq; perhaps I'm misunderstanding/coding something? Any help is greatly appreciated; commands excerpted from an R session are below. > library(baySeq) Attaching package: 'baySeq' The following object(s) are masked from 'package:base': rbind > library(snow) > cl = makeCluster(4, "SOCK") > library(edgeR) > simData = read.delim(file="2011.11.03counts.txt", header=TRUE) > rownames(simData)=simData$CompID > simData=simData[,-1] > simData=as.matrix(simData) > head(simData) X1E_F X1E_R X2E_F X2E_R X3E_F X3E_R X1P_F X1P_R X2P_F X2P_R X3P_F comp0 1065 1159 1207 1572 1477 1817 1841 605 1915 1113 1645 comp1 544 534 341 675 333 739 690 236 502 451 571 comp10 30423 37677 28044 54466 23961 58271 53852 34712 59300 40312 44575 comp100 1060 1065 999 1332 918 1620 1697 658 1117 861 1336 comp1000 130 157 229 266 141 247 263 135 182 188 168 comp10000 35 14 15 37 10 47 28 17 22 21 12 X3P_R comp0 1732 comp1 799 comp10 51243 comp100 1370 comp1000 244 comp10000 64 > replicates = c("F", "R", "F", "R", "F", "R", "F", "R", "F", "R", "F", "R") > groups = list(NDE = c(1,1,1,1,1,1,1,1,1,1,1,1), DE = c(1,2,1,2,1,2,1,2,1,2,1,2)) > cD = new("countData", data = simData, replicates = replicates, groups=groups) > cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR") Calculating library sizes from column totals. Error in calcNormFactors(d, quantile = quantile, ...) : unused argument(s) (quantile = quantile) > cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="TMM") Error in match.arg(estimationType) : 'arg' should be one of "quantile", "total", "edgeR" > cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR", quantile=0.75) Calculating library sizes from column totals. Error in calcNormFactors(d, quantile = quantile, ...) : unused argument(s) (quantile = quantile) > cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, quantile=0.75, estimationType="edgeR") Calculating library sizes from column totals. Error in calcNormFactors(d, quantile = quantile, ...) : unused argument(s) (quantile = quantile) > cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType=c("edgeR", quantile=0.75)) Error in match.arg(estimationType) : 'arg' must be of length 1 > calcNormFactors(cD) Error in calcNormFactors(cD) : calcNormFactors() only operates on 'matrix' and 'DGEList' objects > calcNormFactors(simData) X1E_F X1E_R X2E_F X2E_R X3E_F X3E_R X1P_F X1P_R 1.0353157 0.9529524 0.9868063 1.1068479 1.0054938 1.0218195 0.9600905 0.8287707 X2P_F X2P_R X3P_F X3P_R 1.0550414 0.8955669 1.0869486 1.1052472 > cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR") Calculating library sizes from column totals. Error in calcNormFactors(d, quantile = quantile, ...) : unused argument(s) (quantile = quantile) > cD at libsizes = getLibsizes(data=simData, replicates=replicates, subset=NULL, estimationType="edgeR") Calculating library sizes from column totals. Error in calcNormFactors(d, quantile = quantile, ...) : unused argument(s) (quantile = quantile)

ADD COMMENT • link 12.4 years ago Smith, Hilary A ▴ 40

0

Entering edit mode

Dear Hilary not a reply to your question on normalisation function parameters, but to the underlying one about method comparison. You could try these things: - What happens if you simply compare biological replicates ? In such cases you want that the fraction of genes gettting an (unadjusted) p-value of, say, 5%, is less or equal to 5%. - Try a pairs() plot of the logarithms of the p-value values from each method, to see whether it is simply a cutoff effect. - Plot the detection call as a function of average count or of (within group) standard deviation, to see whether any biases are correlated with these. Best wishes Wolfgang On 11/17/11 4:07 PM, Smith, Hilary A wrote: > Hello, > I'm working on a couple analyses (currently pairwise) for 3'-DGE. Using baySeq, edgeR, and DESeq are yielding different answers; specifically DESeq and baySeq find different subsets of the genes found by edgeR. In trying to isolate the discrepancy, I've been trying to make items like normalization procedures similar to see if that improves congruency, or if the differences merely stem from how the pairwise tests are run and use of bayesian vs. exact-type statistics. I saw that baySeq's function "getLibsizes" can use the edgeR implementation of TMM, but when I try to do this I get an error message about a quantile argument not being used. This error appears whether or not I specify a quantile, and I'm further confused because the edgeR program itself does not require specifying quantiles for its TMM-based calcNormFactors. EdgeR seems to run fine so I think the problem is in the implementation of baySeq; perhaps I'm misunderstanding/coding something? Any help is greatly appre c! > iated; commands excerpted from an R session are below. > > >> library(baySeq) > > Attaching package: 'baySeq' > > The following object(s) are masked from 'package:base': > > rbind > >> library(snow) >> cl = makeCluster(4, "SOCK") >> library(edgeR) >> simData = read.delim(file="2011.11.03counts.txt", header=TRUE) >> rownames(simData)=simData$CompID >> simData=simData[,-1] >> simData=as.matrix(simData) >> head(simData) > X1E_F X1E_R X2E_F X2E_R X3E_F X3E_R X1P_F X1P_R X2P_F X2P_R X3P_F > comp0 1065 1159 1207 1572 1477 1817 1841 605 1915 1113 1645 > comp1 544 534 341 675 333 739 690 236 502 451 571 > comp10 30423 37677 28044 54466 23961 58271 53852 34712 59300 40312 44575 > comp100 1060 1065 999 1332 918 1620 1697 658 1117 861 1336 > comp1000 130 157 229 266 141 247 263 135 182 188 168 > comp10000 35 14 15 37 10 47 28 17 22 21 12 > X3P_R > comp0 1732 > comp1 799 > comp10 51243 > comp100 1370 > comp1000 244 > comp10000 64 >> replicates = c("F", "R", "F", "R", "F", "R", "F", "R", "F", "R", "F", "R") >> groups = list(NDE = c(1,1,1,1,1,1,1,1,1,1,1,1), DE = c(1,2,1,2,1,2,1,2,1,2,1,2)) >> cD = new("countData", data = simData, replicates = replicates, groups=groups) >> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR") > Calculating library sizes from column totals. > Error in calcNormFactors(d, quantile = quantile, ...) : > unused argument(s) (quantile = quantile) >> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="TMM") > Error in match.arg(estimationType) : > 'arg' should be one of "quantile", "total", "edgeR" >> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR", quantile=0.75) > Calculating library sizes from column totals. > Error in calcNormFactors(d, quantile = quantile, ...) : > unused argument(s) (quantile = quantile) >> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, quantile=0.75, estimationType="edgeR") > Calculating library sizes from column totals. > Error in calcNormFactors(d, quantile = quantile, ...) : > unused argument(s) (quantile = quantile) >> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType=c("edgeR", quantile=0.75)) > Error in match.arg(estimationType) : 'arg' must be of length 1 >> calcNormFactors(cD) > Error in calcNormFactors(cD) : > calcNormFactors() only operates on 'matrix' and 'DGEList' objects >> calcNormFactors(simData) > X1E_F X1E_R X2E_F X2E_R X3E_F X3E_R X1P_F X1P_R > 1.0353157 0.9529524 0.9868063 1.1068479 1.0054938 1.0218195 0.9600905 0.8287707 > X2P_F X2P_R X3P_F X3P_R > 1.0550414 0.8955669 1.0869486 1.1052472 >> cD at libsizes = getLibsizes(cD, data=simData, replicates=replicates, subset=NULL, estimationType="edgeR") > Calculating library sizes from column totals. > Error in calcNormFactors(d, quantile = quantile, ...) : > unused argument(s) (quantile = quantile) >> cD at libsizes = getLibsizes(data=simData, replicates=replicates, subset=NULL, estimationType="edgeR") > Calculating library sizes from column totals. > Error in calcNormFactors(d, quantile = quantile, ...) : > unused argument(s) (quantile = quantile) > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Wolfgang Huber EMBL http://www.embl.de/research/units/genome_biology/huber

ADD REPLY • link 12.4 years ago Wolfgang Huber ★ 13k

Login before adding your answer.