Ann Hess
I am experimenting with edgeR for high-throughput (next-generation) sequencing
data and proteomics spectral count data, and I have a few questions.
1. Is it correct to think of the pseudocounts (pseudo.alt, produced by
estimateCommonDisp) as normalized counts? According to the edgeR
vignette, "The pseudocounts are calculated using a quantile-to-quantile
method for the negative binomial so that the library sizes for the
pseudocounts are equal to the geometric mean of the original library
sizes." For the data that I am working with, the column sums for
pseudo.alt are very close to the common.lib.size, but the boxplots do
not "line up". Is this because the pseudocounts are "generated under
the alternative hypothesis"?
2. I noticed that within the estimatePs function, the minimum value
is set to 8.783496e-16. I think the choice of this minimum will
affect the estimated logConc and logFC values, but will it affect the
test results (p-values)?
3. The ranges for logConc and logFC seem different when comparing
the graph produced by smearPlot with the output produced by exactTest (for
a single comparison). Specifically, for each of the examples in the
edgeR vignette (and in my own data examples), the minimum logConc in
the smearPlot is ~ -16, while in the table from topTags the minimum is
~32. For logFC, the max shown in smearPlot is ~10, while the max in
topTags is ~40. I tried changing xlim and ylim in plotSmear, so this
doesn't seem to be simply a matter of the axis limits.
I am using edgeR_1.4.7 with R version 2.10.1.
Thanks!
Ann