Hello,
I have a dataset with very high coverage on each sample for which I want to run edgeR on to get the differentially expressed transcripts between 2 types of cells with 3 replicates each.
The analysis runs smoothly and I have many DE transcripts, but when I plot the BCV I get a plot (which is slightly weird with the trended dispersion line being bumpy - see attached) and the warning:
Warning message:
In sqrt(y$trended.dispersion) : NaNs produced
When I look into it, it turns out I have many negative values for the trended dispersion (for 160 transcripts) but the actual counts for these transcripts don't look unusual and I can't tell what is wrong.
My code looks like this:
counts_data <- DGEList(counts=round(all_samples[,2:7]),group=rep(c("PC","GS"), each=3), genes=my_genes) counts_cpm <- cpm(counts_data) filter_cpm <- rowSums(counts_cpm> 1)>= min(table(group)) counts_data <- counts_data[filter_cpm,] counts_data$samples$lib.size <- colSums(counts_data$counts) counts_cpm <- cpm(counts_data) counts_data_norm <- calcNormFactors(counts_data) set_d <- estimateCommonDisp(counts_data_norm, verbose=T) set_d <- estimateTrendedDisp(set_d) set_d <- estimateTagwiseDisp(set_d)
Some of the transcripts with the negative trended dispersion look like this:
set_d[set_d$trended.dispersion<0]
An object of class "DGEList"
$counts
PC1_EstimatedNumReads PC2_EstimatedNumReads PC3_EstimatedNumReads GS1_EstimatedNumReads GS2_EstimatedNumReads GS3_EstimatedNumReads
ENST00000196551 178336 170990 162887 100756 77829 81842
ENST00000202773 164480 154960 155601 102093 84593 89253
ENST00000216181 247587 287651 278037 285772 258325 262454
ENST00000216281 829577 717654 834776 504985 388358 412351
ENST00000216962 66190 73460 60030 165816 152397 175372
155 more rows ...
$samples
group lib.size norm.factors
PC1_EstimatedNumReads PC 2.17e+08 0.982
PC2_EstimatedNumReads PC 2.21e+08 0.996
PC3_EstimatedNumReads PC 2.18e+08 0.986
GS1_EstimatedNumReads GS 2.18e+08 1.031
GS2_EstimatedNumReads GS 1.89e+08 1.013
GS3_EstimatedNumReads GS 1.97e+08 0.993
$genes
Transcript Length Geneid Transcript_name
359 ENST00000196551 740 RPS5 RPS5-201
401 ENST00000202773 947 RPL6 RPL6-001
585 ENST00000216181 7501 MYH9 MYH9-001
607 ENST00000216281 3379 HSP90AA1 HSP90AA1-002
679 ENST00000216962 4134 PYGB PYGB-001
155 more rows ...
$common.dispersion
[1] 0.0511
$pseudo.counts
PC1_EstimatedNumReads PC2_EstimatedNumReads PC3_EstimatedNumReads GS1_EstimatedNumReads GS2_EstimatedNumReads GS3_EstimatedNumReads
ENST00000196551 175182 162868 159302 94074 85077 87631
ENST00000202773 161571 147600 152177 95323 92471 95566
ENST00000216181 243208 273988 271918 266821 282381 281018
ENST00000216281 814904 683566 816405 471496 424523 441517
ENST00000216962 65019 69971 58709 154820 166589 187776
155 more rows ...
$pseudo.lib.size
[1] 2.1e+08
$AveLogCPM
[1] 9.25 9.21 10.33 11.50 9.12
155 more elements ...
$trended.dispersion
[1] -0.00180 -0.00159 -0.00782 -0.01435 -0.00112
155 more elements ...
$prior.n
[1] 2.5
$tagwise.dispersion
[1] 0.00601 0.00559 0.00604 0.00720 0.00785
155 more elements ...
How could it be that I am getting negative values for the trended dispersion (especially when the respective tagwise dispersions are small but positive). Is this behaviour alarming or is it OK to be using the results from the analysis? Its probably worth mentioning that using a cpm>5 cutoff for the filtering reduces the problem ( I only get 5 negative trended dispersion values) but does not get rid of it. Increasing it more also doesn't help.
Any help would be very much appreciated!
Best wishes,
Emmanouela
Session Info:
R version 3.1.1 (2014-07-10)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_GB.ISO-8859-1 LC_NUMERIC=C LC_TIME=en_GB.ISO-8859-1 LC_COLLATE=en_GB.ISO-8859-1
[5] LC_MONETARY=en_GB.ISO-8859-1 LC_MESSAGES=en_GB.ISO-8859-1 LC_PAPER=en_GB.ISO-8859-1 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.ISO-8859-1 LC_IDENTIFICATION=C
attached base packages:
[1] splines stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.8.5 limma_3.22.1
loaded via a namespace (and not attached):
[1] tools_3.1.1
You're a version out of date. See if the problem shows up in the latest version of
edgeR
(3.10.2) and Bioconductor (3.2). You'll also have to update R to version 3.2.0 or higher.