DESeq2 : DE genes witth high log2FC and few or null counts in some samples
1
0
Entering edit mode
cagenet34 ▴ 20
@cagenet34-10910
Last seen 2.6 years ago
Toulouse, France, INRA

Hi all,

Hi, I am using DESeq2 to get differential expression between two genotypes A (sensitive) and B (resistant) and two conditions (virus-infected ; versus control). I’m using the formula given by Mike in DESeq2 vignette "Example 2: two conditions, two genotypes, with an interaction term” which fit adequately to my design. However, I have the same problem as others i.e. that my tops DE genes are genes having few reads in one condition and 0 in the other one for one genotype (see below genes G93001 or G13001 ).

I tried several  options (see below ) but was not able to discard these genes from my top DE. Any help will be greatly appreciate.

Thanks in advance,

Carine

Options tried without success

#betaPrior= FALSE with or without prefiltrering

dds <- DESeq(dds, betaPrior = FALSE)

#minimal prefiltrering and independent filtering = TRUE

dds1 <-dds[rowSums(counts(dds)>1) >=6]

finB57<-results(dds, contrast=c("condition","VI","CTRL"), independentFiltering = TRUE, pAdjustMethod = "BH")

#filtering on log2FC

 

Below is my code and my tops DE genes

######### DESeq2 analysis ######
antifin<-read.csv(file="Antivicfin.csv", sep = ";",header=TRUE,row.names=1 )
dim(antifin)
[1] 46585    12
library(DESeq2)
coldatafile<-read.table(file="colDataFin.txt",sep="\t",header=TRUE, row.names=1)
colnames(antifin) <- NULL
dds<-DESeqDataSetFromMatrix(countData = antifin,
+                             colData = coldatafile,
+                             design = ~ genotype + condition + genotype:condition)
dds$condition<- relevel(dds$condition, ref ="CTRL")
dds$genotype<- relevel(dds$genotype, ref ="B")
dds$genotype
 [1] B B B B B B A A A A A A
Levels: B A
dds1 <-dds[rowSums(counts(dds)>1) >=6] ##  minimal pre-filtering
dim(dds1)
[1] 39148    12
###########################################################
#Differential analysis
##########################################################
dds3 <- DESeq(dds1)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
resultsNames(dds3)
[1] "Intercept"    genotype_A_vs_B"  "condition_VI_vs_CTRL"  "genotypeA.conditionVI"
####################################################
# EFFET DU VIRUS SUR B
#####################################################
# the condition effect for genotype B (the main effect)
finB<-results(dds3, contrast=c("condition","VI","CTRL"), independentFiltering = TRUE, pAdjustMethod = "BH")
finB<-finB[order(finB$padj),] #tri par pvalue
finB_tri<-subset(finB,padj<0.01) # genes avec padj <0.01
dim(finB_tri)
[1] 1434    6
FC_finB<-subset(finB_tri, abs(log2FoldChange)>2)
comptb<-as.data.frame(counts(dds3))
colnames(comptb)<-paste("brut.",rownames(coldatafile),sep="")
comptn<-as.data.frame(counts(dds3,norm=TRUE))
colnames(comptn)<-paste("norm.",rownames(coldatafile),sep="")
compt<-merge(comptb,comptn,by.x=0,by.y=0)
rownames(compt)<-compt$Row.names
compt<-compt[,-1]#
.zz <-merge(compt,as.data.frame(FC_finB),by.x=0,by.y=0)
.zz
           Row.names brut.BC1 brut.BC2 brut.BC3 brut.BVa brut.BVb brut.BVc brut.ACa brut.A22Cb brut.A22Cc brut.A22Va
1  G93001         173           0          0          0          0          0        579        692        466       2279
2  G03001         246          91        317        116         52         16        376        363        118       3947
3  G45001          61          49         71         23         30         22         40         19         41        104
4  G49001          14           6          2          1          1          0         41         35         29        201
5  G53001          10           1          1          0          0          0         45         63         76        146
6  G51001         278           0          2          0          0          0       1000       1021       1144       2845
7  G57001          52           4         12          3          6          3        110        187        206        280
8  G10001          50          29         89         31         29         14         32         18          9        121
9  G13001         175           0          8          0          0          0        882        800        879       2838
10 G63001          53           1          0          0          0          0        221        195        195        485
11 G65001          74          59         65         45         14         10         23         17         28        198
12 G07001         155           0          0          0          0          0        549        588        475       1950
13 G66001         244           3          0          1          3          3       1203       1005        836       2466
14 G48001          18           2          1          0          3          1         58         62         58        241
15 G54001         209          56        138         69         55         34        359        627        145       2391
16 G37001        2563        1706       3620       1461       1460        502        632        385        413       2398
17 G53001        1656        1162       1973        959        765        300        689        640        342       3152
18 G17001          67          29         66         24         12          8         15         17         29         95
   brut.AVb brut.AVc norm.BC1 norm.BC2  norm.BC3   norm.BVa  norm.BVb  norm.BVc norm.A22Ca norm.A22Cb norm.A22Cc
1        2119       1570   179.79746    0.000000    0.000000    0.0000000   0.0000000   0.0000000  999.50900 1344.73069  802.46711
2        3329       2428   255.66575  150.974214  363.383752   84.0976294  26.9956007  13.1194766  649.07666  705.40064  203.19983
3         141        155    63.39679   81.293807   81.388790   16.6745300  15.5743850  18.0392803   69.05071   36.92180   70.60333
4         147        162    14.55008    9.954344    2.292642    0.7249796   0.5191462   0.0000000   70.77698   68.01384   49.93894
5         200        136    10.39292    1.659057    1.146321    0.0000000   0.0000000   0.0000000   77.68205  122.42490  130.87446
6        3375       2650   288.92309    0.000000    2.292642    0.0000000   0.0000000   0.0000000 1726.26770 1984.06074 1970.00509
7         138        251    54.04317    6.636229   13.755852    2.1749387   3.1148770   2.4599019  189.88945  363.38821  354.73868
8          91         75    51.96458   48.112662  102.022568   22.4743665  15.0552389  11.4795420   55.24057   34.97854   15.49829
9        4168       2412   181.87604    0.000000    9.170568    0.0000000   0.0000000   0.0000000 1522.56811 1554.60195 1513.66650
10        683        425    55.08246    1.659057    0.000000    0.0000000   0.0000000   0.0000000  381.50516  378.93423  335.79632
11        151        192    76.90758   97.884380   74.510864   32.6240804   7.2680463   8.1996729   39.70416   33.03529   48.21691
12       2631       1652   161.09021    0.000000    0.000000    0.0000000   0.0000000   0.0000000  947.72097 1142.63244  817.96540
13       3321       3283   253.58717    4.977172    0.000000    0.7249796   1.5574385   2.4599019 2076.70004 1952.96870 1439.61911
14        224        200    18.70725    3.318115    1.146321    0.0000000   1.5574385   0.8199673  100.12353  120.48165   99.87788
15       3040       1146   217.21196   92.907208  158.192296   50.0235899  28.5530392  27.8788878  619.73010 1218.41928  249.69470
16       1478       1935  2663.70458 2830.351744 4149.681960 1059.1951431 757.9534044 411.6235780 1091.00119  748.15219  711.19939
17       2493       2043  1721.06702 1927.824576 2261.691300  695.2554019 397.1468181 245.9901861 1189.39845 1243.68156  588.93509
18        103         49    69.63254   48.112662   75.657185   17.3995095   6.2297540   6.5597383   25.89402   33.03529   49.93894
   norm.AVa norm.AVb norm.AVc   baseMean log2FoldChange     lfcSE      stat       pvalue         padj
1  1338.91643 1195.31581 1255.19868  592.99460     -22.014388 2.4510531 -8.981604 2.668480e-19 1.301818e-15
2  2318.86931 1877.86990 1941.16077  715.81779      -2.633620 0.6685746 -3.939157 8.176858e-05 2.732247e-03
3    61.10018   79.53730  123.92089   59.79182      -2.170184 0.3402018 -6.379108 1.781223e-10 8.989342e-08
4   118.08785   82.92186  129.51732   45.60816      -4.348125 1.1083976 -3.922893 8.749194e-05 2.874272e-03
5    85.77525  112.81886  108.73059   54.29203      -5.279819 1.2830666 -4.115000 3.871797e-05 1.559613e-03
6  1671.44241 1903.81824 2118.64746  972.12145      -9.610427 2.1835291 -4.401327 1.075906e-05 5.942047e-04
7   164.50048   77.84501  200.67189  119.43489      -3.273525 0.8071359 -4.055730 4.997806e-05 1.878019e-03
8    71.08771   51.33258   59.96172   44.93403      -2.049240 0.4168799 -4.915661 8.848356e-07 8.633340e-05
9  1667.32990 2351.14502 1928.36893  894.06059      -9.003871 1.9513453 -4.614187 3.946379e-06 2.790204e-04
10  284.93834  385.27640  339.78308  180.24792      -7.259619 1.9058171 -3.809190 1.394229e-04 4.101557e-03
11  116.32534   85.17824  153.50200   64.44638      -2.384568 0.4386113 -5.436632 5.429705e-08 1.025374e-05
12 1145.62836 1484.13209 1320.75683  584.99386     -21.869489 2.4257179 -9.015677 1.956579e-19 1.145421e-15
13 1448.77925 1873.35715 2624.72438  973.28794      -5.779478 1.5647116 -3.693638 2.210688e-04 5.681217e-03
14  141.58792  126.35712  159.89792   64.48959      -3.253415 0.8474613 -3.839014 1.235296e-04 3.718087e-03
15 1404.71662 1714.84665  916.21509  558.19912      -2.141628 0.5767180 -3.713475 2.044328e-04 5.386095e-03
16 1408.82914  833.73137 1547.01239 1517.70301      -2.113006 0.3112430 -6.788928 1.129698e-11 1.066690e-08
17 1851.80544 1406.28707 1633.35727 1263.53668      -2.142501 0.3183755 -6.729478 1.702734e-11 1.311598e-08
18   55.81266   58.10171   39.17499   40.46242      -2.709772 0.4019223 -6.742029 1.561898e-11 1.269953e-08
 
 
> sessionInfo()
R version 3.3.1 RC (2016-06-17 r70798)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
[5] LC_TIME=French_France.1252    
attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     
other attached packages:
[1] ade4_1.7-4                 DESeq2_1.12.3              SummarizedExperiment_1.2.3 Biobase_2.32.0            
[5] GenomicRanges_1.24.2       GenomeInfoDb_1.8.2         IRanges_2.6.1              S4Vectors_0.10.2          
[9] BiocGenerics_0.18.0       
loaded via a namespace (and not attached):
 [1] Rcpp_0.12.5          RColorBrewer_1.1-2   plyr_1.8.4           XVector_0.12.0       tools_3.3.1          zlibbioc_1.18.0     
 [7] rpart_4.1-10         RSQLite_1.0.0        annotate_1.50.0      gtable_0.2.0         lattice_0.20-33      Matrix_1.2-6        
[13] DBI_0.4-1            gridExtra_2.2.1      genefilter_1.54.2    cluster_2.0.4        locfit_1.5-9.1       grid_3.3.1          
[19] nnet_7.3-12          data.table_1.9.6     AnnotationDbi_1.34.4 XML_3.98-1.4         survival_2.39-5      BiocParallel_1.6.2  
[25] foreign_0.8-66       latticeExtra_0.6-28  Formula_1.2-1        geneplotter_1.50.0   ggplot2_2.1.0        Hmisc_3.17-4        
[31] scales_0.4.0         splines_3.3.1        rsconnect_0.4.3      colorspace_1.2-6     xtable_1.8-2         acepack_1.3-3.3     
[37] munsell_0.4.3        chron_2.3-47        
 

 

deseq2 • 1.9k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 13 hours ago
United States

hi Carine,

I don't really have any suggestions for you here. When you have 0s in one group and counts in the other which are larger than the technical and biological within-group variability that is present in the dataset, the probability is low that the data is generated from the null hypothesis.

ADD COMMENT
0
Entering edit mode

ok thanks for your answer. So basically, from your experience, do you will take them into account for pathway analysis  or should I have to discard them ? 

ADD REPLY
0
Entering edit mode

By the way, have you done any QC on these samples? For example, what does a PCA plot look like? (see vignette for examples)

ADD REPLY
0
Entering edit mode

Yes I did. Here the results for me it's fine ....:  I use ade4 package

dds2<-as.data.frame((counts(dds1,norm=FALSE)))
lmat<-log(1+dds2)

library(ade4)

lmatc<-bicenter.wt(lmat)
lmatc.pca<-dudi.pca(lmatc,center=F,scale=F,nf=3,scannf=F)
s.class(lmatc.pca$co,fac=dds1$genotype:dds1$condition, col=c(1,2,3,4, addaxes=FALSE, cgrid=5))

####The eigenvalues inform us on the inertia kept by each axis.
(kip<-100*lmatc.pca$eig/sum(lmatc.pca$eig))

##  [1] 52.864812  9.090166  7.021512  6.097858  4.647150  4.463586  4.038370
##  [8]  3.596396  3.501872  2.399417  2.278860

####The first axis extract 52.86% of the total inertia. The second axis account for 9.09  % of the variance.
 

 

 

ADD REPLY
0
Entering edit mode

 

I also done PCA with DESEQ2 and found nearly 92% for PC1 which is very strange for me...my first axis correspond to genotype and the second one to condition.

In fact, I've done all the exploraty analysis and think it was ok but maybe it's not ...

 

 

 

ADD REPLY

Login before adding your answer.

Traffic: 481 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6