DESeq2 - Cannot replicate log2FC calculations
1
0
Entering edit mode
Sara • 0
@64e16527
Last seen 2.9 years ago

Dear Michael,

Sorry in advance for the long question. My name is Sara, and I am a PhD student doing a metatranscriptomic study in some salt ponds in Mallorca and studying the changes in gene expression at various times of the day. As I am relatively new to bioinformatics, a colleague told me about the DEApp application (https://yanli.shinyapps.io/DEApp/), where they implement your DESeq2 package. However, I like to understand what is being done, so I have tried to replicate the results using DESeq2 in R in a standalone way. Unsuccessfully.

The baseMean, pvalue or lfcSE values are quite similar, but the log2FoldChange is really different. I contacted the App developer and he told me that the code I was using seemed correct, and that maybe the problem was with DESeq2 and the new version of R (>4), where the 'results()' function does not work maybe well. Again I tried to implement it in a previous version, without success. I also tried the same code without removing the low expression tags from the counts.

I assume this is a bug in my code and was wondering if you could help me.

counts <- read.delim(paste("Raw_reads_D1_vs_All.txt",sep="/"), header=T, row.names=1, skipNul = T)
head(counts)
             Day1A Day1B Day1C Avg1 Avg2 Avg3 Avg4 Avg5 Avg6 Avg7 Avg8 Avg9 Avg10 Avg11 Avg12 Avg13 Avg14
samplewt_001     0     0     0    0    0    0    2    0    2    0    1    0     1     3     0     1     1
samplewt_002    24    19    29   24   19   29   19   65   86  105   86   97   125   130    99    86    74
samplewt_003     2     1     0    2    1    0    7    3    0    3    4    2     4     3     1     4     2
samplewt_004    21    21    12   21   21   12   52   37   41   25   30   19    15    28    31    32    33
samplewt_005    29    19    16   29   19   16   15   35   16   50   53   59    34    41    25    31    32
samplewt_006    38    29    23   38   29   23   38   41   29   64   54   73    40    35    38    39    39
             Avg15 Avg16 Avg17 Avg18
samplewt_001     2     4     2     1
samplewt_002    45    43    27    43
samplewt_003     2     2     2     4
samplewt_004    45    27    18    39
samplewt_005    28    25    30    22
samplewt_006    38    31    35    24

coldata <- read.delim(paste("Metatable_D1_vs_All.txt",sep=""), header=T, row.names = 1)
coldata
      Treatment
Day1A      Day1
Day1B      Day1
Day1C      Day1
Avg1        Avg
Avg2        Avg
Avg3        Avg
Avg4        Avg
Avg5        Avg
Avg6        Avg
Avg7        Avg
Avg8        Avg
Avg9        Avg
Avg10       Avg
Avg11       Avg
Avg12       Avg
Avg13       Avg
Avg14       Avg
Avg15       Avg
Avg16       Avg
Avg17       Avg
Avg18       Avg

coldata$Treatment <- factor(coldata$Treatment)
dds <- DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = ~Treatment)
dge.count <- calcNormFactors(dds)
cpm.count <- cpm(dge.count$counts)
keep <- rowSums(cpm.count > 1) >=2
dge.count.rmlow <- dge.count[keep,]
dge.count.rmlow$samples$lib.size <- colSums(dge.count.rmlow$counts)
dge.count.rmlow <- calcNormFactors(dge.count.rmlow, lib.size = TRUE, method = "TMM")
counts2 <- dge.count.rmlow$counts
coldata <- dge.count.rmlow$samples$Treatment
colData <- data.frame(coldata)
dds <- DESeqDataSetFromMatrix(counts2, colData = colData, design = formula(~coldata))
dds <- DESeq(dds, test = "Wald")
resultsNames(dds)
[1] "Intercept"           "coldata_Day1_vs_Avg"

res <- results(dds, contrast = c("coldata", "Day1","Avg"))
log2 fold change (MLE): coldata Day1 vs Avg 
Wald test p-value: coldata Day1 vs Avg 
DataFrame with 21692 rows and 6 columns
               baseMean log2FoldChange     lfcSE      stat     pvalue      padj
              <numeric>      <numeric> <numeric> <numeric>  <numeric> <numeric>
samplewt_001   0.889321      -2.385359  1.442652 -1.653454 0.09823858        NA
samplewt_002  57.939492      -1.240640  0.469598 -2.641921 0.00824373  0.205107
samplewt_003   2.262310      -1.169396  0.963996 -1.213071 0.22510264        NA
samplewt_004  26.855727      -0.517509  0.339327 -1.525105 0.12723290  0.411924
samplewt_005  29.231810      -0.357366  0.368943 -0.968621 0.33273415  0.630221
...                 ...            ...       ...       ...        ...       ...
samplewt_9993  38.88551      0.1753683  0.266028  0.659209 0.50976148  0.768292
samplewt_9994  35.78226     -0.0505332  0.266456 -0.189649 0.84958409  0.943335
samplewt_9995  45.59429     -0.7032288  0.261269 -2.691593 0.00711117  0.205107
samplewt_9996   4.82784     -0.0637269  0.557457 -0.114317 0.90898626        NA
samplewt_9999  20.25589      0.1051234  0.421189  0.249587 0.80290656  0.921288


sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252    LC_MONETARY=Spanish_Spain.1252
[4] LC_NUMERIC=C                   LC_TIME=Spanish_Spain.1252    

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] edgeR_3.34.0                limma_3.48.0                DESeq2_1.32.0              
 [4] SummarizedExperiment_1.22.0 Biobase_2.52.0              MatrixGenerics_1.4.0       
 [7] matrixStats_0.59.0          GenomicRanges_1.44.0        GenomeInfoDb_1.28.0        
[10] IRanges_2.26.0              S4Vectors_0.30.0            BiocGenerics_0.38.0        

loaded via a namespace (and not attached):
 [1] KEGGREST_1.32.0        genefilter_1.74.0      locfit_1.5-9.4         splines_4.1.0         
 [5] lattice_0.20-44        colorspace_2.0-1       vctrs_0.3.8            utf8_1.2.1            
 [9] blob_1.2.1             XML_3.99-0.6           survival_3.2-11        rlang_0.4.11          
[13] pillar_1.6.1           glue_1.4.2             DBI_1.1.1              BiocParallel_1.26.0   
[17] bit64_4.0.5            RColorBrewer_1.1-2     GenomeInfoDbData_1.2.6 lifecycle_1.0.0       
[21] zlibbioc_1.38.0        Biostrings_2.60.0      munsell_0.5.0          gtable_0.3.0          
[25] memoise_2.0.0          geneplotter_1.70.0     fastmap_1.1.0          fansi_0.5.0           
[29] AnnotationDbi_1.54.0   Rcpp_1.0.6             xtable_1.8-4           scales_1.1.1          
[33] cachem_1.0.5           DelayedArray_0.18.0    annotate_1.70.0        XVector_0.32.0        
[37] bit_4.0.4              ggplot2_3.3.3          png_0.1-7              grid_4.1.0            
[41] tools_4.1.0            bitops_1.0-7           magrittr_2.0.1         RCurl_1.98-1.3        
[45] RSQLite_2.2.7          tibble_3.1.2           pkgconfig_2.0.3        crayon_1.4.1          
[49] ellipsis_0.3.2         Matrix_1.3-3           httr_1.4.2             R6_2.5.0              
[53] compiler_4.1.0

And here you have the same log2FoldChange for that genes through the DEApp software that I cannot replicate:

    baseMean    log2FoldChange  lfcSE   stat    pvalue  padj
samplewt_001    0.889314286 -0.602643368    0.373997325 -1.61135743 0.10710184  NA
samplewt_002    57.93917767 -0.931255767    0.355965285 -2.6161421  0.008892954 0.201065875
samplewt_003    2.262309455 -0.534065195    0.420066974 -1.271381062    0.203593117 NA
samplewt_004    26.85569236 -0.444635895    0.290359802 -1.531327314    0.125688518 0.405507326
samplewt_005    29.23173355 -0.299055164    0.308682478 -0.968811597    0.332639199 0.626920005
........
samplewt_9993   38.88541672 0.159459313 0.241886366 0.659232333 0.509746587 0.765425095
samplewt_9994   35.78216478 -0.045925545    0.241822378 -0.189914372    0.84937623  0.943275421
samplewt_9995   45.59412138 -0.641157931    0.236590841 -2.70998627 0.006728599 0.201065875
samplewt_9996   4.827812856 -0.044329744    0.387192212 -0.114490277    0.908849153 NA
samplewt_9997   0.466748965 -0.067189502    0.33105697  -0.2029545  0.839170596 NA
samplewt_9998   0.227260594 0.089864329 0.263303759 0.341295277 0.732881303 NA
samplewt_9999   20.25584553 0.083946598 0.337284386 0.248889666 0.803446129 0.921117579

Thank you in advance for your help and for the great package!

DESeq2 • 559 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 8 hours ago
United States

DESeq2 hasn't changed in its methods since many versions ago (e.g. version 1.16, we are now on 1.32 with an increment of +.2 every 6 months).

"I contacted the App developer and he told me that the code I was using seemed correct, and that maybe the problem was with DESeq2 and the new version of R (>4), where the 'results()' function does not work maybe well"

I disagree with the developer. The problem with GUI interfaces that wrap tools is that you don't always know what's going on inside. Hopefully everything is correct, but you can't easily debug.

The LFC difference could be due to different size factors. The size factors would be different if you gave DESeq2 and this app different count tables. Above you do filtering before providing counts to DESeq2. Maybe something different happens inside the app. But again, that's an issue for the app developer to resolve. DESeq2 results() function did not change.

ADD COMMENT
0
Entering edit mode

I totally agree with you about GUI interfaces and I trust above all DESeq2 results. However, I cannot really understand the difference, since I provide exactly the same count tables for both. I also tried omitting the filtering before providing counts to DESeq2, with the same result. I will try to contact again the developer of the App to ask if there is any change with respect to the original code in DESeq2, although I cannot see anything at first sight.

Thank you for your answer!

ADD REPLY

Login before adding your answer.

Traffic: 662 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6