DESeq2 log2FoldChange of 48?
1
0
Entering edit mode
carleshf ▴ 10
@carleshf-7416
Last seen 6.5 years ago
Spain/Barcelona/ISGlobal

Dear all,

I used DESeq2 to perform a DE analysis on mRNA and, when extracting the contrasts I found that one of them has strange values on log2FoldChange.

Examples of this values are:

baseMean    log2FoldChange    lfcSE    stat    pvalue    padj    control.norm    control.norm.sd    case.norm    case.norm.sd    control.raw    control.raw.sd    case.raw    case.raw.sd    entrezgene    ensembl_gene_id    hgnc_symbol
0.451583505545774    48.8957821441089    3.06977055428295    15.9281553065552    4.0410373668378e-57    9.75021495870625e-53    0    0    1.5176109951821    1.52380894399315    0    0    1.33333333333333    1.211060141639    26863    ENSG00000206737    RNVU1-18
0.228079869975656    40.318409610448    3.40710380743093    11.8336311099513    2.61576228336841e-32    3.15565561865565e-28    0    0    0.331882123975104    0.541343418415764    0    0    0.333333333333333    0.516397779494322    149620    ENSG00000203878    CHIAP2
0.295106535209405    -43.035323935704    3.76355502327969    -11.4347534895881    2.80331740594844e-30    2.2546147456908e-26    0.149214083505201    0.394783357063187    0    0    0.285714285714286    0.755928946018454    0    0    400696    ENSG00000226025    
0.345263598177344    -31.756499075573    3.03938372161554    -10.4483349205717    1.49120227027424e-25    8.99493209429424e-22    0.822645190253885    0.878232249766697    0    0    0.714285714285714    0.755928946018454    0    0    2169    ENSG00000145384    FABP2
0.212013349024225    35.5888959495018    3.51136109756479    10.1353563363744    3.84967405244237e-24    1.85769871074659e-20    0    0    0.390411030164701    0.604849718887933    0    0    0.333333333333333    0.516397779494322    100422864    ENSG00000265981    MIR544B
0.257772697263073    -42.1836002941776    4.19204965429255    -10.0627625560167    8.07014201858527e-24    3.24527311040709e-20    0.308565055995893    0.550841610282403    0    0    0.285714285714286    0.487950036474267    0    0    100129027    ENSG00000205424

While other contrast is as expected:

baseMean    log2FoldChange    lfcSE    stat    pvalue    padj    control.norm    control.norm.sd    case.norm    case.norm.sd    control.raw    control.raw.sd    case.raw    case.raw.sd    entrezgene    ensembl_gene_id    hgnc_symbol
2483.71966540303    1.32485131527691    0.162630879037524    8.14637000745246    3.75011481830193e-16    6.7940830163176e-12    1413.81550878397    276.670790081588    3606.37780200197    1133.20438162628    1565.14285714286    594.397293783495    3475.5    1384.34255153845    9221    ENSG00000166197    NOLC1
591.896723229054    0.941463488371378    0.121775692478411    7.7311281850299    1.06597617839736e-14    9.65614521201244e-11    386.342696553717    28.4571984287309    745.386206238067    189.435252833034    418.428571428571    104.716851784319    730    315.487559184194    8882    ENSG00000109917    ZPR1
73.0950010099952    3.22858351861627    0.431964208272553    7.47419220570967    7.76795340911659e-14    4.69106706376551e-10    15.6194970329938    7.57155853894966    172.907613085039    155.016375713218    17.4285714285714    10.0806273425342    154.833333333333    133.553609710358    374    ENSG00000109321    AREG
341.11782393837    1.7595707621299    0.240333215700595    7.32137984756112    2.45434021151137e-13    1.11163204029879e-09    141.413691168911    24.3840654277537    534.422275484452    371.071571595906    154.428571428571    50.1459773822293    491.5    321.692244233522    8793    ENSG00000173530    TNFRSF10D
58.9101837146771    1.68002413777924    0.231785040329491    7.24819917364394    4.22349239481266e-13    1.53034023433642e-09    29.0758522070694    4.29691808349143    89.2536631356094    22.0780236896495    31.5714285714286    8.77225062070054    85.5    27.5154502052938    84541    ENSG00000163376    KBTBD8

The process I used to generate those tables follows:


desc$f1 <- relevel(desc$f1, "24.0")

dds.skin <- DESeqDataSetFromMatrix(
  countData = counts,
  colData   = desc,
  design    = ~ plate_batch + flowcell + id + f1
)
dds.skin <- DESeq(dds.skin, betaPrior = FALSE)
save(dds.skin, file="dds_t24.rda")

## EXTRACT CONTRAST FOR EACH COMPARISON
res  <- results(dds.skin, contrast=c("f1", "24.3", "24.0"), pAdjustMethod = "fdr")
res  <- res[order(res$padj), ]
export(res, desc, counts, norm, "24.3", "24.0", "f1", "genes_24_0_3")

res  <- results(dds.skin, contrast=c("f1", "24.6", "24.0"), pAdjustMethod = "fdr")
res  <- res[order(res$padj), ]
export(res, desc, counts, norm, "24.6", "24.0", "f1", "genes_24_0_6")

res  <- results(dds.skin, contrast=c("f1", "24.6", "24.3"), pAdjustMethod = "fdr")
res  <- res[order(res$padj), ]
export(res, desc, counts, norm, "24.6", "24.3", "f1", "genes_24_3_6")

Being the table genes_24_3_6 the one with strange logFC values and both genes_24_0_3 and genes_24_0_6 with expected ones. So, are those values at logFC correct or I did something wrong?

More info.: The variable f1 has the values 6.0, 24.0, 6.3, 24.3, 6.6 and 24.6.

R> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=es_ES.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=es_ES.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] biomaRt_2.24.1            DESeq2_1.8.2              RcppArmadillo_0.6.100.0.0 Rcpp_0.12.1
 [5] GenomicRanges_1.20.8      GenomeInfoDb_1.4.3        IRanges_2.2.7             S4Vectors_0.6.6
 [9] BiocGenerics_0.14.0       BiocInstaller_1.18.4

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3           XVector_0.8.0        bitops_1.0-6         futile.options_1.0.0
 [7] tools_3.2.0          rpart_4.1-10         digest_0.6.8         annotate_1.46.1      lattice_0.20-33      RSQLite_1.0.0
[13] gtable_0.1.2         DBI_0.3.1            proto_0.3-10         gridExtra_2.0.0      genefilter_1.50.0    cluster_2.0.3
[19] stringr_1.0.0        locfit_1.5-9.1       nnet_7.3-11          grid_3.2.0           Biobase_2.28.0       AnnotationDbi_1.30.1
[25] XML_3.98-1.3         survival_2.38-3      BiocParallel_1.2.21  foreign_0.8-66       latticeExtra_0.6-26  Formula_1.2-1
[31] geneplotter_1.46.0   ggplot2_1.0.1        reshape2_1.4.1       lambda.r_1.1.7       magrittr_1.5         splines_3.2.0
[37] scales_0.3.0         Hmisc_3.17-0         MASS_7.3-44          xtable_1.7-4         colorspace_1.2-6     stringi_0.5-5
[43] acepack_1.3-3.3      RCurl_1.95-4.7       munsell_0.4.2
deseq2 • 1.0k views
ADD COMMENT
0
Entering edit mode
@wolfgang-huber-3550
Last seen 3 months ago
EMBL European Molecular Biology Laborat…

Dear Chernandez

-  how do the (normalised) counts for RNVU1-18, CHIAP2 etc look like?

- how are the results if you allow dds.skin <- DESeq(dds.skin, betaPrior = TRUE)

What is probably going on is that the counts for these genes in one of the conditions are very low, leading to a large apparent fold change, and you have switched off the regularisation (with betaPrior = FALSE). If you want to learn more about the regularisation, have a look at the paper  and/or the manual page of nbinomWaldTest.

Kind regards

Wolfgang

ADD COMMENT

Login before adding your answer.

Traffic: 440 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6