Question: DESeq2 influence of 2 samples
0
gravatar for gregory.l.stone
2.9 years ago by
gregory.l.stone10 wrote:

I am having trouble figuring out why 2 sets of paired samples are affecting my data so severely. I have ~100 samples, each paired (so ~200 sets of count files). I aligned my fastq files with STAR, counted genes with featureCounts, and am using DESeq2 for differential expression analysis. My design is ~sex + sex:nested + sex:condition, where nested is the pairing factor.

At first pass, I get 716 significant genes (padj < 0.05), yet many genes have count plots like this: http://i.imgur.com/2Yrd410.png

The seemingly outlier sample is present in many of the genes, so I removed the pre and post counts for that sample.

I then got 18 sig genes, most of which look like this: http://i.imgur.com/wGE53DH.png

The seemingly outlier sample is present in all of the 18 genes, so I removed the pre and post counts for that sample. Now I get no significant genes. It seems strange that 2 samples would have such a large effect on an otherwise large data set. I would appreciate any ideas and guidance as to how best proceed! Thank you!

ADD COMMENTlink modified 2.9 years ago by Michael Love26k • written 2.9 years ago by gregory.l.stone10

> sessionInfo()
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.8 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets
[8] methods   base

other attached packages:
[1] IHW_1.2.0                  DESeq2_1.14.1
[3] SummarizedExperiment_1.4.0 Biobase_2.34.0
[5] GenomicRanges_1.24.3       GenomeInfoDb_1.10.2
[7] IRanges_2.8.1              S4Vectors_0.12.1
[9] BiocGenerics_0.20.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8          RColorBrewer_1.1-2   plyr_1.8.4
 [4] XVector_0.14.0       bitops_1.0-6         tools_3.3.1
 [7] zlibbioc_1.20.0      digest_0.6.12        rpart_4.1-10
[10] memoise_1.0.0        RSQLite_1.1-2        annotate_1.52.1
[13] tibble_1.2           gtable_0.2.0         lattice_0.20-34
[16] Matrix_1.2-7.1       lpsymphony_1.2.0     DBI_0.5-1
[19] gridExtra_2.2.1      genefilter_1.56.0    cluster_2.0.5
[22] locfit_1.5-9.1       grid_3.3.1           nnet_7.3-12
[25] data.table_1.10.2    AnnotationDbi_1.36.2 fdrtool_1.2.15
[28] XML_3.98-1.5         survival_2.39-5      BiocParallel_1.8.1
[31] foreign_0.8-67       latticeExtra_0.6-28  Formula_1.2-1
[34] geneplotter_1.52.0   ggplot2_2.2.1        Hmisc_3.17-4
[37] scales_0.4.1         splines_3.3.1        assertthat_0.1
[40] colorspace_1.3-1     xtable_1.8-2         acepack_1.4.1
[43] RCurl_1.95-4.8       lazyeval_0.2.0       munsell_0.4.3
[46] slam_0.1-40

ADD REPLYlink written 2.9 years ago by gregory.l.stone10
Answer: DESeq2 influence of 2 samples
0
gravatar for Michael Love
2.9 years ago by
Michael Love26k
United States
Michael Love26k wrote:

Can you post your full code, I want to make sure it's not something going wrong with the construction of the model matrix.

ADD COMMENTlink written 2.9 years ago by Michael Love26k

Yes of course. Here is my code:

library(DESeq2)

library(IHW)

samples <- read.csv(sampleTable.csv)

sampleTable <- data.frame(sample = samples$PID, fileName = samples$featureCounts_PID, sex = samples$Sex, condition = samples$Condition, nested = samples$Nested)

sampleTable$sex <- relevel(sampleTable$sex, ref = "Male")
sampleTable$condition <- relevel(sampleTable$condition, ref = "pre")

#even though counts are from featureCounts, formatted results to be same as HTSeq-count results to use following import method

dds <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design = ~ 1)

model = model.matrix(~ sex + sex:nested + sex:condition, colData(dds))

apply(model, 2, function(x) all(x == 0))

results_source <- DESeq(dds, full = model, betaPrior = FALSE)

sex_diff = results(results_source, contrast = list("sexFemale.conditionpost", "sexMale.conditionpost"))

 

Thank you for the help!

 

 

ADD REPLYlink written 2.9 years ago by gregory.l.stone10

What is the output of the apply() call?

Please post the code where you remove samples and rerun DESeq2 also.

ADD REPLYlink written 2.9 years ago by Michael Love26k

The code is the same for when I remove samples. I just modify what csv file I read in.

The output of the apply() call is as follows:

(Intercept)               sexFemale          sexMale:nested        sexFemale:nested   sexMale:conditionpost
                  FALSE                   FALSE                   FALSE                   FALSE                   FALSE
sexFemale:conditionpost
                  FALSE

 

Thanks!

ADD REPLYlink modified 2.9 years ago • written 2.9 years ago by gregory.l.stone10

And you modify the count matrix as well?

Just want to make sure you're running a comparable analysis.

ADD REPLYlink written 2.9 years ago by Michael Love26k

Yes, I do modify the count matrix as well

ADD REPLYlink written 2.9 years ago by gregory.l.stone10

I finally can see what's going wrong, since you posted this output.

It's a common R problem: you are coding the patient as a numeric vector instead of as a factor (categorical). So patient 1 + patient 2 = patient 3, etc. 

After coding the patient as a factor, you will get ~100 of coefficients in the model.matrix for the patients, which is the point: you want to control for all the patient differences, and to estimate and test the sex-specific treatment effect.

ADD REPLYlink written 2.9 years ago by Michael Love26k

Wow I'm dumb. Of course it would be something like that! Thank you so much for the help, and for being so generous with your time!

ADD REPLYlink written 2.9 years ago by gregory.l.stone10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 204 users visited in the last hour