Question

Model design in RNA-seq - DESeq2

acebolladaso.iacs • 0

I have 8 samples from 4 persons, each measured at two time points, 0h and 20h.

  names_chip   person time sample
1 IonCode_0109 A1     0    Donor 1- Day 0
2 IonCode_0110 A1     20   Donor 1- Day 20
3 IonCode_0111 A2     0    Donor 2- Day 0
4 IonCode_0112 A2     20   Donor 2- Day 20
5 IonCode_0113 A3     0    Donor 3- Day 0
6 IonCode_0114 A3     20   Donor 3- Day 20
7 IonCode_0115 A4     0    Donor 4- Day 0
8 IonCode_0116 A4     20   Donor 4- Day 20

The researchers would like to see which genes are differentially expressed (DE) between the two time points, and they expect many changes. The genomics service sent me the raw counts for 20812 genes. I am following the DESeq2 pipeline.

dds <- DESeqDataSetFromMatrix(countData = counts,
                              colData = annotation,
                              design = ~ time + person)

I made PCA plots and clustered the normalized counts. Samples from the same person lie close to each other, while different persons are clearly separated. This is what I expected.

For now I do not pre-filter genes by number of counts. I run

dds.parametric.wald <- DESeq(dds)
# contrast = c(factor, numerator, denominator): LFC here is log2(time 0 / time 20)
contrast_oe <- c("time", "0", "20")
res.parametric.wald <- results(dds.parametric.wald, contrast = contrast_oe, independentFiltering = TRUE)
summary(res.parametric.wald)

and get the following result:

out of 17633 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 6, 0.034%
LFC < 0 (down) : 14, 0.079%
outliers [1] : 0, 0%
low counts [2] : 2706, 15%
(mean count < 1)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Only 20 DE genes! If I instead look at a contrast between persons, e.g.

res.parametric.wald.a1.a2 <- results(dds.parametric.wald, contrast = c("person", "A1", "A2"), independentFiltering = TRUE)
summary(res.parametric.wald.a1.a2)

I get

out of 17633 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up) : 4194, 24%
LFC < 0 (down) : 3317, 19%
outliers [1] : 0, 0%
low counts [2] : 4064, 23%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

Is it possible that, when I contrast the time points, the variability between persons acts as background noise and reduces the number of DE genes? To increase the number of DE genes between time points, would it be correct to select only the genes that are not DE between persons and compare the time points using those genes alone? Is that methodologically and statistically sound? Is there any other suggestion or design that would increase the number of DE genes between time points?

deseq2 rna-seq variability model-design • 1.0k views

ADD COMMENT • link 5.1 years ago acebolladaso.iacs • 0

Hey,

It would be really helpful for us if your text had proper formatting. Please use code blocks for code, e.g.

dds <- DESeqDataSetFromMatrix(countData = counts, colData = annotation, design = ~ time+person)

or add a table for your sample annotation, and maybe a plot from the PCA.

As for the design, I think you have time and person switched. Have you tried

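dds <- DESeqDataSetFromMatrix(countData = counts, colData = annotation, design = ~ person + time)
dds <- DESeq(dds)
# levels of time in the sample table are "0" and "20"; this gives log2(time 20 / time 0)
res <- results(dds, contrast = c("time", "20", "0"))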

ADD REPLY • link 5.1 years ago mat.lesche ▴ 90

If the poster specifies the contrast explicitly, I don't think the order of person + time matters.
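
To illustrate the point, here is a minimal base R sketch, not DESeq2 itself. It rebuilds a sample table like the one in the question (the object name `annotation` and the factor coding are assumptions) and compares the design matrices of the two formulas. Both contain the same coefficients, only in a different column order, so the fitted time effect should be the same. As far as I know, the order only matters when `results()` is called without `contrast` or `name`, because it then reports the last variable in the design.

```r
# Sample table mirroring the one in the question (names are assumptions).
annotation <- data.frame(
  person = factor(rep(c("A1", "A2", "A3", "A4"), each = 2)),
  time   = factor(rep(c("0", "20"), times = 4))
)

# Design matrices for the two orderings of the formula.
mm_time_first   <- model.matrix(~ time + person, data = annotation)
mm_person_first <- model.matrix(~ person + time, data = annotation)

# Same set of coefficients, only the column order differs.
same_columns <- setequal(colnames(mm_time_first), colnames(mm_person_first))
same_values  <- all(mm_time_first[, colnames(mm_person_first)] == mm_person_first)

print(colnames(mm_time_first))
print(colnames(mm_person_first))
print(c(same_columns = same_columns, same_values = same_values))
```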

ADD REPLY • link 5.1 years ago swbarnes2 ★ 1.3k

score 0 · Answer 1 · 2019-03-25

Your first approach is the correct way to go. With only four samples per time point, you only have power to detect large changes over time. You have appropriately controlled for donor by including person in the design, so I don't have any other recommendation for how to analyze this data.
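
As a rough illustration of why controlling for donor matters, here is a small base R simulation for a single gene. It uses a t-test on made-up numbers, not the negative binomial model DESeq2 fits, and every value in it (baseline spread, effect size, noise) is an assumption chosen for the example. When the donor is ignored, the large person-to-person spread hides the small time effect. When the pairing is used, that spread cancels out.

```r
set.seed(1)
n_person <- 4

# Hypothetical values: large donor-to-donor spread, small time effect.
baseline <- rnorm(n_person, mean = 10, sd = 2)  # per-person expression level
effect   <- 0.5                                 # true change at the second time point

t0  <- baseline + rnorm(n_person, sd = 0.1)
t20 <- baseline + effect + rnorm(n_person, sd = 0.1)

# Ignoring person: between-person variability inflates the error term.
p_unpaired <- t.test(t20, t0, paired = FALSE)$p.value

# Accounting for person (analogous in spirit to ~ person + time).
p_paired <- t.test(t20, t0, paired = TRUE)$p.value

print(c(unpaired = p_unpaired, paired = p_paired))
```

The design with `person` in the formula plays the role of the paired test here, which is why filtering out genes that differ between persons is not required to remove that variability.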

If you want to find more genes and are willing to accept a higher FDR, you can change alpha to 0.2, which allows about one in five reported genes to be a false positive. Otherwise, you should add more samples to gain sensitivity.
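
To show what changing the threshold does, here is a minimal base R sketch on simulated p-values. The p-values are invented for illustration and are not from this data set. It applies the Benjamini-Hochberg adjustment with `p.adjust()` and counts genes at the two cutoffs. In DESeq2 itself, the corresponding step would be passing `alpha = 0.2` to `results()` and to `summary()`.

```r
set.seed(42)

# Hypothetical p-values: 950 null genes (uniform) plus 50 genes with a real effect.
pvals <- c(runif(950), rbeta(50, shape1 = 0.5, shape2 = 20))

# Benjamini-Hochberg adjusted p-values.
padj <- p.adjust(pvals, method = "BH")

# Number of genes reported at each FDR threshold.
n_alpha_01 <- sum(padj < 0.1)
n_alpha_02 <- sum(padj < 0.2)

print(c(alpha_0.1 = n_alpha_01, alpha_0.2 = n_alpha_02))
```

The looser cutoff can never report fewer genes than the stricter one, but the extra genes come with a higher expected share of false positives.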