How to perform DESeq2 based on paired sample test?
1
1
Entering edit mode
zhxiaokang ▴ 20
@zhxiaokang-13311
Last seen 20 months ago
Norway

I'm using DESeq2 to do differentially expression analysis. I have 8 liver slices from control group, and 6 liver slices from treatment group. So the purpose is to find out the differentially expressed genes between these two groups. But those slices are not independent: slices from control group the slices from treatment group may come from the same fish.

I'm using DESeq2 package in R, so my colData looks like:

samples fishGroup expGroup
1  Sample12         3  control
2  Sample18         4  control
3  Sample25         5  control
4  Sample31         6  control
5  Sample44         8  control
6   Sample6         2  control
7  Sample11         2    treat
8  Sample17         3    treat
9  Sample24         4    treat
10 Sample30         5    treat
11 Sample37         6    treat
12 Sample50         8    treat

If I perform DESeq2 based on independent sample test, then only "expGroup" will be used:

dds_independent <- DESeqDataSetFromMatrix(countData, colData, design = ~expGroup)

But that will miss the samples' pair information. So I guess that the paired sample test is better.

In the section of "Model matrix not full rank" from vignette('DESeq2'), it seems to discuss about this problem, to use a design like this: design = ~ expGroup + fishGroup. I'm no expert on statistics, but it seems that the two factors are treated equally, but in my understanding, expGroup should be prior, because fishGroup only gives the pair information, since I want to find the differentially expressed genes in different experiment groups (control and treat).

I ran both ( design = ~expGroupand design = ~expGroup + fishGroup) again, but But the result is quite abnormal: I got 10 differentially expressed (DE) genes (padj < 0.05) from the first design, but 2412 DE genes from the second design. I've looked into the DE genes from the second design, and they're not differentially expressed, since you'll find that the slice sample pairs (from the same fish) give you different conclusion about whether the gene is up regulated or down regulated. Take one gene which is regarded as DE gene from the second design:

fishGroup   expGroup  count
    2  control  5.1
    2  treat  25.6
    3  control  222.9
    3  treat  143.4
    4  control  25.6
    4  treat  164.0
    5  control  0
    5  treat  5.1
    6  control  151.1
    6  treat  61.5
    8  control 886.3
    8  treat  141

So anyone has any idea of how to design the "design" to tell DESeq2 to do a paired sample test?

rnaseq deseq2 • 6.6k views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 8 hours ago
United States

Can you share your code? It looks like you could be testing for differences across fish, not treatment. If you don't specify any additional arguments when you call results(), it uses the *last* term in the design. This is fairly standard behavior across Bioc statistical packages also.

ADD COMMENT
0
Entering edit mode
library(DESeq2)

# the group of samples
groupControl <- c('1', '3', '4', '5', '6', '7', '8', '2')
groupTreat <- c('2', '3', '4', '5', '6', '8')

groupControl <- groupControl[groupControl %in% groupTreat]

# load the count file
control <- read.table('./control_BaP_EE2_1000.txt', header = T)
rownames(control) <- control[, 1]
control <- control[, -1]
control <- control[-c(1,6)]  # the fish of these two samples are not present in treat 
treat <- read.table('./BaP_EE2_1000.txt', header = T)
rownames(treat) <- treat[, 1]
treat <- treat[, -1]

# the count table, each row is a gene, and each row is one sample
countData<- cbind(control, treat)

# get the sample id
samples <- colnames(countData)

# get the fish group
fishGroup <- c(groupControl, groupTreat)

# generate the experiment group
expGroup <- c(rep('control', ncol(control)), rep('treat', ncol(treat)))

# get the colData mixing the above 3 data
colData <- data.frame(samples, fishGroup, expGroup)

# creat the DESeqDataSet
dds_indp <- DESeqDataSetFromMatrix(countData, colData, design = ~ expGroup)
dds_paird <- DESeqDataSetFromMatrix(countData, colData, design = ~ expGroup + fishGroup)

# perform DEA
dea_indp <- DESeq(dds_indp)
dea_paired <- DESeq(dds_paird)

# result analysis
res_indp <- results(dea_indp)
res_paired <- results(dea_paired)

# p-adjusted value
padj_indp <- res_indp$padj
padj_paired <- res_paired$padj

# gene list
geneList <- rownames(control)

# significantly expressed genes
sig_indp <- geneList[which(padj_indp < 0.05)]
sig_paired <- geneList[which(padj_paired < 0.05)]
ADD REPLY
0
Entering edit mode

To be sure that I made it clear: My purpose is to find out the differentially expressed genes between control group and treatment group, since I want to see the effect of the chemicals which were injected into the samples in the treatment group. But since there is always a pair of slices (samples) in these two groups coming from the same fish, so I think it's better to take this information into consideration, which is call "paired sample test" in statistics.

ADD REPLY
0
Entering edit mode

Check out the vignette for more details on this topic but the gist of what I'm saying is put treatment at the end of the design formula. You have 'fish' at the end now.

ADD REPLY
0
Entering edit mode

hi,

Something to add: another way you know what results table you are looking at in DESeq2, is that if you just print the table, e.g.

> res

And press [return], it will say at the top of the table what coefficient you tested, (or what test was performed, etc).

ADD REPLY
0
Entering edit mode

Nice, thank you! I didn't notice this point before, it works perfectly now!

One more question: as you can see from the code above, I have two samples in control group who don't have their pair in treatment group, so for now, I deleted these two samples from control group. Do you have a better suggestion for this missing match problem?

ADD REPLY
0
Entering edit mode

For the fixed effect model, you can't get use of them, because we are essentially looking at the fold changes within pairs and seeing if that effect is large and consistent across pairs. There is a trade-off, if it were many samples that were unpaired, I would use limma with the duplicateCorrelation() function to inform the model that only some of the samples are paired. In my opinion, and from working with similar datasets, where the pair effect is often strong, the strength of this kind of analysis hinges though on the differences within the paired samples, and so if it's not many unpaired, I would just remove them for the analysis of treatment effect.

ADD REPLY
0
Entering edit mode

Got it, and thank you very much for all these quick and helpful replies. 

ADD REPLY

Login before adding your answer.

Traffic: 897 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6