DESeq2 gives weird results
@chipolino-15565

Hi,

I am doing a DE analysis. I have 4 conditions: control, drug1, drug2, drug3 (2 replicates per condition). I started with Salmon to quantify transcripts and imported the quantifications with tximport. After this I constructed a DESeqDataSet using DESeqDataSetFromTximport, following this tutorial. Then I tried to compare control with drug1. Here is my code:

library(tximportData)
library(tximport)
library(tximeta)
library(DESeq2)
library(GenomicFeatures)

files <- file.path("path_to_salmon_files", "quants", metadata$Quant, "quant.sf")
names(files) <- metadata$Samples
TxDb <- makeTxDbFromGFF(file = "gencode.v30.annotation.gff3.gz")

k <- keys(TxDb, keytype = "TXNAME")
tx2gene <- AnnotationDbi::select(TxDb, k, "GENEID", "TXNAME")

txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreAfterBar = TRUE)

ddsTxi <- DESeqDataSetFromTximport(txi, colData = metadata, design = ~ Treatment)

keep <- rowSums(counts(ddsTxi)) >= 10
ddsTxi <- ddsTxi[keep, ]

#"drug1_1", "drug1_2" are technical replicates
>xt <- ddsTxi[,c("drug1_1", "drug1_2", "control1", "control2")]
>yt <- factor(c("D", "D", "C", "C"))
>xt$Treatment <- yt
>xt <-DESeq(xt)
>res <-results(xt, name="Treatment_C_vs_D")

>resOrdered <- res[order(res$padj),]
>results <-as.data.frame(resOrdered)

When I check the results, I see the following for the gene ETV6:

| baseMean  | log2FoldChange | lfcSE     | stat       | padj         |
|-----------|----------------|-----------|------------|--------------|
| 661.72988 | -3.5438716     | 0.3112069 | -11.387510 | 2.982196e-26 |

This basically means that the gene is downregulated in control compared to drug1 (log2 fold change of about -3.5). But if I look at the raw counts, this is what I see:

| drug1_1 | drug1_2 | control1 | control2 |
|---------|---------|----------|----------|
| 280     | 357     | 396      | 401      |

So according to the raw counts, there is no real change in control compared to treatment, and almost all counts come from a single transcript in all conditions. All library sizes are very similar (~15M). What might be the reason for such a log fold change in the DESeq2 analysis? Could it be that, although the raw counts barely change, normalization increases the relative level of this gene in the drug treatment (compared to all other genes)?
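
A quick way to check this, as a sketch (it assumes the txi object from the code above; the ETV6 Ensembl ID is an assumption and is matched against the GENCODE row names, which carry a version suffix):

geneID <- grep("ENSG00000139083", rownames(txi$counts), value = TRUE)  # assumed Ensembl ID for ETV6
txi$counts[geneID, ]     # estimated counts per sample
txi$abundance[geneID, ]  # TPM per sample
txi$length[geneID, ]     # average transcript length per sample; DESeqDataSetFromTximport uses these as per-gene, per-sample offsets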

P.S. When I analyze the same data with limma-voom, I get a logFC of 0.23 for this gene (an increase in control compared to drug1).

@mikelove

Note that the base mean for the gene you have selected is 661.7. But the counts that you posted are all smaller.

What are the normalized counts instead of the raw counts?
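
For reference, a minimal sketch of how to pull these out of the DESeqDataSet (it reuses the xt object from the code above; the grep for the ETV6 Ensembl ID is an assumption):

geneID <- grep("ENSG00000139083", rownames(xt), value = TRUE)  # assumed Ensembl ID for ETV6
counts(xt)[geneID, ]                             # raw counts
counts(xt, normalized = TRUE)[geneID, ]          # normalized counts
normalizationFactors(xt)[geneID, ]               # per-sample offsets derived from average transcript length
rowMeans(counts(xt, normalized = TRUE))[geneID]  # this is the baseMean reported by results()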


Thank you for your response!

Here are the normalized counts:

| drug1_1  | drug1_2      | control1 | control2  |
|----------|--------------|-------   |------     |
| 1293.573 | 1144.279     | 90.263   | 118.803   | 

I also checked the quant.sf files from Salmon for this gene, and this is what I see:

Drug1

|Transcript| Length  | EffectiveLength | TPM       |NumReads   |
|----------|---------|-------          |------     |------     |
| ETV6-202 | 5989    | 5727.000        | 0.130581  | 275.035   | 
| ETV6-203 | 572     | 323.000         | 0         | 0         | 
| ETV6-204 | 733     | 484.000         | 0         | 0         | 
| ETV6-205 | 652     | 403.000         | 0         | 0         | 
| ETV6-201 | 160     | 6.178           | 2.185476  | 4.965     | 

Drug2

|Transcript| Length  | EffectiveLength | TPM       |NumReads   |
|----------|---------|-------          |------     |------     |
| ETV6-202 | 5989    | 5727.000        | 0.107006  | 351.076   | 
| ETV6-203 | 572     | 323.000         | 0         | 0         | 
| ETV6-204 | 733     | 484.000         | 0.004320  | 1.198     | 
| ETV6-205 | 652     | 403.000         | 0         | 0         | 
| ETV6-201 | 160     | 6.178           | 1.354557  | 4.794     | 

Control1

|Transcript| Length  | EffectiveLength | TPM       |NumReads   |
|----------|---------|-------          |------     |------     |
| ETV6-202 | 5989    | 5727.000        | 0.191470  | 396.000   | 
| ETV6-203 | 572     | 323.000         | 0         | 0         | 
| ETV6-204 | 733     | 484.000         | 0         | 0         | 
| ETV6-205 | 652     | 403.000         | 0         | 0         | 
| ETV6-201 | 160     | 6.178           | 0         | 0         | 

Control2

|Transcript| Length  | EffectiveLength | TPM       |NumReads   |
|----------|---------|-------          |------     |------     |
| ETV6-202 | 5989    | 5727.000        | 0.084566  | 393.219   | 
| ETV6-203 | 572     | 323.000         | 0         | 0         | 
| ETV6-204 | 733     | 484.000         | 0.006440  | 2.531     | 
| ETV6-205 | 652     | 403.000         | 0.016047  | 5.251     | 
| ETV6-201 | 160     | 6.178           | 0         | 0         | 

So it looks like the overall difference comes from the ETV6-201 transcript... but it has just a few reads in the drug treatment. How can I take this into account? Should I use limma-voom instead?


The issue is not limma-voom vs DESeq2, but using counts alone vs. using the transcript lengths.

I'm suspicious, like you are, about the large change in normalized counts from the expression switching to the short isoform, given there are only a few reads allocated to that transcript.

My quick fix for this dataset would be to use only the txi$counts and pass these to DESeqDataSetFromMatrix, thus tossing out the average transcript lengths used for normalization, because we don't trust them in the example above. A more involved approach to deal with these short isoforms would be to generate the transcript-level matrices with tximport (txOut = TRUE), remove all transcripts with a very low read count, then summarizeToGene and pass the result to DESeqDataSetFromTximport.
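
A minimal sketch of both options (untested; it assumes the files, metadata, and tx2gene objects from the question, and the transcript-filter threshold of 10 is only an example):

library(tximport)
library(DESeq2)

# Option 1: counts only, rounded to integers as DESeqDataSetFromMatrix requires,
# discarding the average-transcript-length offsets.
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreAfterBar = TRUE)
dds1 <- DESeqDataSetFromMatrix(countData = round(txi$counts),
                               colData = metadata,
                               design = ~ Treatment)

# Option 2: import at the transcript level, drop very low-count transcripts,
# then summarize to the gene level so the length offsets are kept.
txi.tx <- tximport(files, type = "salmon", txOut = TRUE)
keep.tx <- rowSums(txi.tx$counts) >= 10   # example threshold
txi.tx$abundance <- txi.tx$abundance[keep.tx, ]
txi.tx$counts    <- txi.tx$counts[keep.tx, ]
txi.tx$length    <- txi.tx$length[keep.tx, ]
# ignoreAfterBar in summarizeToGene assumes a recent tximport version
txi.gene <- summarizeToGene(txi.tx, tx2gene, ignoreAfterBar = TRUE)
dds2 <- DESeqDataSetFromTximport(txi.gene, colData = metadata, design = ~ Treatment)

Option 2 keeps the length correction for the remaining transcripts while removing the noisy short isoform that drove the offsets in the example above.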
