DESeq2 output explanation
@francescadefilippis-7043
Last seen 9.2 years ago
European Union

Hello!

I have a question about the results of a differential expression analysis in DESeq2.

I did:

DEgenes=results(DEcd, contrast=c("clD", "V", "VG"))

and I got the table with the log2 fold changes and the p-values. What I don't understand is which group the fold change refers to. If I have a negative log2 fold change, the gene is down-regulated, but in which of the two groups of samples? Where can I find this information?

 

Many thanks

Francesca

deseq deseq2 • 57k views
ADD COMMENT
@mikelove
Last seen 19 hours ago
United States

We describe the interpretation of results in a few places which you might find useful. Check the section "More information on results columns" in the software vignette:

vignette("DESeq2")

and also the "Building the results table" section of the workflow (this has a slower pace than the vignette and might be helpful to look over):

http://www.bioconductor.org/help/workflows/rnaseqGene/

 

ADD COMMENT

Hi Michael,

Thanks for your reply. I read both the vignette and the tutorial, but I still can't find the information I'm looking for.

If I extract the results for a specific contrast, let's say A vs B, how can I know whether the log2 fold changes refer to A or to B?

 

ADD REPLY

A positive log2 fold change for a comparison of A vs B means that gene expression in A is larger in comparison to B.
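To make the sign convention concrete, here is a minimal base R sketch. The group means are made-up numbers, and a plain ratio of means is only an illustration: DESeq2 estimates the fold change from a fitted model, not this way. In `results(dds, contrast = c(factor, numerator, denominator))` the second element is the numerator and the third the denominator, so the call in the question, `contrast=c("clD", "V", "VG")`, reports V vs VG.

```r
# Hypothetical mean normalized counts for one gene in two groups.
mean_A <- 200
mean_B <- 50

# "A vs B" puts A in the numerator and B in the denominator.
lfc_A_vs_B <- log2(mean_A / mean_B)   # positive: expression is higher in A
lfc_B_vs_A <- log2(mean_B / mean_A)   # same magnitude, opposite sign

print(c(A_vs_B = lfc_A_vs_B, B_vs_A = lfc_B_vs_A))

# Convert a log2 fold change back to a multiplicative factor.
fold_from_lfc <- function(lfc) 2^lfc
print(fold_from_lfc(lfc_A_vs_B))
```

So a negative value for "A vs B" means the gene is lower in A than in B.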

Here's the section of the vignette

"For a particular gene, a log2 fold change of −1 for condition treated vs untreated means that the treatment induces a change in observed expression level of 2^−1 = 0.5 compared to the untreated condition."

Here's the section of the workflow

"The column log2FoldChange is the effect size estimate. It tells us how much the gene's expression seems to have changed due to treatment with dexamethasone in comparison to untreated samples. This value is reported on a logarithmic scale to base 2: for example, a log2 fold change of 1.5 means that the gene's expression is increased by a multiplicative factor of 2^1.5≈2.82." 

ADD REPLY

Hi Michael,

I have a further question. I read that I can use the rlog transformation and use those values for heatmaps or PCA. Do I need to use raw counts as input for rlog, or do I need to normalize for library size first (dividing the reads for each gene by the total reads of the sample)?

thanks!

ADD REPLY

Always check the documentation first, by typing the function name with a question mark in front: ?rlog

The help file tells you:

"This function transforms the count data to the log2 scale in a way which minimizes differences between samples for rows with small counts, and which normalizes with respect to library size."

The vignette (accessible via vignette("DESeq2")) section on transformations says:

"Both transformations produce transformed data on the log2 scale which has been normalized with respect to library size."

So the rlog function takes care of normalization for library size; you do not provide the rlog with normalized counts or non-integer values.
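A minimal sketch of that workflow, assuming DESeq2 is installed. The data are simulated with `makeExampleDESeqDataSet()`, which ships with the package; with real data, `dds` would be built from your own raw count matrix.

```r
library(DESeq2)

# Simulated raw integer counts for 1000 genes and 6 samples;
# this stands in for a dataset built from your own counts.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Pass the object holding raw counts. rlog() estimates size factors
# itself when none are present, so no manual library-size scaling is needed.
rld <- rlog(dds, blind = TRUE)

# Transformed values on the log2 scale, one column per sample,
# usable as input for heatmaps.
rlog_mat <- assay(rld)
print(dim(rlog_mat))

# PCA of the transformed data, colored by the `condition` column.
plotPCA(rld, intgroup = "condition")
```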

 

ADD REPLY

In plain English: I have a comparison of HSC and LSC, where HSC is my control. I'm comparing HSC vs LSC, so if the fold change is positive, does it mean the gene is higher in HSC?

ADD REPLY

If HSC is the control, nearly all (perhaps all) R/Bioc packages and analysts would expect you to set HSC as the reference level and report LSC vs HSC (read: log(LSC / HSC)) as the LFC. This is also printed at the top of the results table when you print it to the console, if you follow the guidelines in the vignette on setting factor levels.

Also see the workflow (rnaseqGene package) which explains how to interpret the sign of the LFC.
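A small base R sketch of setting the reference level. The variable name `cell_type` is hypothetical; in practice this would be a column of your sample table, set before running DESeq().

```r
# Hypothetical sample annotation for four samples.
cell_type <- factor(c("LSC", "LSC", "HSC", "HSC"))
print(levels(cell_type))   # default levels are in alphabetical order

# Make the control the reference (first) level explicitly,
# rather than relying on alphabetical order.
cell_type <- relevel(cell_type, ref = "HSC")
print(levels(cell_type))

# With HSC as the reference, a design of ~ cell_type reports LSC vs HSC,
# i.e. log2(LSC / HSC). The equivalent explicit request would be:
#   results(dds, contrast = c("cell_type", "LSC", "HSC"))
```

With this setup, a positive LFC means the gene is higher in LSC than in the HSC control.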

ADD REPLY

"would expect you to set HSC as the reference level": yes, I had done this. "report LSC vs HSC (read: log (LSC / HSC) as the LFC": thank you for clarifying this in simple words. It was a bit confusing, although I must have run your library more than 100 times.

ADD REPLY

Hi Michael Love, I am confused about how log2FoldChange is calculated. The documentation says the log2 fold change for gene i for sample j is given by, log2

This is specific to a sample j (say, the treated one). How does it take the 'control' sample into account when calculating the fold change?

ADD REPLY

This is not specific to j. As in all generalized linear models the coefficients are calculated by computing the likelihood over all samples (j).
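For reference, the model as given in the theory section of the DESeq2 vignette can be written as below. The two-group reading in the last line is an illustrative special case, assuming a design with an intercept and a single treated-vs-control indicator.

```latex
K_{ij} \sim \mathrm{NB}(\mu_{ij}, \alpha_i), \qquad
\mu_{ij} = s_j \, q_{ij}, \qquad
\log_2 q_{ij} = x_{j.} \, \beta_i

% Maximum likelihood estimate: the sum runs over all samples j
\hat{\beta}_i = \arg\max_{\beta_i} \sum_j \log f_{\mathrm{NB}}(K_{ij};\, \mu_{ij}, \alpha_i)

% Two-group case with x_{j.} = (1, \mathbb{1}[j \text{ treated}]):
\log_2 q_{i,\mathrm{control}} = \beta_{i0}, \quad
\log_2 q_{i,\mathrm{treated}} = \beta_{i0} + \beta_{i1}
\;\Rightarrow\;
\beta_{i1} = \log_2 \frac{q_{i,\mathrm{treated}}}{q_{i,\mathrm{control}}}
```

The expression for log2 q_ij describes the expected level in one sample, but the reported log2 fold change is the coefficient, which is shared across samples and estimated from control and treated samples together.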

ADD REPLY

Hi Michael,

I have a question about the explanation "A positive log2 fold change for a comparison of A vs B means that gene expression in A is larger in comparison to B."

Does it have anything to do with the alphabetical order of the conditions? For example, I have "Response" and "Non-response" in my sample data. How do I know whether a positive fold change means an increase in the Response group or in the Non-response group?

 

Note: I am using DESeq2 with phyloseq, http://joey711.github.io/phyloseq-extensions/DESeq2.html

Thanks,

Reeba

 

 

ADD REPLY

See this section of the vignette:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#note-on-factor-levels

There are three ways to know:

You can specify the reference level as in the above link.

You can specify the contrast explicitly when you call results() by using the 'contrast' argument.

Finally, when you print the DESeqResults table, it has the information printed at the top, see here:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#differential-expression-analysis

...
log2 fold change (MLE): condition treated vs untreated
...

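The three options can be sketched as follows, assuming DESeq2 is installed. The data are simulated with `makeExampleDESeqDataSet()`, whose `condition` levels A and B stand in for the two groups. The first lines show what alphabetical ordering does to the labels from the question.

```r
library(DESeq2)

# Default factor levels are alphabetical, so without any releveling
# "Non-response" is the reference and the LFC is Response vs Non-response.
response <- factor(c("Response", "Non-response", "Response", "Non-response"))
print(levels(response))

# Simulated dataset; `condition` has levels A and B.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Option 1: set the reference level before running DESeq().
dds$condition <- relevel(dds$condition, ref = "A")
dds <- DESeq(dds)

# Option 2: name the numerator (B) and denominator (A) explicitly.
res <- results(dds, contrast = c("condition", "B", "A"))

# Option 3: read the comparison from the column description,
# which is also shown at the top when `res` is printed.
lfc_description <- mcols(res)$description[colnames(res) == "log2FoldChange"]
print(lfc_description)
```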
 

ADD REPLY

Thanks Michael. That worked.

ADD REPLY

Hi Michael,

Thanks for your clear explanation. But would it not make more sense to be able to see the normalized gene expression values from both conditions alongside the corresponding log fold change?

Maybe I missed it in the manual, but can you point me to how to get the gene expression values for each sample and each replicate?

 

Thanks in advance,

 

ADD REPLY

See the vignette section, “Access to all calculated values”
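As a rough sketch of what that looks like in practice (simulated data again, assuming DESeq2 is installed), per-sample normalized counts come from `counts(dds, normalized = TRUE)`. The per-group means at the end are an illustrative summary, not something DESeq2 reports in the results table.

```r
library(DESeq2)

# Simulated dataset; with real data `dds` would already exist.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Size factors are needed before normalized counts can be extracted
# (DESeq() also estimates them as its first step).
dds <- estimateSizeFactors(dds)

# One row per gene, one column per sample (replicate).
norm_counts <- counts(dds, normalized = TRUE)
print(head(norm_counts, 3))

# Mean normalized count per gene within each condition level.
group_means <- sapply(levels(dds$condition), function(lvl) {
  rowMeans(norm_counts[, dds$condition == lvl, drop = FALSE])
})
print(head(group_means, 3))
```

These values can then be placed next to the log2FoldChange column, for example with `cbind()`, since the rows are in the same gene order.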

ADD REPLY
