Question: comparing strand-specific counts to unstranded counts
gravatar for lirongrossmann
11 months ago by
lirongrossmann10 wrote:

Hi All,

I am trying to compare the level of gene expression between strand specific samples and unstranded samples of rna seq for a set of selected genes. 

I ran into a post in biostar :

I wanted to confirm with "Deseq2 experts" here - is this the best way to compare strand vs unstranded counts or is there another recommended way to do it using Deseq2?

Thanks a lot

ADD COMMENTlink modified 11 months ago • written 11 months ago by lirongrossmann10

Thanks! Does Deseq2 treat the batch effect as a linear covariate or does it take into account non linear variations that the batch may include? ( I.e other than an offset)

ADD REPLYlink written 11 months ago by lirongrossmann10

When you include batch as a new variable in the design, you are making the beta vector for gene i longer, by adding coefficients to the model. See the third formula here (where beta_i is a vector of coefficients):

So, for a concrete example, you would get beta_i = [beta_{i,intercept}, beta_{i,batch2}, beta_{i,condition2}] if you had a design of ~batch + condition, and batch and condition each had two levels. Note that this allows the batch effect to differ across genes: for each gene, the batch effect will be fit by the GLM. Of course this requires that batch and condition are not confounded, and if they are, the software prints and error and will not run.

ADD REPLYlink written 11 months ago by Michael Love19k


So basically it accounts for the batch in a linear fashion if it uses GLM.


ADD REPLYlink written 11 months ago by lirongrossmann10

I could summarize the above to say: you get a fold change for each gene which explains differences in counts associated with batch. If you have n batches you get n-1 fold changes to explain differences among batches.

ADD REPLYlink written 11 months ago by Michael Love19k

Thank you.

I think that what confuses me is how to interpret the overall result of which gene can be a good candidate to discriminate between the two groups I have from a biologic standpoint. 

To be more specific - I have two groups, each has 10 samples. One group responded to therapy the other did not. The one that did not respond had 4 samples which are stranded, and 6 which are unstranded. The group that responded had 10 samples which are stranded. 

I used the following code:

dds <-DESeqDataSetFromMatrix(countData = ep,colData = cp,design =~Strand+Response)

dds <- estimateSizeFactors(dds)

dds <- DESeq(dds)

res<- results(dds)
resbatch <- results(dds, contrast = c("Strand","StrandSpecific","Unstranded"))

Now for a specific gene (say X), I got that the fold change is 3.6 (adjp = 0.0003) for the response variable and for the same gene fold change of 6.5 (adp=0.0021) for the batch variable.

Does that mean that I can trust gene X to be a good candidate for discriminating between the two groups? (He is discriminating the response but a lot of difference between the expression of the genes across the groups is explained by the batch).

What would be the most rigorous way to choose the genes that discriminate the two groups given the batch effect between them?

Thanks a lot!


ADD REPLYlink written 11 months ago by lirongrossmann10

Using a statistical model to fit coefficients and associated standard errors is the rigorous approach. If you want to focus on large effects you can also use lfcThreshold in results().

ADD REPLYlink written 11 months ago by Michael Love19k

thank you! 

My goal is to choose the genes that best discriminate my two groups in terms of their response to the treatment and use them for downstream analysis. 

Since I have a batch effect confounding the biological effect (response to treatment) I am not sure which genes to choose based on the fold changes. So for example, if I didn’t have a batch effect, I would choose the genes with the smallest p value and largest fold change. Now that I added the batch to the model, what criteria would you recommend to choose the genes for downstream analysis to best discriminate the two on the basis of their response to treatment?



ADD REPLYlink written 11 months ago by lirongrossmann10
gravatar for Michael Love
11 months ago by
Michael Love19k
United States
Michael Love19k wrote:

Take a look at the DESeq2 vignette, there is a similar example:

ADD COMMENTlink written 11 months ago by Michael Love19k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 336 users visited in the last hour