Search
Question: comparing strand-specific counts to unstranded counts
0
gravatar for lirongrossmann
23 days ago by
lirongrossmann0 wrote:

Hi All,

I am trying to compare the level of gene expression between strand specific samples and unstranded samples of rna seq for a set of selected genes. 

I ran into a post in biostar :

https://www.biostars.org/p/73205/

I wanted to confirm with "Deseq2 experts" here - is this the best way to compare strand vs unstranded counts or is there another recommended way to do it using Deseq2?

Thanks a lot

ADD COMMENTlink modified 21 days ago • written 23 days ago by lirongrossmann0

Thanks! Does Deseq2 treat the batch effect as a linear covariate or does it take into account non linear variations that the batch may include? ( I.e other than an offset)

ADD REPLYlink written 21 days ago by lirongrossmann0

When you include batch as a new variable in the design, you are making the beta vector for gene i longer, by adding coefficients to the model. See the third formula here (where beta_i is a vector of coefficients):

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#the-deseq2-model

So, for a concrete example, you would get beta_i = [beta_{i,intercept}, beta_{i,batch2}, beta_{i,condition2}] if you had a design of ~batch + condition, and batch and condition each had two levels. Note that this allows the batch effect to differ across genes: for each gene, the batch effect will be fit by the GLM. Of course this requires that batch and condition are not confounded, and if they are, the software prints and error and will not run.

ADD REPLYlink written 21 days ago by Michael Love14k

 Thanks! 

So basically it accounts for the batch in a linear fashion if it uses GLM.

 

ADD REPLYlink written 21 days ago by lirongrossmann0
1

I could summarize the above to say: you get a fold change for each gene which explains differences in counts associated with batch. If you have n batches you get n-1 fold changes to explain differences among batches.

ADD REPLYlink written 21 days ago by Michael Love14k

Thank you.

I think that what confuses me is how to interpret the overall result of which gene can be a good candidate to discriminate between the two groups I have from a biologic standpoint. 

To be more specific - I have two groups, each has 10 samples. One group responded to therapy the other did not. The one that did not respond had 4 samples which are stranded, and 6 which are unstranded. The group that responded had 10 samples which are stranded. 

I used the following code:

dds <-DESeqDataSetFromMatrix(countData = ep,colData = cp,design =~Strand+Response)

dds <- estimateSizeFactors(dds)

dds <- DESeq(dds)

res<- results(dds)
resbatch <- results(dds, contrast = c("Strand","StrandSpecific","Unstranded"))
 

Now for a specific gene (say X), I got that the fold change is 3.6 (adjp = 0.0003) for the response variable and for the same gene fold change of 6.5 (adp=0.0021) for the batch variable.

Does that mean that I can trust gene X to be a good candidate for discriminating between the two groups? (He is discriminating the response but a lot of difference between the expression of the genes across the groups is explained by the batch).

What would be the most rigorous way to choose the genes that discriminate the two groups given the batch effect between them?

Thanks a lot!

 

ADD REPLYlink written 20 days ago by lirongrossmann0
1

Using a statistical model to fit coefficients and associated standard errors is the rigorous approach. If you want to focus on large effects you can also use lfcThreshold in results().

ADD REPLYlink written 20 days ago by Michael Love14k

thank you! 

My goal is to choose the genes that best discriminate my two groups in terms of their response to the treatment and use them for downstream analysis. 

Since I have a batch effect confounding the biological effect (response to treatment) I am not sure which genes to choose based on the fold changes. So for example, if I didn’t have a batch effect, I would choose the genes with the smallest p value and largest fold change. Now that I added the batch to the model, what criteria would you recommend to choose the genes for downstream analysis to best discriminate the two on the basis of their response to treatment?

 

 

ADD REPLYlink written 20 days ago by lirongrossmann0
0
gravatar for Michael Love
23 days ago by
Michael Love14k
United States
Michael Love14k wrote:

Take a look at the DESeq2 vignette, there is a similar example:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#multi-factor-designs

ADD COMMENTlink written 23 days ago by Michael Love14k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 258 users visited in the last hour