Question

comparing strand-specific counts to unstranded counts

0

Entering edit mode

lirongrossmann ▴ 80

@lirongrossmann-13938

Last seen 3.5 years ago

Hi All,

I am trying to compare the level of gene expression between strand specific samples and unstranded samples of rna seq for a set of selected genes.

I ran into a post in biostar :

https://www.biostars.org/p/73205/

I wanted to confirm with "Deseq2 experts" here - is this the best way to compare strand vs unstranded counts or is there another recommended way to do it using Deseq2?

Thanks a lot

rnaseq strand deseq2 • 2.4k views

ADD COMMENT • link 6.5 years ago lirongrossmann ▴ 80

0

Entering edit mode

Thanks! Does Deseq2 treat the batch effect as a linear covariate or does it take into account non linear variations that the batch may include? ( I.e other than an offset)

ADD REPLY • link 6.5 years ago lirongrossmann ▴ 80

0

Entering edit mode

When you include batch as a new variable in the design, you are making the beta vector for gene i longer, by adding coefficients to the model. See the third formula here (where beta_i is a vector of coefficients):

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#the-deseq2-model

So, for a concrete example, you would get beta_i = [beta_{i,intercept}, beta_{i,batch2}, beta_{i,condition2}] if you had a design of ~batch + condition, and batch and condition each had two levels. Note that this allows the batch effect to differ across genes: for each gene, the batch effect will be fit by the GLM. Of course this requires that batch and condition are not confounded, and if they are, the software prints and error and will not run.

ADD REPLY • link 6.5 years ago Michael Love 41k

0

Entering edit mode

Thanks!

So basically it accounts for the batch in a linear fashion if it uses GLM.

ADD REPLY • link 6.5 years ago lirongrossmann ▴ 80

1

Entering edit mode

I could summarize the above to say: you get a fold change for each gene which explains differences in counts associated with batch. If you have n batches you get n-1 fold changes to explain differences among batches.

ADD REPLY • link 6.5 years ago Michael Love 41k

0

Entering edit mode

Thank you.

I think that what confuses me is how to interpret the overall result of which gene can be a good candidate to discriminate between the two groups I have from a biologic standpoint.

To be more specific - I have two groups, each has 10 samples. One group responded to therapy the other did not. The one that did not respond had 4 samples which are stranded, and 6 which are unstranded. The group that responded had 10 samples which are stranded.

I used the following code:

dds <-DESeqDataSetFromMatrix(countData = ep,colData = cp,design =~Strand+Response)

dds <- estimateSizeFactors(dds)

dds <- DESeq(dds)

res<- results(dds)
resbatch <- results(dds, contrast = c("Strand","StrandSpecific","Unstranded"))

Now for a specific gene (say X), I got that the fold change is 3.6 (adjp = 0.0003) for the response variable and for the same gene fold change of 6.5 (adp=0.0021) for the batch variable.

Does that mean that I can trust gene X to be a good candidate for discriminating between the two groups? (He is discriminating the response but a lot of difference between the expression of the genes across the groups is explained by the batch).

What would be the most rigorous way to choose the genes that discriminate the two groups given the batch effect between them?

Thanks a lot!

ADD REPLY • link 6.5 years ago lirongrossmann ▴ 80

1

Entering edit mode

Using a statistical model to fit coefficients and associated standard errors is the rigorous approach. If you want to focus on large effects you can also use lfcThreshold in results().

ADD REPLY • link 6.5 years ago Michael Love 41k

0

Entering edit mode

thank you!

My goal is to choose the genes that best discriminate my two groups in terms of their response to the treatment and use them for downstream analysis.

Since I have a batch effect confounding the biological effect (response to treatment) I am not sure which genes to choose based on the fold changes. So for example, if I didn’t have a batch effect, I would choose the genes with the smallest p value and largest fold change. Now that I added the batch to the model, what criteria would you recommend to choose the genes for downstream analysis to best discriminate the two on the basis of their response to treatment?

ADD REPLY • link 6.5 years ago lirongrossmann ▴ 80

score 0 · Answer 1 · 2017-10-25

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 3 hours ago

United States

Take a look at the DESeq2 vignette, there is a similar example:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#multi-factor-designs

ADD COMMENT • link 6.5 years ago Michael Love 41k