Bias correction in a single-end experiment for DEG and DET
@yohannnedelec-10940

Hello,

I'd like some advice on improving my analyses.

I'm concerned about bias (GC content in particular) when comparing transcript and gene expression between groups of samples.

My objectives are:

  1. Identify DE genes and DE transcripts
  2. Eliminate some bias before doing eQTL and sQTL mapping

About my data: 80 libraries in each of the two groups, ~30M single-end reads

Currently, I take the output from RSEM directly and pass it to voom, correcting for known batch effects between samples (mainly flowcell effects).

Could you please point me in a better direction than this?
Should I apply tximport first?
Would your method, alpine, work in my case (can it work with single-end data)?

Thank you for your help,
Regards,

@mikelove

hi Yohann,

For GC content bias on the gene level, you can use the Bioconductor packages cqn or EDASeq and then any of the downstream statistical packages (DESeq2, edgeR, limma, etc). I believe for both packages, you can obtain the offset matrix for statistical analysis (don't know if your eQTL pipelines can accept offsets, but this is a simple thing for a linear model to accommodate), or you can get a normalized bias-corrected matrix for EDA.
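To make the remark about offsets concrete: in an ordinary linear model on the log scale, an offset is just a term whose coefficient is fixed at 1, so including it is equivalent to subtracting it from the response before fitting. The sketch below shows this on made-up numbers. It is plain Python, not the cqn or EDASeq interface, and the function name, genotype coding, and offset values are all hypothetical.

```python
from statistics import mean

def ols_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data: genotype dosage for six samples and a per-sample
# log-scale bias offset for one gene (the kind of value an offset matrix holds).
genotype = [0, 0, 1, 1, 2, 2]
offset = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0]

# Simulated log expression: true intercept 5, true genotype effect 0.5, plus bias.
log_expr = [5 + 0.5 * g + o for g, o in zip(genotype, offset)]

# Fitting with an offset == fitting the offset-subtracted response.
corrected = [y - o for y, o in zip(log_expr, offset)]
slope, intercept = ols_fit(genotype, corrected)
naive_slope, _ = ols_fit(genotype, log_expr)

print(f"with offset: slope={slope:.3f}, intercept={intercept:.3f}")
print(f"ignoring offset: slope={naive_slope:.3f}")
```

So a pipeline that cannot take offsets directly can still use them on the log scale by subtracting them from the expression values first. This shortcut applies to linear models only; count-based GLMs need the offset passed to the fitting routine.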

I believe you could also use cqn and EDASeq with estimated transcript counts.

Now, your RSEM to limma-voom pipeline may be perfectly fine as is and you don't have to use the above tools, if it is the case that the GC dependence is explained mostly by batch terms. You can figure this out by running cqn or EDASeq, making the GC dependence plot, and coloring lines by batch. If nearly all the variation is across batch and not within batch, then I wouldn't change your current pipeline.
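As a rough numerical companion to the plot described above, one could summarise each sample's GC dependence by a single number and ask how much of its variation lies across batches. The sketch below uses a straight-line slope of log2 count on GC fraction as a crude stand-in for the smooth curves that cqn or EDASeq would draw. The function names, flowcell labels, and toy counts are all hypothetical, and this is not how either package computes anything.

```python
import math
from statistics import mean

def gc_slope(gc, counts, pseudocount=0.5):
    """Least-squares slope of log2(count + pseudocount) on GC fraction for one sample."""
    y = [math.log2(c + pseudocount) for c in counts]
    gc_bar, y_bar = mean(gc), mean(y)
    sxx = sum((g - gc_bar) ** 2 for g in gc)
    sxy = sum((g - gc_bar) * (v - y_bar) for g, v in zip(gc, y))
    return sxy / sxx

def split_by_batch(slopes, batches):
    """Return (between-batch, within-batch) sums of squares of the per-sample slopes."""
    grand = mean(slopes)
    groups = {}
    for s, b in zip(slopes, batches):
        groups.setdefault(b, []).append(s)
    between = sum(len(v) * (mean(v) - grand) ** 2 for v in groups.values())
    within = sum((s - mean(v)) ** 2 for v in groups.values() for s in v)
    return between, within

# Toy data: five genes with known GC fraction, four samples on two flowcells.
gc = [0.30, 0.40, 0.50, 0.60, 0.70]
batches = ["fc1", "fc1", "fc2", "fc2"]
true_slopes = [2.0, 2.1, -1.0, -0.9]  # GC effect differs mostly by flowcell
counts = [[round(100 * 2 ** (k * (g - 0.5))) for g in gc] for k in true_slopes]

slopes = [gc_slope(gc, c) for c in counts]
between, within = split_by_batch(slopes, batches)
share = between / (between + within)
print(f"share of GC-slope variation across batches: {share:.3f}")
```

A share close to 1 corresponds to the case where the GC dependence is mostly explained by batch; a noticeably lower share suggests within-batch GC bias that the batch terms alone would not absorb.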

You can use tximport, but this is really a convenience function for reading in transcript quantifications and summarizing to the gene level. RSEM does this itself already.

alpine doesn't support single-end data yet. I hope to spend more time expanding the features and adding more documentation later this year (and adding to Bioconductor).

@yohannnedelec-10940

Thanks a lot for your answer Michael,

About correcting for length and GC content biases at the transcript level, my understanding is that I first have to calculate the GC content and length of each transcript and feed that information to EDASeq.
Is this approach correct, or are there caveats that I'm missing?

hi Yohann, 

(quick note about the site, you can add Comments/Replies to thread a conversation instead of Answers which are for answering the original posted question)

Yes, you would calculate GC content and length and feed these to EDASeq or cqn. Pointers for doing this are: extractTranscriptSeqs in the GenomicFeatures package, and sum(width(grl)) if you have a GRangesList of the exons per transcript. If you have further package-specific questions, you can make a new post and get the advice of the package authors by tagging the post.
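To spell out what those two covariates are, here is a minimal language-neutral sketch in Python rather than the Bioconductor calls named above. It assumes you already have each transcript's exon sequences as strings in transcript order. The function name and the toy sequences are hypothetical.

```python
def transcript_length_and_gc(exon_seqs):
    """Return (length, GC fraction) for one transcript given its exon sequences.

    Length is the summed exon width (the quantity sum(width(grl)) gives per
    transcript); GC fraction is the share of G or C bases in the spliced sequence.
    """
    seq = "".join(exon_seqs).upper()
    length = len(seq)
    if length == 0:
        return 0, float("nan")
    gc_count = seq.count("G") + seq.count("C")
    return length, gc_count / length

# Toy input: transcript id -> list of exon sequences (made up for illustration).
transcripts = {
    "tx1": ["ATGC", "GGCC"],
    "tx2": ["ATAT", "AT"],
}

covariates = {tx: transcript_length_and_gc(exons) for tx, exons in transcripts.items()}
for tx, (length, gc) in covariates.items():
    print(tx, length, round(gc, 3))
```

Note that this counts ambiguous bases such as N in the length; whether to exclude them is a choice this sketch leaves open.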
