Question

Should I use alpine and can I convert the output back to counts?

harry.smith ▴ 20

Hello,

I am working with some paired-end data, and based on the FastQC reports the R2 reads have pretty bad GC content, fragment length, and base-pair results. I am worried that using these R2 reads will add bias related to those issues. First, I tried using only the R1 reads with RSEM, but got extremely poor alignment rates (<1%). I then cleaned the reads of rRNA (as the data was total RNA) and ran those R1 reads through RSEM again. This did not improve the alignments. I am currently re-running RSEM using both R1 and R2, but I am concerned about the poor quality of the R2 reads.

These are preliminary data from samples that were barely sequenceable to begin with (RIN < 3 for most samples), and the results likely won't be used for anything beyond generating a candidate list for grant writing. I am aware that, with the biases above, there is a good chance that many of the genes in that candidate list will be false positives.

Is it appropriate to use alpine in this instance to correct those biases? If so, it looks like alpine outputs FPKM estimates, which means I can no longer use DESeq2. Is the best course of action then to use limma? Or is there a way I can get back to expected counts from the alpine output?

Thank you

Harry

alpine deseq2 • 2.6k views

updated 6.4 years ago by Michael Love • written 6.4 years ago by harry.smith

score 2 · Accepted Answer · 2018-07-27

I would recommend using Salmon, which includes alpine's GC bias correction if you use the --gcBias argument. Salmon is much better software for quantification than alpine. alpine was designed for research into RNA-seq bias and for comparing various bias models. It provides abundance estimates, but Salmon is better for a number of reasons. For more details see this presentation:

https://goo.gl/ftK55e
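
As a rough illustration of the suggestion above, here is a minimal Python sketch that assembles a paired-end `salmon quant` call with `--gcBias` and reads the estimated counts back from Salmon's `quant.sf` output. The index directory, FASTQ file names, output directory, and thread count are placeholders I made up, and the helper function names are hypothetical. Only `--gcBias` comes from the answer itself; the remaining options (`-i`, `-l A`, `-1`, `-2`, `-p`, `-o`) are standard Salmon options, so check them against the documentation for your installed version.

```python
import csv
import io
import shlex


def salmon_quant_command(index_dir, reads_1, reads_2, out_dir, threads=4):
    """Build a paired-end `salmon quant` command with GC bias correction.

    All paths are placeholders supplied by the caller.
    """
    return [
        "salmon", "quant",
        "-i", index_dir,      # pre-built Salmon index (assumed to exist)
        "-l", "A",            # let Salmon infer the library type
        "-1", reads_1,        # R1 reads
        "-2", reads_2,        # R2 reads
        "--gcBias",           # the GC bias correction mentioned in the answer
        "-p", str(threads),
        "-o", out_dir,
    ]


def read_estimated_counts(quant_sf_handle):
    """Return {transcript name: estimated count} from a quant.sf table.

    quant.sf is tab-separated with the columns
    Name, Length, EffectiveLength, TPM, NumReads.
    """
    reader = csv.DictReader(quant_sf_handle, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


if __name__ == "__main__":
    cmd = salmon_quant_command(
        "salmon_index", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "quant/sample"
    )
    # Print the command instead of running it; pass `cmd` to subprocess.run
    # once the placeholder paths point at real files.
    print(shlex.join(cmd))

    # Tiny made-up quant.sf table, only to demonstrate the parsing step.
    example = io.StringIO(
        "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
        "tx1\t1000\t850.0\t12.5\t40.0\n"
    )
    print(read_estimated_counts(example))
```

Because Salmon reports estimated counts (`NumReads`) alongside TPM, there is no need to convert FPKM back to counts. In R, these estimates are usually brought into DESeq2 with the tximport package rather than parsed by hand as in this sketch.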