Hello,
I am working with some paired-end data, and based on the FastQC reports the R2 reads have pretty poor GC content, fragment length, and base-pair results. I am worried that using these R2 reads will introduce bias related to those issues. First, I tried running RSEM with only the R1 reads, but got an extremely low alignment rate (<1%). I then removed rRNA reads (the libraries are total RNA) and ran RSEM on the cleaned R1 reads again, but this did not improve the alignment rate. I am currently re-running RSEM using both R1 and R2, but I am still concerned about the poor quality of the R2 reads.

These are preliminary data from samples that were barely sequenceable to begin with (RIN < 3 for most samples), and the results likely won't be used for anything beyond generating a candidate list for grant writing. I am aware that, given the biases above, there is a good chance that many of the genes in that candidate list will be false positives.

Is it appropriate to use alpine in this situation to correct those biases? If so, it looks like alpine outputs FPKM estimates, which means I can no longer use DESeq2. Is the best course of action then to use limma, or is there a way to get back to expected counts from the alpine output?
Thank you,
Harry