Question

derfinder on bacteria PE strand specific RNA-seq

1

Entering edit mode

eva.pinatel ▴ 10

@evapinatel-7358

Last seen 3.2 years ago

Italy

Hello,

I'm trying to analyze a set of bacterial, paired-end, strand specific RNA-seq with derfinder (if it is possible). I see some major points I would like to have some suggestions about:

1)My data are strand specific and derfinder doesn't seem to consider this option, it is correct to perform two analysis providing separately plus and minus strand bigwig files?

2)Dealing with bacteria paired-end data (no exons), I reconstruct the original insert from properly paired reads, to fully take advantage of the produced data. As consequence, my bigwig files originate from fragments from 75bp to 500bp in length with a median of 300 bp. It should be fine indicating L=500 in regionMatrix function or the different distribution of fragment lengths could give some problem?

3)There is a work around to work with bacteria annotation files?

I obtained a txdb form the Refseq GFF but this bacterium is not included in GenomeInfoDb, and I dont'know if the txdb is correctly formatted to permit the following step of the analysis as exons and mRNA features are not indicated in bacteria (only genes, CDS,rRNA,tRNA and pseudogenes are generally present).

Thank you in advance

Eva

derfinder • 1.2k views

ADD COMMENT • link 7.4 years ago eva.pinatel ▴ 10

score 0 · Answer 1 · 2016-12-01

Hi Eva,

Thank you for your great set of questions!

1) If you have strand specific bigwig files, you can run one analysis for each strand at a time. If you are loading data from BAM files, you can pass arguments specific to Rsamtools::scanBamFlag() in derfinder::loadCoverage() and derfinder::fullCoverage().

2) Since you have reads of different sizes, instead of using the number of reads to adjust the library size, we recommend using the AUC of the coverage vector. That's what we did for recount since Rail-RNA soft clips reads and we end up with reads of different lengths. You can calculate the AUC from a bigWig file using derfinder::getTotalMapped() or using bwtool (external to R). If you do so, then you can set `totalMapped` to the AUC and `targetSize` to the average library size of interest. In recount we set it to a library of 40 million 100 bp reads as shown at https://github.com/leekgroup/recount/blob/master/R/coverage_matrix.R#L126-L127. Then set `L` to 1.

3) For the annotation side, we use bumphunter::annotateTranscripts() and bumphunter::matchGenes(). Those functions should in theory work for any txdb object. But well, if it doesn't please provide code for a small reproducible example. For derfinder::makeGenomicState(), it's likely that it won't work in it's current state (since it looks for introns). Again, if you give me an example I'll see what I can do.

Best,

Leonardo

score 0 · Answer 2 · 2016-12-01

0

Entering edit mode

eva.pinatel ▴ 10

@evapinatel-7358

Last seen 3.2 years ago

Italy

Hi Leonardo,

thank you for the quick reply. I' will try to go on with the analysis and let you know if I have problems with annotation files.

Best regards

Eva

ADD COMMENT • link 7.4 years ago eva.pinatel ▴ 10