derfinder on bacteria PE strand specific RNA-seq
2
1
Entering edit mode
eva.pinatel ▴ 10
@evapinatel-7358
Last seen 3.8 years ago
Italy

Hello,

I'm trying to analyze a set of bacterial, paired-end, strand specific RNA-seq with derfinder (if it is possible). I see some major points I would like to have some suggestions about:

1)My data are strand specific and derfinder doesn't seem to consider this option, it is correct to perform two analysis providing separately plus and minus strand bigwig files?

2)Dealing with bacteria paired-end data (no exons), I reconstruct the original insert from properly paired reads, to fully take advantage of the produced data. As consequence, my bigwig files originate from fragments from 75bp to 500bp in length with a median of 300 bp. It should be fine indicating L=500 in regionMatrix function or the different distribution of fragment lengths could give some problem?

3)There is a work around to work with bacteria annotation files?

I obtained a txdb form the Refseq GFF  but this bacterium is not included in GenomeInfoDb, and I dont'know if the txdb is correctly formatted to permit the following step of the analysis  as exons and mRNA features are not indicated in bacteria (only genes, CDS,rRNA,tRNA and pseudogenes are generally present).

Thank you in advance

Eva

derfinder • 1.5k views
ADD COMMENT
0
Entering edit mode
@lcolladotor
Last seen 6 weeks ago
United States

Hi Eva,

Thank you for your great set of questions!

1) If you have strand specific bigwig files, you can run one analysis for each strand at a time. If you are loading data from BAM files, you can pass arguments specific to Rsamtools::scanBamFlag() in derfinder::loadCoverage() and derfinder::fullCoverage().

2) Since you have reads of different sizes, instead of using the number of reads to adjust the library size, we recommend using the AUC of the coverage vector. That's what we did for recount since Rail-RNA soft clips reads and we end up with reads of different lengths. You can calculate the AUC from a bigWig file using derfinder::getTotalMapped() or using bwtool (external to R). If you do so, then you can set `totalMapped` to the AUC and `targetSize` to the average library size of interest. In recount we set it to a library of 40 million 100 bp reads as shown at https://github.com/leekgroup/recount/blob/master/R/coverage_matrix.R#L126-L127. Then set `L` to 1.

3) For the annotation side, we use bumphunter::annotateTranscripts() and bumphunter::matchGenes(). Those functions should in theory work for any txdb object. But well, if it doesn't please provide code for a small reproducible example. For derfinder::makeGenomicState(), it's likely that it won't work in it's current state (since it looks for introns). Again, if you give me an example I'll see what I can do.

Best,

Leonardo

ADD COMMENT
0
Entering edit mode
eva.pinatel ▴ 10
@evapinatel-7358
Last seen 3.8 years ago
Italy

Hi Leonardo,

thank you for the quick reply. I' will try to go on with the analysis and let you know if I have problems with annotation files.

Best regards

Eva

ADD COMMENT

Login before adding your answer.

Traffic: 538 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6