I'm currently working on piRNA expression in different cell lines, and I would like to ask how to proceed with data transformation and normalization.
The main issue is that, in order to enrich for piRNAs, we performed periodate (PO) treatment. To quote the literature: "The PO treatment has been shown to be effective in separating piRNAs from other classes of small RNAs and degradation products of longer mRNA transcripts".
Our PO-treated libraries have ~10 million reads and the untreated libraries have ~45 million reads.
To find piRNAs in our samples, we used SPORTS1.0.
Its output covers two groups of reads: reads that match the genome and match a database, and reads that do not match the genome but do match a database.
For every small RNA database (rRNA, tRNA, piRNA, lncRNA, ...) we get a file with the reads that matched that database, so we end up with many result files in tabular format.
The majority of reads multimap to several piRNAs, so for each piRNA I took the sum of the reads assigned to it (pooling reads that matched the genome and reads that did not).
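To make the summing step concrete, here is a minimal sketch of what I mean. The record layout (a piRNA ID paired with a read count) and the IDs themselves are made up for illustration and are not the actual SPORTS1.0 column layout.

```python
from collections import defaultdict


def sum_reads_per_pirna(rows):
    """Sum read counts per piRNA ID.

    `rows` is an iterable of (pirna_id, reads) pairs. This layout is an
    assumption for illustration, not the real SPORTS1.0 table format.
    """
    totals = defaultdict(int)
    for pirna_id, reads in rows:
        totals[pirna_id] += int(reads)
    return dict(totals)


# Hypothetical toy records: (piRNA ID, read count)
genome_matched = [("piR-A", 120), ("piR-B", 30), ("piR-A", 5)]
genome_unmatched = [("piR-A", 10), ("piR-C", 7)]

# Pool both groups of reads before summing, as described above
pirna_counts = sum_reads_per_pirna(genome_matched + genome_unmatched)
print(pirna_counts)
```

Note that in this sketch a read assigned to several piRNAs contributes its full count to each of them, so the per-piRNA totals can add up to more than the number of reads in the library.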
As a result, each library is effectively split up by small RNA database.
How should I normalize between libraries with such large quantitative differences so that I can compare relative expression?
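For reference, the simplest scaling I can think of is reads per million (RPM) using the total library size. The sketch below uses made-up piRNA counts together with the approximate library sizes mentioned above. I am not sure this is appropriate here, since the PO treatment changes the composition of the library and not only its depth.

```python
def reads_per_million(counts, library_size):
    """Scale raw counts to reads per million of the given library size."""
    return {name: n * 1_000_000 / library_size for name, n in counts.items()}


# Hypothetical per-piRNA counts; only the library sizes come from the post
treated_counts = {"piR-A": 500, "piR-B": 100}
untreated_counts = {"piR-A": 900, "piR-B": 450}

treated_rpm = reads_per_million(treated_counts, 10_000_000)
untreated_rpm = reads_per_million(untreated_counts, 45_000_000)

for name in treated_counts:
    print(name, treated_rpm[name], untreated_rpm[name])
```

With the total library size as the denominator, the treated values will look inflated simply because piRNAs make up a larger fraction of the treated library, which is exactly the part I don't know how to handle.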