Hello Bioconductor,

I'm working with RNA-seq and small RNA-seq data, and I'm a bit confused about which genome and annotation versions I can combine between the alignment of my FASTQ files and the R packages I use downstream. The trouble starts when I try to build my own GTF file for small RNA sequences that exist in other databases: I don't want to mix sequences from one version, genomic ranges from another, and transcripts from yet another.

I build a FASTA file from a small RNA database whose coordinates refer to the old genome version, hg18. For the alignment I use the GENCODE release 34 GRCh38 primary assembly FASTA, and I keep the sequences that have at least one alignment to the genome. Then, for easier manipulation, I convert the BAM to BED so that I can import the genomic ranges of my alignments into R and work with them. This is the first file, which I want to merge with the next one.

Next, I use another BED file that contains small RNA sequences from multiple sources, and I run:


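library(plyranges)     # read_bed()
library(GenomeInfoDb)  # keepStandardChromosomes(), Seqinfo(), seqlevels()

# Import the BED file and keep only the standard chromosomes
small_RNAs_bed <- read_bed("small_RNAs_DB.bed") %>%
  keepStandardChromosomes(pruning.mode = "coarse")

# Attach the hg38 sequence information to the ranges
sInfo <- Seqinfo(genome = "hg38")
seqlevels(small_RNAs_bed) <- seqlevels(small_RNAs_bed)  # assigns the seqlevels to themselves
seqinfo(small_RNAs_bed) <- sInfo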

As I also want to check the sequences of the small RNAs in that BED file, I extract them with:

library(BSgenome.Hsapiens.UCSC.hg38)
transcripts_human <- Views(BSgenome.Hsapiens.UCSC.hg38, small_RNAs_bed)

So, are the sequences I get this way the same as the ones in GENCODE release 34?
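
One partial check I can think of is to compare the chromosome names and lengths of the two sources. Below is a minimal sketch in base R, assuming both sets of lengths are available as named vectors (for the BSgenome side, seqlengths() on the genome object should give such a vector). The function name compare_seqlengths and the toy values are hypothetical and are not real chromosome lengths.

```r
# Compare sequence names and lengths from two genome sources.
# Both inputs are named numeric vectors: names are sequence names,
# values are sequence lengths.
compare_seqlengths <- function(a, b) {
  shared <- intersect(names(a), names(b))
  list(
    only_in_a       = setdiff(names(a), names(b)),
    only_in_b       = setdiff(names(b), names(a)),
    length_mismatch = shared[a[shared] != b[shared]]
  )
}

# Toy input (hypothetical values, for illustration only)
gencode_lengths  <- c(chr1 = 1000, chr2 = 800, chrM = 50)
bsgenome_lengths <- c(chr1 = 1000, chr2 = 790, chrM = 50, chrUn_x = 20)

result <- compare_seqlengths(gencode_lengths, bsgenome_lengths)
print(result)
```

Matching names and lengths would not prove that the bases are identical, but a mismatch would at least show that the two sources differ.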

What's more, are the transcripts in TxDb.Hsapiens.UCSC.hg38.knownGene the same as the ones in the GTF file of the GENCODE primary assembly?

