Hi,
I have a question concerning the exon identifiers that QuasR uses for exon counts. I use a GTF file to create a transcript DB:
gtfFile <- file.path(project.dir, "exon-pipeline-files", "gtf-files", "ensembl_rna_hs.gtf")
chrLen <- scanFaIndex(genomeFile)
chrominfo <- data.frame(chrom=as.character(seqnames(chrLen)),
                        length=width(chrLen),
                        is_circular=rep(FALSE, length(chrLen)))
txdb <- makeTranscriptDbFromGFF(file=gtfFile, format="gtf",
                                exonRankAttributeName="exon_number",
                                gffGeneIdAttributeName="gene_name",
                                chrominfo=chrominfo,
                                dataSource="Ensembl",
                                species="Homo sapiens")
The GTF file looks like:
1 Ensembl exon 11869 12227 0.0 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1"; transcript_name "DDX11L1-002"; exon_id "ENSE00002234944"; exon_number "1";
1 Ensembl exon 12613 12721 0.0 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1"; transcript_name "DDX11L1-002"; exon_id "ENSE00003582793"; exon_number "2";
1 Ensembl exon 13221 14409 0.0 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1"; transcript_name "DDX11L1-002"; exon_id "ENSE00002312635"; exon_number "3";
When I do the quantification:
exonLevels <- qCount(proj, txdb, reportLevel="exon")
I get numbers as exon identifiers:
> head(exonLevels)
       width Sample1 Sample2 Sample3 Sample4
1        110       0       0       0       0
10       124       0       0       0       0
100      613      59      46      63      45
1000     256      49      41      70      56
10000    119       0       0       0       0
100000   223       0       0       0       0
How do I relate these exon identifiers back to the GTF entries?
Best regards,
Sven