Hi,
I have a question concerning the exon identifiers that QuasR uses for exon counts. I use a GTF file to create a transcript DB:
gtfFile <- file.path(project.dir, "exon-pipeline-files", "gtf-files", "ensembl_rna_hs.gtf")
chrLen <- scanFaIndex(genomeFile)
chrominfo <- data.frame(chrom=as.character(seqnames(chrLen)),
                        length=width(chrLen),
                        is_circular=rep(FALSE, length(chrLen)))
txdb <- makeTranscriptDbFromGFF(file=gtfFile, format="gtf",
                                exonRankAttributeName="exon_number",
                                gffGeneIdAttributeName="gene_name",
                                chrominfo=chrominfo,
                                dataSource="Ensembl",
                                species="Homo sapiens")
The GTF file looks like:
1 Ensembl exon 11869 12227 0.0 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1"; transcript_name "DDX11L1-002"; exon_id "ENSE00002234944"; exon_number "1";
1 Ensembl exon 12613 12721 0.0 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1"; transcript_name "DDX11L1-002"; exon_id "ENSE00003582793"; exon_number "2";
1 Ensembl exon 13221 14409 0.0 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1"; transcript_name "DDX11L1-002"; exon_id "ENSE00002312635"; exon_number "3";
When I do the quantification:
exonLevels <- qCount(proj, txdb, reportLevel="exon")
I get numbers as exon identifiers:
> head(exonLevels)
       width Sample1 Sample2 Sample3 Sample4
1        110       0       0       0       0
10       124       0       0       0       0
100      613      59      46      63      45
1000     256      49      41      70      56
10000    119       0       0       0       0
100000   223       0       0       0       0
How do I relate these exon identifiers back to the GTF entries?
Best regards,
Sven
