quasR - how are exon identifiers for exon counts determined
2
0
Entering edit mode
@svenschuierer-7087
Last seen 7.5 years ago
Switzerland

Hi,

I have a question concerning the exon identifiers that are used in quasR for the exon counts.I use a GTF file to create a transcript DB:

gtfFile <- file.path(project.dir, "exon-pipeline-files", "gtf-files", "ensembl_rna_hs.gtf")

chrLen <- scanFaIndex(genomeFile)
chrominfo <- data.frame(chrom=as.character(seqnames(chrLen)),
                        length=width(chrLen),
                        is_circular=rep(FALSE, length(chrLen)))

txdb <- makeTranscriptDbFromGFF(file=gtfFile, format="gtf",
                                exonRankAttributeName="exon_number", 
                                gffGeneIdAttributeName="gene_name",
                                chrominfo=chrominfo,
                                dataSource="Ensembl",
                                species="Homo sapiens")

which looks like:

1 Ensembl exon 11869 12227 0.0 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1"; transcript_name "DDX11L1-002"; exon_id "ENSE00002234944"; exon_number "1";
1 Ensembl exon 12613 12721 0.0 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1"; transcript_name "DDX11L1-002"; exon_id "ENSE00003582793"; exon_number "2";
1 Ensembl exon 13221 14409 0.0 + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; gene_name "DDX11L1"; transcript_name "DDX11L1-002"; exon_id "ENSE00002312635"; exon_number "3";

When I do the quantification:

exonLevels <- qCount(proj, txdb, reportLevel="exon")

I get numbers as exon identifiers:

> head(exonLevels)
       width Sample1 Sample2 Sample3 Sample4
1        110               0               0               0               0
10       124               0               0               0               0
100      613              59              46              63              45
1000     256              49              41              70              56
10000    119               0               0               0               0
100000   223               0               0               0               0


How do relate these exon identifiers back to the GTF entries?

Best regards,

Sven

rnaseq exon quasr • 1.7k views
ADD COMMENT
0
Entering edit mode
@hotz-hans-rudolf-3951
Last seen 3.5 years ago
Switzerland

Hi Sven

You can merge the contents of your txdb with the count table, eg:

ids <- as.character(row.names(exonLevels))

annot <- select(txdb, ids, columns=c("EXONSTART","EXONEND","EXONCHROM","EXONSTRAND"), keytype="EXONID")

# you might wanna first check the columns of your txdb, with 'keytypes(txdb)'

 

exonLevelsWithAnnot <- merge(annot,exonLevels,by.x="EXONID",by.y=0)

 

Hope this helps, Hans-Rudolf

 

 

ADD COMMENT
0
Entering edit mode
@svenschuierer-7087
Last seen 7.5 years ago
Switzerland

Hi Hans-Rudolf,

thanks a lot for your answer. Yes, I can get it to work using this approach. Actually, since I do not have EXONSTART or EXONCHROM:

> keytypes(txdb)
[1] "GENEID"   "TXID"     "TXNAME"   "EXONID"   "EXONNAME" "CDSID"    "CDSNAME"

I do the following:

exon.ranges <- exons(txdb, columns=c("EXONID"))
exon.frame <- data.frame(seqnames(exon.ranges), ranges(exon.ranges), strand(exon.ranges), mcols(exon.ranges)[[1]])
names(exon.frame) <- c("Chromosome", "Start", "End", "Length", "Strand", "Row Id")
exon.frame[["Exon Id"]] <- paste(exon.frame[["Chromosome"]], paste(exon.frame[["Start"]], exon.frame[["End"]], sep="-"), exon.frame[["Strand"]], sep=":")

and

exonLevels.ind   <- match(row.names(exonLevels), exon.frame[["Row Id"]], nomatch=0)
exon.count.frame <- data.frame(exon.frame[["Exon Id"]][exonLevels.ind], exonLevels[,-1], check.names=FALSE)


which is a good enough workaround for me.

So thanks a lot for helping me here (and for your last answer as well).

Best regards,

Sven

 

ADD COMMENT

Login before adding your answer.

Traffic: 615 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6