Hi @jabbar_campbell
The identifiers returned by qCount (rownames of the matrix) are taken from your query, so the solution will be to make sure your query contains the correct gene identifiers or names.
It would be helpful if you provide a reproducible example, in this case e.g. I don't know what the source of the gtf file was, and how txdb2 was
constructed. My guess is that you are indeed not using a TxDb object, but rather a GRanges object or similar (TxDb objects cannot be subset as you do with txdb2 in your example).
The following code for example uses a gtf file from Human Gencode Release 34 and will give you Ensembl gene identifiers (make sure your gene annotation is consistent with your reference genome):
library(QuasR)
gtffile <-
"ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_34/gencode.v34.basic.annotation.gtf.gz"
txdb <- makeTxDbFromGFF(gtffile, format = "gtf")
genes(txdb)
The txdb
object can then be used directly with qCount, or you can extract a GRanges object with exon regions to quantify genes. In the latter case, the returned identifiers are the unique names of the
GRanges object. These two should give identical results (see example below), but the second gives you more flexibility (e.g. to filter out genes without known
identifier).
Here is a complete working example based on the example genome, reads and annotation that is part of the QuasR package:
library(QuasR)
# copy example data to temporary directory
td <- tempdir()
file.copy(system.file(package="QuasR", "extdata"), td, recursive=TRUE)
# get gene annotation (Ensembl identifiers)
gtffile <- file.path(td, "extdata", "hg19sub_annotation.gtf")
txdb <- makeTxDbFromGFF(gtffile, format = "gtf")
genes(txdb)
# align reads
proj <- qAlign(file.path(td, "extdata", "samples_rna_single.txt"),
file.path(td, "extdata", "hg19sub.fa"))
# count reads per gene
# ... using TxDb
cnt1 <- qCount(proj, txdb)
cnt1
# ... using exons GRanges extracted from TxDb
query <- unlist(exonsBy(txdb, by = "gene"))
cnt2 <- qCount(proj, query)
cnt2
# ... which both give you identical results (Ensemble identifiers)
identical(cnt1, cnt2)
I hope that clarifies it. The above example and much more is also available from the vignette and function documentations.
Michael