When I use summarizeVariants with a GRangesList of genes, the genes are then reordered in the RangedSummarizedExperiment object and the assay(counts) values are incorrect. Is this a bug or can I control this reordering?
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
genes <- genes(txdb)
# get the gene symbol for corresponding ENTREZID
gene.map <- select(org.Hs.eg.db, keys=elementMetadata(genes)$gene_id, keytype="ENTREZID", column="SYMBOL")
genes.df <- merge(genes, gene.map, by.x="gene_id", by.y="ENTREZID")
elementMetadata(genes)$SYMBOL <- genes.df$SYMBOL
# get the gene symbols GRanges and push into a GRangesList
grl <- split(genes, elementMetadata(genes)$SYMBOL)
# subset by a few genes
grl <- grl[c('STK11','TP53','GCLM','SLC6A3')]
print(grl)
GRangesList object of length 4:
$STK11
GRanges object with 1 range and 2 metadata columns:
seqnames ranges strand | gene_id SYMBOL
<Rle> <IRanges> <Rle> | <character> <character>
6794 chr19 [1205798, 1228434] + | 6794 STK11
$TP53
GRanges object with 1 range and 2 metadata columns:
seqnames ranges strand | gene_id SYMBOL
7157 chr17 [7565097, 7590868] - | 7157 TP53
$GCLM
GRanges object with 1 range and 2 metadata columns:
seqnames ranges strand | gene_id SYMBOL
2730 chr1 [94352590, 94375012] - | 2730 GCLM
...
<1 more element>
-------
# read in the vcf file
print("Read filtered vcf...")
vcf <- readVcf(vcf.file, "hg19", row.names=TRUE)
# change sample headers
header(vcf)@samples <- sub("(.*)_S.*", "\\1", header(vcf)@samples)
rownames(colData(vcf)) <- sub("(.*)_S.*", "\\1", rownames(colData(vcf)))
# summarize variant counts for coding regions (this will take a long time)
print("Summarize variant counts...")
x <- summarizeVariants(grl, vcf, CodingVariants())
print(x)
seqinfo: 93 sequences (1 circular) from hg19 genome
[1] "Summarize variant counts..."
class: RangedSummarizedExperiment
dim: 4 118
metadata(1): header
assays(1): counts
rownames(4): GCLM TP53 STK11 SLC6A3
rowData names(0):
colnames(118): D02914TB04 D06699TB04_2nd ... D31681TB02 D33818TB02
colData names(1): Samples
print(assays(x)$counts)
