Hi all, my apologies if this question is trivial or if I'm misunderstanding something fundamental to biomaRt. I'm having difficulty figuring out how to query only SNPs, and not indels, somatic variations, etc., for a specific gene/transcript in the biomaRt package. On the Ensembl website, the Variation Table allows you to filter by 'Class' and lets you select SNPs, but I've been unsuccessful in finding something similar in the 'filters' or 'attributes' in the biomaRt package.
For example, when I query the variations within the very short human gene RNY3 (transcript is 102 bp) with:
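library(biomaRt)

# Connect to the Ensembl variation mart and select the human short-variant dataset
ensembl_snp <- useMart("ENSEMBL_MART_SNP")
dataset <- useDataset("hsapiens_snp", mart = ensembl_snp)

# Retrieve all variants overlapping RNY3 (ENSG00000202354)
snps <- getBM(
  attributes = c("refsnp_id", "chrom_start", "chrom_end", "chr_name", "allele"),
  filters = "ensembl_gene",
  values = "ENSG00000202354",
  mart = dataset
)
print(snps)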
I receive 94 variations which I can tell are a mix of different 'Classes' due to their 'allele':
   refsnp_id    chrom_start chrom_end chr_name allele
1  rs146500940    148680859 148680859        7 T/C/G
2  rs188517771    148680903 148680903        7 T/C/G
...
74 rs1468891485   148680901 148680903        7 TTT/TTTT
...
81 rs1823259365   148680850 148680850        7 T/-
I'm also not sure about filtering 'alleles' that look like SNPs because I'm nervous that they are somatic SNVs.
If someone knows a good way to solve this I would be very grateful!
When you say 'Variation table', are you describing something you see in the Biomart interface, or an entirely different part of the website? The biomaRt package is simply an interface to the Biomart server, and it doesn't do generalized queries to Ensembl. If you can't filter using Biomart directly, I don't believe you can do so using biomaRt, and I don't see anything in the Biomart interface that filters by allele class. But maybe I am missing something obvious?

Thanks for the reply! When I said 'Variation Table' I meant the feature from the Ensembl website, but thanks to your explanation I understand now that a similar feature is probably not present in biomaRt. Do you have any recommendations on a different package that would return only SNPs for a given gene/transcript/genome range?
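
Short of a different package, one possible workaround is to post-filter the 'allele' column that getBM already returns. The following is only a minimal sketch: it assumes the allele strings use the slash-separated format shown in the output above, and is_snv_allele is a hypothetical helper name, not part of biomaRt. It classifies purely by the shape of the allele string, so it cannot tell germline SNPs from somatic SNVs.

```r
# Hypothetical helper (not a biomaRt function): TRUE when every
# slash-separated allele is exactly one base (A, C, G or T).
is_snv_allele <- function(allele) {
  grepl("^[ACGT](/[ACGT])+$", allele)
}

# Three example rows copied from the query output above.
snps <- data.frame(
  refsnp_id = c("rs146500940", "rs1468891485", "rs1823259365"),
  allele    = c("T/C/G", "TTT/TTTT", "T/-"),
  stringsAsFactors = FALSE
)

# Keep only rows whose alleles all look like single-nucleotide changes;
# the insertion (TTT/TTTT) and the deletion (T/-) are dropped.
snv_only <- snps[is_snv_allele(snps$allele), ]
print(snv_only)
```

With the real result, the same filter would be applied to the data frame returned by getBM instead of the hand-built example rows.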