I have a general question and what seems to be a bug or something weird.
My general goal is:
given a set of genomic coordinates, get the genomic DNA sequence, regardless of whether the element is an exon, intron, coding region, enhancer, or anything else.
I don't need flanking regions upstream or downstream.
I'd like to specify the genome assembly (e.g. hg38, hg19, mm10, or mm9).
I fear that biomaRt actually cannot do that, but it'd be awesome if I were proven wrong.
For example, let's get 20bp of a genomic region that could be an exon or an intron. The coordinates are: chrX:100636100-100636120
biomaRt::getSequence(chromosome = "x", start = 100636100, end = 100636120, seqType = 'cdna', type = 'ensembl_gene_id', mart = ensembl, verbose = F) -> tmp
For seqType I specified cdna, as I read in the manual that this would return a nucleotide sequence. For the type argument I selected ensembl_gene_id just because I was forced to pick one ID. I would expect such a query to return a data frame with only one row containing the 20bp nucleotide sequence.
dim(tmp)
5 2
nchar(tmp$cdna)
3768 3796 1025 820 900
Meaning that I get 5 rows, each with a DNA sequence of a different length.
Now is there a way to get only the correct genomic DNA sequence?
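One possible route outside biomaRt, sketched here under assumptions rather than as a confirmed answer: the BSgenome data packages on Bioconductor hold whole assemblies, and Biostrings::getSeq() can extract an arbitrary coordinate range from them. The helper name get_region_seq and the lookup vector assembly_pkgs are hypothetical, and the sketch assumes the relevant BSgenome data package is already installed and that chromosome names are in UCSC style ("chrX", not "x").

```r
# Sketch only: get_region_seq() and assembly_pkgs are hypothetical names.
# Assumes the BSgenome package and the matching BSgenome data package
# (e.g. BSgenome.Hsapiens.UCSC.hg38) are installed from Bioconductor.

# Map a short assembly name to its BSgenome data package.
assembly_pkgs <- c(
  hg38 = "BSgenome.Hsapiens.UCSC.hg38",
  hg19 = "BSgenome.Hsapiens.UCSC.hg19",
  mm10 = "BSgenome.Mmusculus.UCSC.mm10",
  mm9  = "BSgenome.Mmusculus.UCSC.mm9"
)

get_region_seq <- function(chrom, start, end, assembly = "hg38") {
  pkg <- assembly_pkgs[[assembly]]
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Please install the Bioconductor package ", pkg)
  }
  # Load the genome object from the installed data package.
  genome <- BSgenome::getBSgenome(pkg)
  # Coordinates are 1-based and inclusive; chrom uses UCSC naming ("chrX").
  region <- Biostrings::getSeq(genome, names = chrom, start = start, end = end)
  as.character(region)
}

# Usage (not run here, since it needs the data package installed):
# get_region_seq("chrX", 100636100, 100636120, assembly = "hg38")
```

Because the coordinates are inclusive, the range 100636100-100636120 spans 21 bases rather than 20, so the returned string should have 21 characters. Which feature (exon, intron, enhancer) lies there does not matter, since the lookup is done purely by position.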
I hoped that the seqType argument would make this possible, so I played around with it. When using
biomaRt::getSequence(chromosome = 'x', start = 100636100, end = 100636120, seqType = 'coding_gene_flank', type = 'ensembl_gene_id', mart = ensembl, verbose = F)
I came across this error message:
Error in .processResults(postRes, mart = mart, sep = sep, fullXmlQuery = fullXmlQuery, : The query to the BioMart webservice returned an invalid result: the number of columns in the result table does not equal the number of attributes in the query. Please report this on the support site at http://support.bioconductor.org
So, as requested here I am.
biomaRt v 2.42.0.