Annotating DESeq2 Differential Expression Results
adeler001 • 0

Hello, I am trying to annotate a DESeq2 results table of significantly differentially expressed genes with the gene start (bp), gene end (bp), and SNP ID, using the biomaRt package in RStudio. When I list the data sets available in biomaRt with the commands below, I only see the human GRCh38 data set. How can I get the GRCh37 data set from biomaRt?


# look at top 10 databases      
head(biomaRt::listMarts(host = ""), 10)      

###marts providing annotation for specific classes of organisms###
head(biomaRt::listDatasets(biomaRt::useMart("ENSEMBL_MART_ENSEMBL", host = "")), 100)     

You need to use an archive. See section 2.4 of the vignette.
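
As a rough sketch of what that can look like: recent biomaRt versions give useEnsembl() a GRCh argument, and GRCh = 37 points it at the GRCh37 archive. The wrapper name connect_grch37 is made up for this example, and the call needs network access and an installed biomaRt, so nothing is contacted until you actually call the function.

```r
# Hypothetical helper (not part of biomaRt); defining it does not touch the network.
connect_grch37 <- function() {
  # GRCh = 37 asks useEnsembl() for the GRCh37 archive rather than the current release
  biomaRt::useEnsembl(biomart = "ensembl",
                      dataset = "hsapiens_gene_ensembl",
                      GRCh = 37)
}

# Usage, when online:
# mart <- connect_grch37()
# biomaRt::listDatasets(mart)   # the description column should mention GRCh37
```

If your biomaRt is too old to know the GRCh argument, updating R and Bioconductor is probably the first thing to try.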


Hello @james-w-macdonald-5106, I followed the exact commands shown in section 2.4. See my commands below:

listEnsembl(version = 95)
ensembl95 <- useEnsembl(biomart = 'genes', 
                        dataset = 'hsapiens_gene_ensembl',
                        version = 95) 

But I get this error message: Incorrect BioMart name, use the listMarts function to see which BioMart databases are available


You must have an old version of R/Bioc. Setting aside the fact that Ensembl 95 is not GRCh37, I get

> mart <- useEnsembl("ensembl","hsapiens_gene_ensembl", version = 95)
Warning message:
In listEnsemblArchives(https = FALSE) :
  Ensembl will soon enforce the use of https.
As such the 'https' argument will be deprecated in the next release.
> mart
Object of class 'Mart':
  Using the ENSEMBL_MART_ENSEMBL BioMart database
  Using the hsapiens_gene_ensembl dataset

## and using the right host

> mart <- useEnsembl("ensembl","hsapiens_gene_ensembl", host="")
> mart
Object of class 'Mart':
  Using the ENSEMBL_MART_ENSEMBL BioMart database
  Using the hsapiens_gene_ensembl dataset

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)

Matrix products: default

[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] biomaRt_2.50.0

Assuming you are using old versions of R and Bioconductor, you should first update both.


Hello, you're right, my mistake: I should not have selected Ensembl 95 if I want to use GRCh37. I used the right host as you suggested:

 mart <- useEnsembl("ensembl","hsapiens_gene_ensembl", host="")

Then I used this command to specify GRCh37:

mart <- useEnsembl("ensembl","hsapiens_gene_ensembl", version = GRCh37)

but I get a new error: Error in useEnsembl("ensembl", "hsapiens_gene_ensembl", version = GRCh37) : object 'GRCh37' not found


So the first call got you a Mart object that points to the archive site for GRCh37. You can now use that to do things, and you didn't need to do anything more.

What the second call does is figure out which archive you want, based on the version argument. You could use that, but it's not necessary because the first one worked. But for pedantic reasons, please note that 95 is always something in R (because it's a number), but GRCh37 isn't, because unless you put that in quotes, R thinks you want an object in the global workspace called GRCh37 that doesn't exist - it's not a thing.

> 95
[1] 95
> GRCh37
Error: object 'GRCh37' not found

So if you say 'version = 95', that will work because 95 is a number and by definition exists, and R will happily use it. Basically, under the hood what happens is listEnsemblArchives is called, and then whatever you used for the version argument is matched to the 'version' column of the data.frame that is output by listEnsemblArchives. This works because 95 exists, and will be coerced to character when used for matching. But since GRCh37 isn't an existing object, you get the error you see. In other words:

> d.f <- data.frame(A = letters, B = c("GRCh37", 1:25))
## this works because 19 will be coerced to "19" and matched
> d.f[match(19, d.f$B),]
   A  B
20 t 19
#note that everything in the second column is a character
> d.f$B
 [1] "GRCh37" "1"      "2"      "3"      "4"      "5"      "6"      "7"     
 [9] "8"      "9"      "10"     "11"     "12"     "13"     "14"     "15"    
[17] "16"     "17"     "18"     "19"     "20"     "21"     "22"     "23"    
[25] "24"     "25"

## here's the error you get    
> d.f[match(GRCh37, d.f$B),]
Error in match(GRCh37, d.f$B) : object 'GRCh37' not found
## and this is how you would make it work, by using version = "GRCh37"
> d.f[match("GRCh37", d.f$B),]
  A      B
1 a GRCh37

Make sense?


Thank you for clarifying. You said I only need this first line to access GRCh37 biomart:

 mart <- useEnsembl("ensembl","hsapiens_gene_ensembl", host="")

But when I list the marts, it shows me items from Ensembl version 104 and not from GRCh37. For some reason it redirects me to Ensembl version 104, despite my using the host server link for GRCh37.


These are the items it shows:

               biomart                version
1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 104
2   ENSEMBL_MART_MOUSE      Mouse strains 104
3     ENSEMBL_MART_SNP  Ensembl Variation 104
4 ENSEMBL_MART_FUNCGEN Ensembl Regulation 104

I have to say this is getting a bit frustrating for me. I feel like I give you the answer, and then you check my work (incorrectly) and then tell me it's not working, rather than just accepting that I might actually know a bit about this subject and taking my advice at face value.

So anyway, if you use listMarts without an argument, you are asking for what marts are available if you use the default arguments for that function. Because, to reiterate, if you don't provide any arguments to a function it uses the default arguments! It doesn't know anything about the existing Mart object in your workspace, and you shouldn't expect that it would (what if you have two Mart objects?). Anyway, the help page should clear that up for you.
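
The same point in miniature, using a made-up stand-in function (list_marts_demo and the host strings are purely illustrative, this is not how biomaRt is written):

```r
# Hypothetical stand-in for a function that has a default host argument.
list_marts_demo <- function(host = "default-host") {
  paste("listing marts at", host)
}

# An object in the workspace that happens to hold a different host
my_mart <- list(host = "archive-host")

list_marts_demo()                     # ignores my_mart and falls back to the default
list_marts_demo(host = my_mart$host)  # the archive host is used only when passed in
```

The function never looks at my_mart unless you hand it over, which is the same reason a bare listMarts() call reports the current release.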

And in fact you are meant to be able to do something like listMarts(mart), but so far as I can tell it doesn't work as expected unless you are pointing to the default, most current version.

> ughbro <- useEnsembl("ensembl","hsapiens_gene_ensembl")
> listMarts(ughbro)
               biomart                version
1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 104
2   ENSEMBL_MART_MOUSE      Mouse strains 104
3     ENSEMBL_MART_SNP  Ensembl Variation 104
4 ENSEMBL_MART_FUNCGEN Ensembl Regulation 104

> mart <- useEnsembl("ensembl","hsapiens_gene_ensembl",host = "")
> listMarts(mart)
Error: Unexpected format to the list of available marts.
Please check the following URL manually, and try ?listMarts for advice.

## but like ^^^^^^^^^^^^^^^^^^^^^^^ what does that say there?


> listDatasets(mart)
                dataset              description    version
1 hsapiens_gene_ensembl Human genes (GRCh37.p13) GRCh37.p13
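
For the original goal of adding gene start and end to a DESeq2 table, something along these lines might do, as a sketch only. It assumes your results have been converted to a data.frame with an ensembl_gene_id column (an assumption about your data), and the helper names fetch_gene_coords and annotate_results are made up here. The getBM() call needs network access and a Mart object like the GRCh37 mart in the session above, so the toy data at the end just stands in for a real query. SNP IDs sit in the separate variation mart (ENSEMBL_MART_SNP) and are not covered.

```r
# Hypothetical helper: ask biomaRt for coordinates of the given Ensembl gene IDs.
# Not run here, since it needs network access and a valid Mart object.
fetch_gene_coords <- function(gene_ids, mart) {
  biomaRt::getBM(
    attributes = c("ensembl_gene_id", "chromosome_name",
                   "start_position", "end_position"),
    filters = "ensembl_gene_id",
    values = gene_ids,
    mart = mart
  )
}

# Hypothetical helper: join coordinates onto the results table.
# all.x = TRUE keeps genes for which no coordinates came back (they get NA).
annotate_results <- function(res_df, coords) {
  merge(res_df, coords, by = "ensembl_gene_id", all.x = TRUE)
}

# Toy data standing in for DESeq2 output and a getBM() result (all values made up)
res_df <- data.frame(ensembl_gene_id = c("ENSG00000000001", "ENSG00000000002"),
                     log2FoldChange = c(1.5, -2.0))
coords <- data.frame(ensembl_gene_id = "ENSG00000000001",
                     chromosome_name = "1",
                     start_position = 100L,
                     end_position = 200L)

annotated <- annotate_results(res_df, coords)
```

With a real mart you would replace the toy coords with fetch_gene_coords(res_df$ensembl_gene_id, mart).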

Thank you for further clarifying

