Question

Agilent biomaRt getBM: filter value unavailable

u.seemab • 0

Hi,

I am analysing two-colour Agilent microarray data and want to use SurePrint_G3_Human_GE_v3_8x60K_Microarray as the filter in getBM, but it is not in the list of available filters. How can I solve this?

I have the following options:

1- efg_agilent_wholegenome_4x44k_v1

2- efg_agilent_wholegenome_4x44k_v2

3- efg_agilent_sureprint_g3_ge_8x60k

4- efg_agilent_sureprint_g3_ge_8x60k_v2

But I want to use SurePrint_G3_Human_GE_v3_8x60K_Microarray.txt for probe mapping.

code:

library(biomaRt)
# Assumes `mart` (e.g. created with useMart()) and `probes` (a character
# vector of Agilent probe IDs) already exist in the workspace.
if (interactive()) {
  agilent.annot <- c("efg_agilent_wholegenome_4x44k_v1", "efg_agilent_wholegenome_4x44k_v2",
                     "efg_agilent_sureprint_g3_ge_8x60k", "efg_agilent_sureprint_g3_ge_8x60k_v2")
  print(as.data.frame(agilent.annot, stringsAsFactors = FALSE))
  cat("Please enter a number to indicate the filter to use as input to the query...\n")
  n1 <- scan(n = 1)
  attributes <- listAttributes(mart)[1:100, 1]
  print(as.data.frame(attributes, stringsAsFactors = FALSE))
  cat("Please enter a number to indicate the annotation to use...\n")
  n2 <- scan(n = 1)
  resq <- NULL
  if (is.numeric(n1) && is.numeric(n2)) {
    # querying BioMart
    print(c(agilent.annot[n1], attributes[n2]))
    print(agilent.annot[n1])
    print(head(probes))
    resq <- getBM(attributes = c(agilent.annot[n1], attributes[n2]),
                  filters = agilent.annot[n1],
                  values = probes,
                  mart = mart)
  }
}

biomaRT getBM new version • 1.2k views

ADD COMMENT • link updated 7.4 years ago by James W. MacDonald 65k • written 7.4 years ago by u.seemab • 0

score 1 · Answer 1 · 2016-12-12

There are probably three alternatives here. First, you could go to the Agilent website and try to figure out if there are any material differences between the v2 and v3 versions of this array, and if there are none, you could just use the v2 data on Ensembl to annotate your data.
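One rough way to check for differences, if you can get hold of the probe ID lists for both array versions, is to compare the two sets directly. The sketch below is only an illustration: `compare_probe_sets` is a hypothetical helper, not part of biomaRt, and it assumes you already have both ID lists as character vectors. A shared probe ID is a reasonable hint that the probe is unchanged, but it does not by itself prove the sequences are identical.

```r
# Hypothetical helper: compare v3 probe IDs against v2 probe IDs.
# Both arguments are plain character vectors supplied by the caller.
compare_probe_sets <- function(v3_ids, v2_ids) {
  v3_ids <- unique(as.character(v3_ids))
  v2_ids <- unique(as.character(v2_ids))
  shared <- intersect(v3_ids, v2_ids)
  list(
    n_v3        = length(v3_ids),                    # distinct v3 probes
    n_shared    = length(shared),                    # probes present on both arrays
    only_v3     = setdiff(v3_ids, v2_ids),           # probes new in v3
    frac_shared = length(shared) / length(v3_ids)    # share of v3 also on v2
  )
}
```

If `frac_shared` is close to 1, annotating with the v2 filter is probably a fair approximation; the IDs in `only_v3` are the ones that would go unannotated.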

Second, you could NOT go to the Agilent website (in my interactions with the Agilent website, this appears to be the more sane option, but YMMV), simply assume there aren't many differences of note, and just go ahead and use the v2 data on Ensembl. That appears to work to some extent:

> z <- read.table("039494_D_GeneList_20150612.txt", header = TRUE, sep = "\t", fill = TRUE)

## this is the Agilent file for the v3 array. Let's try to annotate the first 50 or so.

> getBM(c("efg_agilent_sureprint_g3_ge_8x60k_v2", "hgnc_symbol","entrezgene"), "efg_agilent_sureprint_g3_ge_8x60k_v2", as.character(z[1:50,1]), mart)
   efg_agilent_sureprint_g3_ge_8x60k_v2 hgnc_symbol entrezgene
1                         A_33_P3339253     LAGE3P1         NA
2                         A_33_P3393543                     NA
3                         A_33_P3495120     CCDC141     285025
4                         A_33_P3217307     ZBTB8OS     339487
5                          A_24_P453544       SURF1       6834
6                         A_33_P3261463   LINC00982     440556
7                         A_33_P3335915       SYNE1      23345
8                         A_33_P3339253        APTX      54840
9                          A_24_P931964                     NA
10                        A_33_P3679768       DPCR1     135656
11                        A_33_P3347301                     NA
12                        A_33_P3212415       ZNF44      51710
13                        A_33_P3295228      ZBTB7A      51341
14                        A_33_P3411980      ABI3BP      25890
15                        A_33_P3416797                 144203
16                        A_33_P3416797                 408186
17                        A_33_P3392740       ATOH8      84913

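So that seems to work OK, I guess, although we sure did seem to lose lots of probes: only 17 rows came back for 50 probe IDs, and some of those rows repeat a probe or have no symbol or Entrez ID. Adding in the ensembl_transcript_id (results not shown) bumped the number of annotated probes up to 25, but that still leaves 50% that aren't getting annotated. But whatever.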
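To put a number on how many probes are being lost, something like the following could be used. This is a minimal sketch: `annotation_coverage` is a made-up helper name, and it assumes the result is a data frame shaped like the getBM output above, where a missing annotation shows up as either NA or an empty string.

```r
# Hypothetical helper: fraction of queried probes with at least one
# non-missing value in any of the annotation columns.
annotation_coverage <- function(queried, result, id_col, annot_cols) {
  queried <- unique(as.character(queried))
  # TRUE for rows where at least one annotation column is non-NA and non-empty
  has_annot <- Reduce(`|`, lapply(annot_cols, function(col) {
    x <- result[[col]]
    !is.na(x) & nzchar(as.character(x))
  }))
  annotated <- unique(as.character(result[[id_col]][has_annot]))
  sum(queried %in% annotated) / length(queried)
}
```

For the query above, the call would look something like `annotation_coverage(z[1:50, 1], res, "efg_agilent_sureprint_g3_ge_8x60k_v2", c("hgnc_symbol", "entrezgene"))`, where `res` is whatever you assigned the getBM result to.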

The third alternative would be to use what Agilent gives us and just make our own ChipDb package for this array. Note that I went to Agilent's website and got the file I read in above. We first need to write out the first two columns of that file, which contain the probe ID followed by the GenBank ID.

> write.table(z[,1:2], "thefile.txt", col.names = FALSE, row.names = FALSE, quote = FALSE, sep = "\t")
> library(AnnotationForge)
> library("human.db0")

## now make the package
> makeDBPackage("HUMANCHIP_DB", affy = FALSE, prefix = "agilentwhatevs", fileName = "thefile.txt", baseMapType = "gbNRef", version = "0.0.1")
<snip>
Creating package in ./agilentwhatevs.db

We can now install and use this package. You might use a less snarky name for yours... Note that you have to set the repos to NULL so you don't try to get things from CRAN, and if you are on Windows or MacOS, you have to say it's a source package as well.

> install.packages("agilentwhatevs.db/", repos = NULL, type = "source")
<snip>
> library(agilentwhatevs.db)

## Now we can annotate the top 50 probes like above.
> rslts <- select(agilentwhatevs.db, as.character(z[1:50,1]), c("SYMBOL","ENTREZID"))
'select()' returned 1:1 mapping between keys and columns

> head(rslts)
        PROBEID    SYMBOL  ENTREZID
1 A_33_P3696965      <NA>      <NA>
2 A_21_P0014788      <NA>      <NA>
3 A_33_P3451458 LINC01617 101926947
4 A_33_P3210760      <NA>      <NA>
5 A_33_P3339253      <NA>      <NA>
6 A_21_P0014878      <NA>      <NA>
> apply(rslts[,2:3], 2, function(x) sum(!is.na(x)))
  SYMBOL ENTREZID
      26       26

So that's a bit better, but it appears there is a lot of speculative content on this array, so it is still pretty sparse. But anyway, there you go.