Question: Gene set Testing with Limma for Agilent IDs probes?
0
gravatar for amalbouzid.ing
2.4 years ago by
amalbouzid.ing0 wrote:

Hi all,

I'm studying Agilent Human Gene Expression one color microarray using Limma. It results significant probes with the topTable function but then I didn't succed to do the conversion of my identifiers with "ids2indices" function? nor later the gene set testing?

Please, find below my code, I'm new to Limma what am I doing wrong??
Thanks,

Best,

Amal

 

##

>require("TxDb.Hsapiens.UCSC.hg19.knownGene")

>humangenes<-genes(TxDb.Hsapiens.UCSC.hg19.knownGene) 

or

>download.file("http://bioinf.wehi.edu.au/software/MSigDB/human_c2_v5p2.rdata","human_c2_v5p2.rdata")
>load("human_c2_v5p2.rdata")
>summary(Hs.c2)

>SystematicName.allgenes <- myresults$genes$SystematicName

#Agilent SystematicName look like:

> SystematicName.allgenes[(1:10)]
 [1] "NM_004333"       "NM_015987"       "NM_024604"       "NM_178829"      
 [5] "ENST00000429990" "NM_001040196"    "NM_030961"       "NM_033081"      
 [9] "NM_020188"       "NM_012318" 

>indices.genes <- ids2indices(humangenes$gene_id, SystematicName.allgenes,remove.empty=TRUE)

>indices.genes  #named list()

>length(indices.genes) # it results: [1] 0

or

>genes.indices.Hs <- ids2indices(Hs.c2,SystematicName.allgenes)

>genes.indices.Hs  #named list()

>length(genes.indices.Hs) # it results: [1] 0

## The same if I use the probe names as identifiers:

probenames.allgenes  <- myresults$genes$ProbeName

#Agilent ProbeName look like:

> probenames.allgenes[(1:10)]
 [1] "A_23_P42935"  "A_23_P117082" "A_23_P2683"   "A_23_P157316" "A_32_P14850" 
 [6] "A_23_P158596" "A_23_P350107" "A_23_P388190" "A_23_P106544" "A_23_P94998" 

 

 

 

 

 

 

ADD COMMENTlink modified 2.4 years ago • written 2.4 years ago by amalbouzid.ing0
Answer: Gene set Testing with Limma for Agilent IDs probes?
2
gravatar for James W. MacDonald
2.4 years ago by
United States
James W. MacDonald50k wrote:

Running summary on a list is probably not that great an idea, as you doubtless found out. Instead, it's preferable to first find the class of an object and then query from there.

> class(Hs.c2)
[1] "list"
> length(Hs.c2)
[1] 4729
> class(Hs.c2[[1]])
[1] "character"
> head(Hs.c2[[1]])
[1] "55902" "2645"  "5232"  "5230"  "5162"  "5160"

So Hs.c2 is a list of numbers in character format. That's not super useful in and of itself, but you could go to the site where you downloaded these data and see that these are character vectors of Entrez Gene IDs.

The help page says the second argument should be a 'character vector of gene identifiers' which you should interpret as 'a character vector of gene identifiers of the same type as in the gene set list you downloaded'. So you need the Entrez Gene IDs for each of your Agilent probes, in character format. This is where things get tricky, because you just have this stupid SystematicName column in the genes data.frame that is pretty useless. The array you are using is the Agilent-026652, if I am not mistaken, so you need the HsAgilentDesign026652.db package (which you can infer by the fact that, um, 026652 is the same for both the array and the annotation package, so you have to do a bit of Sherlock Holmes for that step).

You can now annotate your data, assuming that your normalized data are stored in an EList called 'myresults'

library(BiocInstaller)
biocLite("Hs.AgilentDesign026652.db")
library(Hs.AgilentDesign026652.db)
## add Entrez Gene IDs to your EList
myresults$genes$ENTREZID <- mapIds(Hs.AgilentDesign026652.db, myresults$genes$ProbeName, "ENTREZID","PROBENAME")

## fit the model
fit <- lmFit(myresults, design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit2)
tt <- topTable(fit2, 1)

Now the topTable 'tt' has a column of Entrez Gene IDs.

ind <- ids2indices(Hs.c2, as.character(tt$ENTREZID))

 

 

ADD COMMENTlink written 2.4 years ago by James W. MacDonald50k
Answer: Gene set Testing with Limma for Agilent IDs probes?
0
gravatar for amalbouzid.ing
2.4 years ago by
amalbouzid.ing0 wrote:

Dear James,

Many thanks for your comments and explanation! It really helped me.

Absolutely, I'm using the Agilent-026652 arrays. I write to you a bit of my Sherlock Holmes for that step!

I added to your code:

Considering that  "my probes" is a data farme of my resulted significant probes:

>require(HsAgilentDesign026652.db)
>mykeys = myprobes$ProbeName
>mykeys = as.vector(mykeys)
>myprobes$ENTREZID <- mapIds(HsAgilentDesign026652.db,  keys= mykeys , column = "ENTREZID", keytype = "PROBEID")
>probes_ZID <- ids2indices(Hs.c2, as.character(myprobes$ENTREZID)) # results #named list()
##as an alternative
>myENTREZID= myprobes[,12] 
>myind <- ids2indices(Hs.c2, myENTREZID)

##

But after that about 23% of my probes don't have an ENTREZID! It isn't a loss of information?

 

 

 

ADD COMMENTlink written 2.4 years ago by amalbouzid.ing0
1

If you want to make a comment on my post, please use the ADD COMMENT button, rather than the 'Add your answer' box (which is intended for answers only).

Your question is whether or not this is a loss of information, and it isn't clear if it is. Not every sequence has an Entrez Gene ID, and there can be any number of reasons why not. There are lots of cDNA clones, hypothetical transcripts, etc that were put on the array as speculative content, and those things won't have an Entrez Gene ID until there is more evidence that they are real.

So one could argue that restricting to probes with linked Entrez Gene IDs increases the information by getting rid of noise probes that don't measure anything.

ADD REPLYlink written 2.4 years ago by James W. MacDonald50k

Hello James,

Thanks for your quick reply.

So afterwards, it will be interesting to focus only on probes mapped to Enter Gene IDs without seeking explanations           of non-mapping for the others!? Thanks,

Best,

 

ADD REPLYlink written 2.4 years ago by amalbouzid.ing0
1

Oh, I don't know. It depends on how much work you want to put in. You could register for Agilent's eArray site, or more easily, just go to GEO and download the annotation table, which has the probe sequences. You could then get all the probe sequences that they don't have annotations for, and then blast against something (just human, human + mouse, nt?) to get the most likely alignment for that probe, and then add that back into your annotations.

Or you could take the sequences, convert to FASTA, and then use an aligner like subread, or bowtie, or whatever you prefer to align against GRCh38, then read the bam file into R, and use summarizeOverlaps to get the genes that each probe aligns to.

You could probably do other things as well. It just depends on how interested you are in seeking explanations for the non-mapping. In the end I imagine you won't gain many extra annotations by doing that, but you may gain experience in how one would do such a thing, which may be enough of a gain for you to do it anyway.

ADD REPLYlink written 2.4 years ago by James W. MacDonald50k

Great! Thanks James. Certainly that always there will be an interest!

ADD REPLYlink written 2.4 years ago by amalbouzid.ing0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 281 users visited in the last hour