Question: Importing and Extracting Annotation
3.2 years ago by
atiqahrahman0 wrote:

Hi all!

I'm new to R and am currently working on data from ZebGene-1_0-st arrays. However, I am having problems with the annotation: first, there is no package for it in Bioconductor, and second, the sample workflow that I found for the array fails its sanity check (the identical() comparison does not come back TRUE). I realised that the workflow below does not extract and reorder the annotations to match my probes. Any advice on how to overcome this problem would help! Thank you in advance :)

Workflow:

# Import the annotations
# (col2rownames() is not a base R function; it must be defined or loaded separately)
dat <- col2rownames(dat, "probeset_id")
# Extract and reorder to match the array features
dat <- dat[row.names(fData(affyNorm.batch)), ]
dat <- dat[, c("probeset_id", "seqname", "strand", "start", "stop", "gene_assignment", "mrna_assignment")]
dat <- as.matrix(dat)
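# Parse mrna_assignments
# (headercol is not defined in this snippet; it must name the column to parse,
#  presumably "mrna_assignment" here)
mrnas <- t(sapply(strsplit(dat[, headercol], " /// "), function(x) {
  # one row per assignment, one column per " // "-separated field
  dat.probe.df <- do.call(rbind, strsplit(x, " // "))
  bestrna <- dat.probe.df[1, 1]
  rnas <- paste(dat.probe.df[, 1], collapse = ",")
  c(bestrna, rnas)
}))
mrnas <- as.data.frame(mrnas)
names(mrnas) <- c("best.mrna", "mrnas")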
# Parse gene assignments
# (headercol is not defined in this snippet; presumably "gene_assignment" here)
genes <- t(sapply(strsplit(dat[, headercol], " /// "), function(x) {
  if (is.na(x[1])) {
    out <- rep("NA", 6)
  } else {
    dat.probe.mat <- as.matrix(do.call(rbind, strsplit(x, " // ")))
    bestgene <- as.character(dat.probe.mat[1, 1])
    # collapse the unique values of each field across all assignments
    dat.probe.vec <- apply(dat.probe.mat, 2, function(y) {
      paste(unique(y), collapse = ",")
    })
    out <- as.character(c(bestgene, dat.probe.vec))
  }
  return(out)
}))

genes <- as.data.frame(genes[, c(1, 2, 3, 4, 6)])
names(genes) <- c("bestgene", "accessions", "symbols", "descriptions", "entrezIDs")
# (rownames2col() is likewise not a base R function)
genes <- rownames2col(genes, "probeids")
# Combine mRNA and gene assignments
gene.annots <- cbind(genes, mrnas)
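
For reference, the reorder-and-check step that the workflow is meant to perform can be sketched on toy data using base R only. The probe IDs, the annot table and the variable names below are made up for illustration; they are not taken from the real array or from the sample workflow.

```r
# Toy stand-ins (made up): feature names from the normalized data, and an
# annotation table whose rows are in a different order, with one extra probe.
feature_ids <- c("13000003", "13000001", "13000002")
annot <- data.frame(
  probeset_id = c("13000001", "13000002", "13000003", "13000004"),
  seqname     = c("chr1", "chr2", "chr3", "chr4"),
  stringsAsFactors = FALSE
)

# match() gives, for each feature, the row of annot that holds its annotation
# (NA if the probe is absent from the table).
idx <- match(feature_ids, annot$probeset_id)
annot_ordered <- annot[idx, ]
rownames(annot_ordered) <- feature_ids

# Sanity checks: no missing probes, and the IDs agree element by element.
stopifnot(!anyNA(idx))
same_order <- identical(annot_ordered$probeset_id, feature_ids)
```

Note that identical() also compares types, so probe IDs that were read in as numbers need as.character() before they can match character feature names.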
3.2 years ago by
United States
James W. MacDonald50k wrote:

This isn't really a good question for this site, as it is only tangentially related to Bioconductor packages, and has more to do with R coding and whatnot. And that sort of thing is IMO better learned by seeing how others have tackled similar problems and emulating what you think is reasonable.

So please note that I have very similar functionality in the devel version of affycoretools that you can see here (you want to look at .dataFromNetaffx). I would also point out a couple of things. First, the pdInfoPackage already comes with a parsed version of the annotation csv file that you can access using getNetAffx from the oligo package (which will already be loaded and available to you). Second, if you put the results into the featureData slot of your ExpressionSet, you can run validObject to make sure things line up correctly. That's a good validity check, plus the featureData slot will propagate through the limma package and end up in your topTable if you analyze your data using limma (which IMO you should).
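
A minimal sketch of the featureData and validObject suggestion, assuming the Biobase package is installed. The expression matrix, probe IDs and gene symbols are toy values made up for illustration; with real arrays the annotation table would come from getNetAffx rather than from a hand-built data frame.

```r
library(Biobase)  # ExpressionSet, AnnotatedDataFrame, featureData<-, fData

# Toy expression matrix standing in for normalized data (IDs are made up).
ids <- c("13000003", "13000001", "13000002")
expr <- matrix(seq_len(6) / 2, nrow = 3,
               dimnames = list(ids, c("sample1", "sample2")))
eset <- ExpressionSet(assayData = expr)

# Toy annotation table in a different row order than the expression matrix.
annot <- data.frame(
  symbol = c("geneA", "geneB", "geneC"),
  row.names = c("13000001", "13000002", "13000003"),
  stringsAsFactors = FALSE
)

# Reorder by feature name, then store the table in the featureData slot.
annot <- annot[featureNames(eset), , drop = FALSE]
featureData(eset) <- AnnotatedDataFrame(annot)

# Raises an error if the feature names of the two components disagree.
validObject(eset)

# With real arrays processed by oligo, the table would come from the
# pdInfoPackage instead, e.g. (type depends on the summarization level):
# featureData(eset) <- getNetAffx(eset, type = "transcript")
```

Because the annotation then travels with the ExpressionSet, there is no separate table to keep in sync with the expression values by hand.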