Question

Importing and Extracting Annotation

atiqahrahman • 0

Hi all!

I'm new to R and am currently working on data from ZebGene-1_0-st arrays. However, I am having problems with the annotation: first, there is no annotation package in Bioconductor, and second, the sample workflow that I found for this array does not pass the sanity check (identical() does not return TRUE). I realised that the workflow below does not extract and reorder the annotations to match my probes. Any advice on how to overcome this problem would help! Thank you in advance :)

Workflow:

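# Import the annotations ("---" marks missing values in the Affymetrix CSV)
dat <- read.csv(file.path(metaDir, "ZebGene-1_0-st-v1.na33.3.zv9.transcript.csv"), comment.char = "#", stringsAsFactors = FALSE, na.strings = "---")
# col2rownames() and rownames2col() are not base R; they come from the helper package used by the sample workflow
dat <- col2rownames(dat, "probeset_id")
# Extract and reorder to match the array features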
dat <- dat[row.names(fData(affyNorm.batch)),]
dat <- dat[,c("probeset_id", "seqname", "strand", "start", "stop", "gene_assignment", "mrna_assignment")]
dat <- as.matrix(dat)
# parse mrna_assignments
headercol <- "mrna_assignment"
mrnas <- t(sapply(strsplit(dat[, headercol], " /// "), function(x) {
  dat.probe.df <- do.call(rbind, strsplit(x, " // "))
  bestrna <- dat.probe.df[1,1]
  rnas <- paste(dat.probe.df[,1], collapse=",")
  c(bestrna, rnas)
  }))
mrnas <- as.data.frame(mrnas)
names(mrnas) <- c("best.mrna", "mrnas")
# parse gene assignments
headercol <- "gene_assignment"
genes <- t(sapply(strsplit(dat[, headercol], " /// "), function(x) {
  if(is.na(x[1])){
    out <- rep("NA", 6)
    } else {
      dat.probe.mat <- as.matrix(do.call(rbind, strsplit(x, " // ")))
      bestgene <- as.character(dat.probe.mat[1,1])
      dat.probe.vec <- apply(dat.probe.mat, 2, function(y) {
        paste(unique(y), collapse=",")
        })
      out <- as.character(c(bestgene,dat.probe.vec))
      }
  return(out)
  }))

genes <- as.data.frame(genes[,c(1,2,3,4,6)])
names(genes) <- c("bestgene", "accessions", "symbols", "descriptions", "entrezIDs")
genes <- rownames2col(genes, "probeids")
# Combine mRNA and gene assignments
gene.annots <- cbind(genes, mrnas)
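
For reference, the reorder-and-check step on its own might look like the minimal base R sketch below. It uses invented toy IDs rather than the real CSV, and the names annot, array_ids, idx and annot_ordered are made up for illustration; array_ids stands in for row.names(fData(affyNorm.batch)). The idea is to compare the IDs as character strings with match() and to stop early if any array feature has no annotation row.

```r
# Toy annotation table standing in for the Affymetrix transcript CSV.
# read.csv() would typically read such IDs as integers, so integers are used here.
annot <- data.frame(
  probeset_id = c(13000003L, 13000001L, 13000002L),
  seqname = c("chr3", "chr1", "chr2"),
  stringsAsFactors = FALSE
)

# Toy feature IDs standing in for row.names(fData(affyNorm.batch)).
array_ids <- c("13000001", "13000002", "13000003")

# Look up each array feature in the annotation table, comparing as character.
idx <- match(array_ids, as.character(annot$probeset_id))

# Fail loudly if any array feature is missing from the annotation.
stopifnot(!anyNA(idx))

# Reorder the annotation rows into array order and name them by feature ID.
annot_ordered <- annot[idx, , drop = FALSE]
rownames(annot_ordered) <- array_ids

# Sanity check: the annotation rows now follow the array order.
ok <- identical(as.character(annot_ordered$probeset_id), array_ids)
```

If the check fails on real data, comparing head(array_ids) with head(as.character(annot$probeset_id)) is usually a quick way to see whether the two sets of IDs are even of the same kind.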

annotation zebrafish probe • 1.3k views

updated 8.0 years ago by James W. MacDonald • written 8.0 years ago by atiqahrahman

score 0 · Answer 1 · 2016-04-11

This isn't really a good question for this site, as it is only tangentially related to Bioconductor packages, and has more to do with R coding and whatnot. And that sort of thing is IMO better learned by seeing how others have tackled similar problems and emulating what you think is reasonable.

So please note that I have very similar functionality in the devel version of affycoretools that you can see here (you want to look at .dataFromNetaffx). I would also point out a couple of things. First, the pdInfoPackage already comes with a parsed version of the annotation csv file that you can access using getNetAffx from the oligo package (which will already be loaded and available to you). Second, if you put the results into the featureData slot of your ExpressionSet, you can run validObject to make sure things line up correctly. That's a good validity check, plus the featureData slot will propagate through the limma package and end up in your topTable if you analyze your data using limma (which IMO you should).
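
As a rough illustration of the featureData and validObject suggestion, here is a minimal sketch that assumes Biobase is installed. The expression values, probeset names and gene symbols are invented toy data, and the object names (exprs_mat, eset, annot) are placeholders. With real data from oligo, the annotation would come from getNetAffx instead, as noted in the comment.

```r
library(Biobase)

# Toy expression matrix: 3 probesets x 2 samples (made-up values).
exprs_mat <- matrix(
  c(7.1, 6.8, 5.2, 5.0, 9.3, 9.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"), c("s1", "s2"))
)
eset <- ExpressionSet(assayData = exprs_mat)

# Toy annotation table, deliberately in a different row order.
annot <- data.frame(
  symbol = c("geneC", "geneA", "geneB"),
  row.names = c("p3", "p1", "p2"),
  stringsAsFactors = FALSE
)

# Reorder the annotation by feature name before attaching it.
annot <- annot[featureNames(eset), , drop = FALSE]
featureData(eset) <- AnnotatedDataFrame(annot)

# Errors if the feature names of the data and the annotation disagree.
validObject(eset)

# With an ExpressionSet produced by oligo::rma(), the equivalent step would be:
# featureData(eset) <- getNetAffx(eset, type = "transcript")
```

Once the annotation sits in featureData, fData(eset) returns it as a data.frame in the same row order as exprs(eset).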