Dear all,
I've been going round in circles with a microarray analysis, as this is the first time I analyse microarray data!
I had initially used oligo for my analysis, and managed to get results of differential expression both at the transcript level and at the probeset level.
The reason why I thought oligo wasn't appropriate was that it can't do transcript-level presence/absence calls, it can only do it at the probeset level.
The samples I'm analysing are not brown Norway rat, but SHR strain rats, so there will probably be some probes from the Rat 2.1 Gene ST array that don't bind to this strain.
Hence I thought it was important to do the presence / absence calls at the transcript level. I believe a transcript-level analysis is best suited for gene ST arrays.
I then tried to use xps but documentation seemed limited and I struggled with it.
What would you recommend?
Are the presence/absence calls necessary in my case? Or should my results from oligo at the transcript level using only rma normalisation be reliable?
Or perhaps there is a way to do the presence / absence calls at the probeset level and then use this to do the transcript level summarisation?
I'm at a loss!
I copy my code for oligo below, including my presence / absence calls at the probeset level.
Thanks a lot for your help,
Best wishes,
Sophie.
# Read the raw data (CEL files)
library(oligo)
(celFiles <- list.celfiles(baseDir, full.names=TRUE))
rawData <- read.celfiles(celFiles) # Gives an GeneFeatureSet, ExpressionFeatureSet
# Add the phenotype data to geneCore
tab <- exprs(geneCore)
phenoData<-read.table("phenotypeData.txt",h=T)
row.names(phenoData) <- phenoData$id
phenoData <- phenoData[colnames(tab),]
pData(geneCore)<-phenoData
# Normalise rawData using rma, target = “core”
geneCore <- rma(rawData, target = "core") featureData(geneCore) <- getNetAffx(geneCore, "transcript")
# Transcript level differential expression (2x2 factorial design)
library(limma) TS <- paste(phenoData$Strain, phenoData$Tissue, sep=".") TS <- factor(TS,) design <- model.matrix(~0+TS) colnames(design) <- levels(TS) fitGeneCore <- lmFit(geneCore, design) cont.matrix <- makeContrasts(SvsC2inLV=S.LV-C2.LV,SvsC2inEF=S.EF-C2.EF,Diff=(S.EF-C2.EF)-(S.LV-C2.LV),levels=design) fitGeneCore <- contrasts.fit(fitGeneCore, cont.matrix) fitGeneCore <- eBayes(fitGeneCore) fit2<-fitGeneCore geneNames<-unlist(lapply(fit2$genes$geneassignment, function(x) if(!is.na(x)) {return(strsplit(strsplit(x, " // ")[[1]][2]," /// ")[[1]][1])}else{return("NA")})) entrezNames<-as.character(sapply(sapply(sapply(lapply(fit2$genes$geneassignment, function(x) if(!is.na(x)) {strsplit(x, " // ")}else{return(list(list("NA","NA","NA","NA","NA")))}),"[[",1),"[[",5), function(x) gsub(" /// .+","",x))) topTable(fit2,coef=1,adjust.method="BH",number=Inf,genelist=geneNames)
# Presence / absence calls at the probeset level (remove probesets for which more than 50% of samples do not have a p-value < 0.05)
eset <- rma(rawData,target="probeset") featureData(eset) <- getNetAffx(eset, "probeset") dabgPS<-paCalls(rawData,"PSDABG") ind <- apply(dabgPS, 1, function(x) sum(x<0.05) > 4) sorted.ind<-ind[match(featureNames(eset),names(ind))] # sort ind according to probe order in eset not_ctrls <- featureNames(eset) %in% names(sorted.ind[sorted.ind=="TRUE"]) eset.filtered <- eset[not_ctrls,]