DE analysis pipeline with Hugene 2.0 ST arrays
0
0
Entering edit mode
@hakimelakhrass-9661
Last seen 4.7 years ago

Hello

So I finally fixed some problems I was having with R in intial processing and now have questions of the workflow/pipeline for DE analysis for semitime series dataset in limma. My experiment is trying to find the DE genes between 4 different time points. The reason I say semi time series is because there are months in between instead of hours like bacterial time series so I am not exactly sure it qualifies but they are definitely linked because it is the same samples. I have 8 biological control replicates and 9 replicates for the other time points. I am basically completely lost. I know for a fact there are DE genes but my script so far shows none. Can any one help me or point me to good tutorials in this. Find my script below. Thank you so much in advanced. Note the code below is just me testing out the workflow with two time points.

source("http://bioconductor.org/biocLite.R")
biocLite("oligo")
biocLite("limma")
biocLite("pd.hugene.2.1.st")
library(genefilter)
library(limma)
library(oligo)
library(pd.hugene.2.0.st)

#directory for the data
mydir <- "C:\\Users\\hakim\\Desktop\\Bioinformatics_Thesis_Concordia_Microarry_Data\\Control"

#setting seed for reproducibility
set.seed(1)
#listing the files from directory using special CEL file read function
celList <- list.celfiles(mydir, full.names=TRUE)
#reading data from cellist and setting annotation package to approiate one for this microarray
rawData <- read.celfiles(celList, pkgname='pd.hugene.2.0.st')
#normalizing the data using RMA algorithm
normData <- rma(rawData)
#checking boxplot of raw data
par(mar=c(10,4.5,2,1))
boxplot(rawData,las=3)
#checking boxplot of normalized data
boxplot(normData,las=3)

#the annotation package
biocLite("hugene20sttranscriptcluster.db")
library(hugene20sttranscriptcluster.db)

# Strategy is to create data frame objects and merge them together - put expression info into a data frame
my_frame <- data.frame(exprs(normData))

# Put annotation information in a data frame.  To get specific fields, use packageNameSYMBOL, where the caps part names the type of data you're after
hugene20sttranscriptcluster()
Annot <- data.frame(ACCNUM=sapply(contents(hugene20sttranscriptclusterACCNUM), paste, collapse=", "),
SYMBOL=sapply(contents(hugene20sttranscriptclusterSYMBOL), paste, collapse=", "),
DESC=sapply(contents(hugene20sttranscriptclusterGENENAME), paste, collapse=", "))
#retreaving feature data

featureData(normData) <- getNetAffx(normData, "transcript")

phn = normData@phenoData

phn@data[1:14,2]= "1"

phn@data[15:27,2]="0"

colnames(phn@data)[2]="source"

phn@data

# Merge data frames together (like a database table join) also removing NAs
all <- merge(Annot, my_frame, by.x=0, by.y=0, all=T)

merge <- subset(all, SYMBOL!="NA")

# Write out to a file:
write.table(merge,file="thesis.ann.txt",sep="\t")

#plm

Pset = fitProbeLevelModel(rawData)

op = par(mfrow = c(1,1))
for (i in 1:27){image(Pset,which=i,main=ph@data\$sample[i])}

design = cbind(control = 1, controlvstreatment = c((rep.int(0,13)),rep.int(1,13)))
fit<- lmFit(normData, design)
fit <-eBayes(fit)

results <- decideTests(fit)
vennDiagram(results)