Still need help identifying a good workflow for finding DEG in hugene 2.0 st
1
0
Entering edit mode
@hakimelakhrass-9661
Last seen 5.5 years ago

Hi everyone,

I am new to microarray data and I am having a really hard time finding a good work flow so far. My experiment is with human transcriptome data. I have 4 time points and I am trying to find DEG between them. This is just a test run with two times points. I have no idea what to do and so far my script hasn't worked so well. This is what I have so far. Thanks in advance. My volcano plot also looks really weird.

source("http://bioconductor.org/biocLite.R")
biocLite("oligo")
biocLite("limma")
biocLite("pd.hugene.2.1.st")
library(genefilter)
library(limma)
library(oligo)
library(pd.hugene.2.0.st)

#directory for the data
mydir <- "C:\\Users\\hakim\\Desktop\\Bioinformatics_Thesis_Concordia_Microarry_Data\\Control"

#setting seed for reproducibility
set.seed(1)
#listing the files from directory using special CEL file read function
celList <- list.celfiles(mydir, full.names=TRUE)
#reading data from cellist and setting annotation package to approiate one for this microarray
rawData <- read.celfiles(celList, pkgname='pd.hugene.2.0.st')

#normalizing the data using RMA algorithm
normData <- rma(rawData)
#checking boxplot of raw data
par(mar=c(10,4.5,2,1))
boxplot(rawData,las=3)
#checking boxplot of normalized data
boxplot(normData,las=3)

#the annotation package
biocLite("hugene20sttranscriptcluster.db")

library(hugene20sttranscriptcluster.db)

#retreaving feature data

featureData(normData) <- getNetAffx(normData, "transcript")

design = cbind(control = 1, controlvstreatment = c((rep.int(0,13)),rep.int(1,13)))
fit<- lmFit(normData, design)
normfit <-eBayes(fit)

omg <- topTable(normfit, coef="controlvstreatment", adjust="BH")

write.table(omg,file="finally.txt",sep= "/" )

results <- decideTests(fit)
vennDiagram(results)

volcanoplot(normfit)
microarray hugene20 limma oligo • 1.3k views
ADD COMMENT
1
Entering edit mode

What do you mean by "your script hasn't worked so well"? Has it crashed, or thrown an error? If you don't detect any DE genes, that doesn't mean that the script has failed; it could just mean that you've got no DE. You need to give more information here, as well as something more useful regarding the "weirdness" of the volcano plot.

ADD REPLY
0
Entering edit mode

The script works, I should have worded that differently. I am the one who doesn't understand so well the output of my script.

My first question is if it is possible to use the limma package with dependent data (time series). If so then this script may be alright; I still do not understand my output so much.

This is the summary of my results.

control controlvstreatment
-1       0                474
0        0              52791
1    53617                352

This is my volcano plots and venn diagram. I am not really sure what the volcano plot is telling me and if this is normal.

http://imgur.com/a/CStnP

Thanks for the help!

ADD REPLY
1
Entering edit mode
Aaron Lun ★ 27k
@alun
Last seen 16 hours ago
The city by the bay

For starters, it makes no sense to do the control contrast. You're effectively testing for whether the intercept is equal to zero for each gene, i.e., whether gene expression is equal to unity. This is a nonsensical contrast as you wouldn't expect that they would have expression values equal to 1; as a result, every gene will probably have massively positive log-fold changes. This shows up in your comment as having all genes with positive log-fold changes in your summary and on your volcano plot. You should only interpret the controlvstreatment column of the decideTests output, and you should set coef=2 in your volcanoplot call.

ADD COMMENT
0
Entering edit mode

Okay, I think I understand. My volcano plot is now fixed. I think I am going in the right direction. Thank you!

ADD REPLY
0
Entering edit mode

but also as far as the control goes. that is what i thought the design should be following the limma documentation. I must have misunderstood.

ADD REPLY
0
Entering edit mode

but also as far as the control goes. that is what i thought the design should be following the limma documentation. I must have misunderstood. If you can point me in the right direction as a way to continue the DE analysis that would be greatly appreciated.

ADD REPLY
1
Entering edit mode

Well, the control column is the intercept. Your tests should only involve the controlvstreatment coefficient.

ADD REPLY
0
Entering edit mode

Unfortunately I am a total noob and don't even know what type of test I should be doing.

ADD REPLY
0
Entering edit mode

This is one problem with making sophisticated software available to those who don't have the background to understand what they are doing. This support site is primarily intended to help people with the required background with questions about how to get the software to work, rather than providing statistical support. The former is relatively straightforward to do, whereas the latter much less so.

In your case there are really just two recommendations we can make. The best (IMO) is to recommend that you find somebody local who can help you with the statistics. That's what I would recommend doing, unless you have an abiding interest in statistics. If you do have an interest in statistics then I would recommend either getting some instruction, or to start reading. The limma User's Guide would be the obvious place to start, but it's not a trivial read for somebody with no statistical background.

ADD REPLY
0
Entering edit mode
I apologize for the improper use of the forum and will take note for future posts. I actually have a pretty big background in statistics but was just having problems applying it to this type of data. In the future I will only post about technical issues.
ADD REPLY

Login before adding your answer.

Traffic: 494 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6