Question

Still need help identifying a good workflow for finding DEG in hugene 2.0 st

0

Entering edit mode

hakimelakhrass • 0

@hakimelakhrass-9661

Last seen 7.3 years ago

Hi everyone,

I am new to microarray data and I am having a really hard time finding a good work flow so far. My experiment is with human transcriptome data. I have 4 time points and I am trying to find DEG between them. This is just a test run with two times points. I have no idea what to do and so far my script hasn't worked so well. This is what I have so far. Thanks in advance. My volcano plot also looks really weird.

source("http://bioconductor.org/biocLite.R")
biocLite("oligo")
biocLite("limma")
biocLite("pd.hugene.2.1.st")
library(genefilter)
library(limma)
library(oligo)
library(pd.hugene.2.0.st)

#directory for the data
mydir <- "C:\\Users\\hakim\\Desktop\\Bioinformatics_Thesis_Concordia_Microarry_Data\\Control"

#setting seed for reproducibility
set.seed(1)
#listing the files from directory using special CEL file read function
celList <- list.celfiles(mydir, full.names=TRUE)
#reading data from cellist and setting annotation package to approiate one for this microarray
rawData <- read.celfiles(celList, pkgname='pd.hugene.2.0.st')

#normalizing the data using RMA algorithm
normData <- rma(rawData)
#checking boxplot of raw data
par(mar=c(10,4.5,2,1))
boxplot(rawData,las=3)
#checking boxplot of normalized data
boxplot(normData,las=3)

#the annotation package
biocLite("hugene20sttranscriptcluster.db")

library(hugene20sttranscriptcluster.db)

#retreaving feature data

featureData(normData) <- getNetAffx(normData, "transcript")

design = cbind(control = 1, controlvstreatment = c((rep.int(0,13)),rep.int(1,13)))
fit<- lmFit(normData, design)
normfit <-eBayes(fit)

omg <- topTable(normfit, coef="controlvstreatment", adjust="BH")

write.table(omg,file="finally.txt",sep= "/" )

results <- decideTests(fit)
vennDiagram(results)

volcanoplot(normfit)

microarray hugene20 limma oligo • 2.1k views

ADD COMMENT • link updated 8.5 years ago by Aaron Lun ★ 28k • written 8.5 years ago by hakimelakhrass • 0

1

Entering edit mode

What do you mean by "your script hasn't worked so well"? Has it crashed, or thrown an error? If you don't detect any DE genes, that doesn't mean that the script has failed; it could just mean that you've got no DE. You need to give more information here, as well as something more useful regarding the "weirdness" of the volcano plot.

ADD REPLY • link 8.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

The script works, I should have worded that differently. I am the one who doesn't understand so well the output of my script.

My first question is if it is possible to use the limma package with dependent data (time series). If so then this script may be alright; I still do not understand my output so much.

This is the summary of my results.

control controlvstreatment
-1 0 474
0 0 52791
1 53617 352

This is my volcano plots and venn diagram. I am not really sure what the volcano plot is telling me and if this is normal.

http://imgur.com/a/CStnP

Thanks for the help!

ADD REPLY • link 8.5 years ago hakimelakhrass • 0

score 1 · Answer 1 · 2016-02-08

1

Entering edit mode

Aaron Lun ★ 28k

@alun

Last seen 9 hours ago

The city by the bay

For starters, it makes no sense to do the control contrast. You're effectively testing for whether the intercept is equal to zero for each gene, i.e., whether gene expression is equal to unity. This is a nonsensical contrast as you wouldn't expect that they would have expression values equal to 1; as a result, every gene will probably have massively positive log-fold changes. This shows up in your comment as having all genes with positive log-fold changes in your summary and on your volcano plot. You should only interpret the controlvstreatment column of the decideTests output, and you should set coef=2 in your volcanoplot call.

ADD COMMENT • link 8.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

Okay, I think I understand. My volcano plot is now fixed. I think I am going in the right direction. Thank you!

ADD REPLY • link 8.5 years ago hakimelakhrass • 0

0

Entering edit mode

but also as far as the control goes. that is what i thought the design should be following the limma documentation. I must have misunderstood.

ADD REPLY • link 8.5 years ago hakimelakhrass • 0

0

Entering edit mode

but also as far as the control goes. that is what i thought the design should be following the limma documentation. I must have misunderstood. If you can point me in the right direction as a way to continue the DE analysis that would be greatly appreciated.

ADD REPLY • link 8.5 years ago hakimelakhrass • 0

1

Entering edit mode

Well, the control column is the intercept. Your tests should only involve the controlvstreatment coefficient.

ADD REPLY • link 8.5 years ago Aaron Lun ★ 28k

0

Entering edit mode

Unfortunately I am a total noob and don't even know what type of test I should be doing.

ADD REPLY • link 8.5 years ago hakimelakhrass • 0

0

Entering edit mode

This is one problem with making sophisticated software available to those who don't have the background to understand what they are doing. This support site is primarily intended to help people with the required background with questions about how to get the software to work, rather than providing statistical support. The former is relatively straightforward to do, whereas the latter much less so.

In your case there are really just two recommendations we can make. The best (IMO) is to recommend that you find somebody local who can help you with the statistics. That's what I would recommend doing, unless you have an abiding interest in statistics. If you do have an interest in statistics then I would recommend either getting some instruction, or to start reading. The limma User's Guide would be the obvious place to start, but it's not a trivial read for somebody with no statistical background.

ADD REPLY • link 8.5 years ago James W. MacDonald 66k

0

Entering edit mode

I apologize for the improper use of the forum and will take note for future posts. I actually have a pretty big background in statistics but was just having problems applying it to this type of data. In the future I will only post about technical issues.

ADD REPLY • link 8.5 years ago hakimelakhrass • 0