Question

How to import normalized data (from xps) into limma to find differentially expressed genes?


Biologist ▴ 120

@biologist-9801

Last seen 5.0 years ago

Hi Christian,

For HuGene20stv1 array I have used xps package:

library(xps)

input <- choose.dir("D:/GeneST_Analyse/ProjectName")
setwd(input)
print(dir(input))

#Subdirectories

scmdir <- paste(input,"schemes",sep="/")
libdir <- paste(input,"library",sep="/")
anndir <- paste(input,"Annotation",sep="/")
celdir <- paste(input,"CELfiles",sep="/")
rootdir <- paste(input,"rootdata",sep="/")

scheme.exon <- import.exon.scheme("Scheme",filedir=scmdir,layoutfile=paste(libdir,"HuGene-2_0-st.clf",sep="/"),schemefile=paste(libdir,"HuGene-2_0-st.pgf",sep="/"),probeset=paste(anndir,"HuGene-2_0-st-v1.na35.hg19.probeset.csv",sep="/"),transcript=paste(anndir,"HuGene-2_0-st-v1.na35.hg19.transcript.csv",sep="/"),verbose=TRUE)

celfiles <- dir(path = "D:/GeneST_Analyse/ProjectName/CELfiles", pattern = "*.CEL$", all.files = FALSE, full.names = FALSE, recursive = FALSE, ignore.case = FALSE)
data.exon <- import.data(scheme.exon,"tmp_data_exon",filedir=rootdir,celfiles=celfiles,celdir=celdir,verbose=FALSE)

unlist(treeNames(data.exon))

Normalization:

data.rma <- rma(data.exon, "tmp_exonRMA", filedir=rootdir, verbose=FALSE, exonlevel="affx+core", option="transcript", background="antigenomic")

Now, how can I use limma with this data to get differentially expressed genes?

Thank you

xps heatmap limma • 4.2k views

ADD COMMENT • link updated 9.0 years ago by cstrato ★ 3.9k • written 9.0 years ago by Biologist ▴ 120

score 2 · Answer 1 · 2016-03-24


James W. MacDonald 68k

@james-w-macdonald-5106

Last seen 3 hours ago

United States

This site is primarily intended to help people with specific questions about how to use Bioconductor software rather than a tutorial site where people can ask open-ended questions and get free tutorials on how to do a data analysis. If you want to learn how to analyze data using limma, you should read the limma User's Guide.

ADD COMMENT • link 9.0 years ago James W. MacDonald 68k

score 1 · Answer 2 · 2016-03-24

Dear Venkatesh,

The following should give you the expression values with the UnitNames:
> expr.rma <- validData(data.rma)

Furthermore, you could use function 'export', see '?export'
This will give you the annotation data you want, e.g. gene name, gene symbol, etc, e.g.:

This will export the data and create a data.frame:
> expr.rma <- export.expr(data.rma, treename = "*", treetype = "mdp", varlist = "fUnitName:fSymbol:fLevel", outfile = "MyName.txt", sep = "\t", as.dataframe = TRUE, verbose = TRUE)

or to export the data only as text file:
> export.expr(data.rma, treename = "*", treetype = "mdp", varlist = "fUnitName:fSymbol:fLevel", outfile = "MyName.txt", sep = "\t", as.dataframe = FALSE, verbose = TRUE)

Since the results of function rma() are linear values, you need to convert them first to log2 values:
> tmp <- as.matrix(log2(expr.rma[,'columns to convert to log2']))
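As a minimal sketch of that conversion, assuming the data.frame returned by export.expr has the layout shown elsewhere in this thread (identifier columns followed by one column per array whose name ends in "_LEVEL"), the expression columns can be selected by name rather than by position. The toy data.frame, its column names and its values are made up for illustration:

```r
# Toy stand-in for the data.frame returned by export.expr(); values are invented.
expr.rma <- data.frame(
  UNIT_ID     = 0:2,
  UnitName    = c("17127159", "17127161", "17127163"),
  GeneSymbol  = c(NA, NA, NA),
  A.mdp_LEVEL = c(353.568, 11.0842, 221.647),
  B.mdp_LEVEL = c(335.295, 7.01459, 226.774),
  stringsAsFactors = FALSE
)

# Pick the expression columns by their "_LEVEL" suffix instead of by position.
level.cols <- grep("_LEVEL$", colnames(expr.rma))

# Convert the linear-scale values to log2 and label the rows with the UnitName.
tmp <- as.matrix(log2(expr.rma[, level.cols]))
rownames(tmp) <- expr.rma$UnitName

print(round(tmp, 3))
```

Setting the row names to UnitName here means that any table later derived from tmp can be traced back to the annotation columns.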

For using limma, here is an example from my examples in xps/examples/script4bestmatch.R.
This is w/o guarantee! You need to adjust it to your own needs.

> library(limma)

## create design matrix
> tissue <- c("Breast","Breast","Breast","Prostate","Prostate","Prostate")
> design <- model.matrix(~factor(tissue))
> colnames(design) <- c("Breast","BreastvsProstate")
> design

> fit <- lmFit(tmp, design)
> fit <- eBayes(fit)
> xpsu.lm <- topTable(fit, coef=2, n=length(rownames(tmp)), adjust="BH")
> xpsu.lm <- xpsu.lm[order(xpsu.lm[,"ID"]),c("ID","logFC","P.Value","adj.P.Val")]
> colnames(xpsu.lm) <- c("xpsu.ID","xpsu.logFC","xpsu.P.Value","xpsu.adj.P.Val")

You may need to create a contrast matrix, e.g.:
> cont.matrix <- makeContrasts(........,
levels = design
)
> fit2 <- contrasts.fit(fit,cont.matrix)
> fit3 <- eBayes(fit2)

For further questions regarding limma please read the user guide and ask the experts.

Best regards,
Christian

score 1 · Answer 3 · 2016-03-29


cstrato ★ 3.9k

@cstrato-908

Last seen 6.5 years ago

Austria

Please read my first reply again:

Furthermore, you could use function 'export', see '?export'
This will give you the annotation data you want, e.g. gene name, gene symbol, etc, e.g.:

This will export the data and create a data.frame:
> expr.rma <- export.expr(data.rma, treename = "*", treetype = "mdp", varlist = "fUnitName:fSymbol:fLevel", outfile = "MyName.txt", sep = "\t", as.dataframe = TRUE, verbose = TRUE)

or to export the data only as text file:
> export.expr(data.rma, treename = "*", treetype = "mdp", varlist = "fUnitName:fSymbol:fLevel", outfile = "MyName.txt", sep = "\t", as.dataframe = FALSE, verbose = TRUE)

This will give you the annotation for the unitnumbers.

Regards,
Christian

ADD COMMENT • link 9.0 years ago cstrato ★ 3.9k


Hi Christian,

I have tried that before, and this is what I got:

> expr.rma <- export.expr(data.rma, treename = "*", treetype = "mdp", varlist = "fUnitName:fSymbol:fLevel", outfile = "MyName.txt", sep = "\t", as.dataframe = TRUE, verbose = TRUE)
> head(expr.rma)
UNIT_ID UnitName GeneSymbol X_8__HuGene_2_0_st_.mdp_LEVEL
1 0 17127159 <NA> 353.56800
2 1 17127161 <NA> 11.08420
3 2 17127163 <NA> 221.64700
4 3 17127165 <NA> 5.02182
5 4 17127167 <NA> 393.91500
6 5 17127169 <NA> 7.38325
E__HuGene_2_0_st_.mdp_LEVEL GG7__HuGene_2_0_st_.mdp_LEVEL
1 335.29500 221.71700
2 7.01459 7.33511
3 226.77400 136.27400
4 3.41172 4.12834
5 433.67100 228.32800
6 6.05120 4.32289
J__HuGene_2_0_st_.mdp_LEVEL O__HuGene_2_0_st_.mdp_LEVEL
1 815.65400 684.85000
2 11.21210 12.62680
3 431.32000 392.53300
4 6.90306 4.91183
5 1083.58000 695.28500
6 6.09091 4.60186
T__HuGene_2_0_st_.mdp_LEVEL
1 803.73600
2 10.36830
3 704.74300
4 5.65681
5 830.41600
6 5.88536

ADD REPLY • link 9.0 years ago Biologist ▴ 120


I would suggest that you open the original Affymetrix file 'HuGene-2_0-st-v1.na35.hg19.transcript.csv' in
order to get a feeling for the annotation data. You will see that e.g. UnitName=17127159 has no gene symbol,
since it is a 'control->affx' probeset.
Please note that only 'UnitName', i.e. the transcript_cluster_id, is unique.

Did you open the outfile = "MyName.txt" to see what it looks like?

I would suggest that you use 'varlist = "*"' in order to see which data are exported. Furthermore, please
read the help '?export.expr' carefully.

(FYI, when you type 'help.start()' in R, your browser will open and 'xps' will be listed under 'Packages'.
There you can read the help for all 'xps' functions in your browser.)

Regards,
Christian

ADD REPLY • link 9.0 years ago cstrato ★ 3.9k


As you suggested, I ran the following:

export.expr(data.rma, treename = "*", treetype = "mdp", varlist = "fUnitName:fTranscriptID:fLevel", outfile = "MyName.txt", sep = "\t", as.dataframe = FALSE, verbose = TRUE)

I want to check which probeset belongs to which transcript so I gave fTranscriptID.

UNIT_ID UnitName TranscriptID X_8 E GG7 J O T

3742   16657437   16657436   2.10763   2.10763   1.32498   4.14518   9.77053   13.9962
3743   16657438   16657436   2.54837   1.92386   3.63425   2.26163   3.24937   2.79736
3744   16657439   16657436   11.3137   7.27181   10.0098   10.0176   15.2267   6.35405
3745   16657441   16657440   4.19212   6.30431   2.10763   1.06896   1.43262   1.74003
3746   16657442   16657440   2.57503   1.89032   1.87688   4.1086   3.58509   2.44511
3747   16657443   16657440   41.9018   22.0025   23.6494   10.6459   29.4596   13.7648
3748   16657444   16657440   10.2051   12.3036   13.1857   6.5307   18.2471   7.20289
3749   16657446   16657445   1.6846   1.40303   1.44591   2.18139   2.3705   5.62349

It gave the transcript ID, but why is fUnitName not the probeset ID?

ADD REPLY • link 9.0 years ago Biologist ▴ 120


In this case fUnitName is the probesetID as you can easily verify by looking at the Affymetrix probeset
annotation file 'HuGene-2_0-st-v1.na35.hg19.probeset.csv'

Regards,
Christian

ADD REPLY • link 9.0 years ago cstrato ★ 3.9k

score 0 · Answer 4 · 2016-03-24


cstrato ★ 3.9k

@cstrato-908

Last seen 6.5 years ago

Austria

Please read the help '?topTable' carefully. It says:
'By default number probes are listed. Alternatively, by specifying p.value and number=Inf,
all genes with adjusted p-values below a specified value can be listed.'

Thus, it is up to you which p-value you want to use as cutoff.

Best regards,
Christian

ADD COMMENT • link 9.0 years ago cstrato ★ 3.9k


Ok. Thank you

Apart from limma, I also used PreFilter and UniFilter, which are described in xps.pdf:

prefltr <- PreFilter(mad=c(0.5), lothreshold=c(7.0,0.02,"mean"), hithreshold=c(10.5,80.0,"percent"))

rma.pfr <- prefilter(data.rma, "tmpdt_exonPrefilter", filedir=rootdir , filter=prefltr, minfilters=2, verbose=FALSE)

tmp <- validData(rma.pfr)
head(tmp)

dim(tmp[tmp[,"FLAG"]==1,])

The data show that 5295 genes of the 44796 genes on the GeneChip are selected for further analysis.

unifltr <- UniFilter(foldchange=c(1.3,"both"), unifilter=c(0.1,"pval"))

rma.ufr <- unifilter(data.rma, "tmpdt_exonUnifilter", filedir=rootdir , unifltr, group=c("GrpA","GrpA","GrpA","GrpB","GrpB","GrpB"), xps.fltr=rma.pfr, verbose=FALSE)

tmp <- validData(rma.ufr)
tmp
dim(tmp)

The data show that only 767 genes of the pre-selected 5295 genes are considered to be differentially expressed. Is it possible to generate a heatmap from this result?

Since I need a heatmap, I also tried using limma, but I don't know where I went wrong. Please help me out.

Thank you

ADD REPLY • link 9.0 years ago Biologist ▴ 120

score 0 · Answer 5 · 2016-03-24


cstrato ★ 3.9k

@cstrato-908

Last seen 6.5 years ago

Austria

For 767 genes it is possible to create a heatmap and/or to do cluster analysis.

Basic R has functions 'hclust()', 'heatmap()', 'image()' for these purposes.
Furthermore, I am sure that Bioconductor has packages for these purposes, too.

Finally, limma has its own function 'heatDiagram()'.

Please note that these questions have nothing to do with 'xps'.

Best regards,
Christian

ADD COMMENT • link 9.0 years ago cstrato ★ 3.9k


Hi christian,

To get a heatmap, I used the following code:

Group <- c("GrpA","GrpA","GrpA","GrpB","GrpB","GrpB") 
design <- model.matrix(~factor(Group)) 
colnames(design) <- c("GrpA","GrpAvsGrpB") 
design

fit <- lmFit(tmp, design)
cont.matrix <- makeContrasts(GrpAvsGrpB, levels = design) 
fit2 <- contrasts.fit(fit,cont.matrix)
fit3 <- eBayes(fit2) 
tab <- topTable(fit3, coef=1, n=Inf, adjust="fdr", sort.by="none")
idx = which(tab$adj.P.Val < 0.05)

library(gplots)
heatmap.2(tmp[idx,],trace='none',scale='row')

But this gives the following error.

Error:

Error in heatmap.2(tmp[idx, ], trace = "none", scale = "row") :

'x' must have at least 2 rows and 2 columns

ADD REPLY • link 9.0 years ago Biologist ▴ 120


Dear ghk, two comments on your code before the actual heatmap call (perhaps you wrote it in a hurry, but it is a bit confused):

At the start you use a design matrix with an intercept:

design <- model.matrix(~factor(Group))

and then you call makeContrasts on top of it, which does not make sense: with this parameterization the second coefficient already is the comparison between the two groups, so your contrast was defined before lmFit. You should have continued directly to the eBayes step.

Moreover, the name you gave to the second column of the design matrix (GrpAvsGrpB) is not correct:

> Group <- c("GrpA","GrpA","GrpA","GrpB","GrpB","GrpB")
> design <- model.matrix(~factor(Group))
> design
  (Intercept) factor(Group)GrpB
1           1                 0
2           1                 0
3           1                 0
4           1                 1
5           1                 1
6           1                 1

So your second coefficient is the difference GrpB minus GrpA, not the reverse.
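The direction of that coefficient can be checked with a short base R sketch. The expression values in y are made up for one hypothetical gene, and lm.fit is used here only as a stand-in for the per-gene least-squares fit; the column name GrpBvsGrpA is an illustrative choice:

```r
# Six samples, three per group, as in the thread.
Group  <- factor(c("GrpA", "GrpA", "GrpA", "GrpB", "GrpB", "GrpB"))
design <- model.matrix(~ Group)

# GrpA is the reference level, so column 2 estimates GrpB - GrpA.
colnames(design) <- c("GrpA", "GrpBvsGrpA")
print(design)

# Made-up log2 expression values for a single gene.
y   <- c(5.0, 5.2, 4.8, 7.1, 6.9, 7.0)
fit <- lm.fit(design, y)

# The first coefficient is the GrpA mean, the second is mean(GrpB) - mean(GrpA).
print(fit$coefficients)
print(mean(y[Group == "GrpB"]) - mean(y[Group == "GrpA"]))
```

With an intercept design like this, a positive second coefficient means higher expression in GrpB than in GrpA.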

Best,

Efstathios

ADD REPLY • link 9.0 years ago svlachavas ▴ 840


Hi Efstathios,

Thanks for the reply. Please check the following once.

design <- model.matrix(~ -1+factor(c(1,1,2,2,3,3)))

colnames(design) <- c("group1", "group2", "group3")
fit <- lmFit(celfiles.rma, design)

contrast.matrix <- makeContrasts(group2-group1, group3-group2, group3-group1, levels=design)

fit2 <- contrasts.fit(fit, contrast.matrix)

fit2 <- eBayes(fit2)

tab <- topTable(fit2, coef=1, adjust="fdr", sort.by="B", number=Inf)

head(tab)
logFC AveExpr t P.Value adj.P.Val B
17100820 3.813465 10.084444 13.990013 1.477686e-05 0.3544622 -4.142070
17100824 3.813465 10.084444 13.990013 1.477686e-05 0.3544622 -4.142070
16744689 -3.037688 1.577746 -13.252989 1.983301e-05 0.3544622 -4.143913
16842465 -6.954801 4.096664 -11.968499 3.444300e-05 0.4616826 -4.147931
16843156 3.222599 2.013231 11.375102 4.530001e-05 0.4857702 -4.150236
16706416 -2.014341 2.072348 -8.395743 2.270946e-04 1.0000000 -4.169346

idx = which(tab$P.Val < 0.9)

heatmap.2(tab[idx,],trace='none',scale='row')

Error in heatmap.2(tab[idx, ], trace = "none", scale = "row") :
`x' must be a numeric matrix

ADD REPLY • link 9.0 years ago Biologist ▴ 120


Dear gnk,

I'm not certain what tab is. If I'm correct, it must be your eSet; if so, this explains the error above. Try:

heatmap.2(exprs(tab)[idx,],trace='none',scale='row')

This should now work.

ADD REPLY • link 9.0 years ago svlachavas ▴ 840


Yes I tried but it gave another error now.

heatmap.2(exprs(tab)[idx,],trace='none',scale='row')

Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘exprs’ for signature ‘"data.frame"’

ADD REPLY • link 9.0 years ago Biologist ▴ 120


So tab isn't your eSet? You should use your ExpressionSet object with the above function. In any case, heatmap.2 expects a numeric matrix, which is why it complains :)

ADD REPLY • link 9.0 years ago svlachavas ▴ 840


Could you please check this once and tell me where I went wrong.

setwd("D:/Chaduvulu/ViroScan3D/GeneST_Analyse/CELFiles")
library(oligo)
celfiles = list.files(path = ".", pattern = ".CEL$", all.files = FALSE,
full.names = FALSE, recursive = FALSE, ignore.case = FALSE)

celfiles
[1] "#8_(HuGene-2_0-st).CEL" "E_(HuGene-2_0-st).CEL" "GG7_(HuGene-2_0-st).CEL"
[4] "J_(HuGene-2_0-st).CEL" "O_(HuGene-2_0-st).CEL" "T_(HuGene-2_0-st).CEL"

Data <- read.celfiles(celfiles)
celfiles.rma <- rma(Data, target="core")

library(limma)

design <- model.matrix(~ -1+factor(c(1,1,2,2,3,3)))

colnames(design) <- c("group1", "group2", "group3")
fit <- lmFit(celfiles.rma, design)

contrast.matrix <- makeContrasts(group2-group1, group3-group2, group3-group1, levels=design)

fit2 <- contrasts.fit(fit, contrast.matrix)

fit2 <- eBayes(fit2)

tab <- topTable(fit2, coef=1, adjust="fdr", sort.by="none", number=Inf)
write.table(tab, file="DEG2.xls",row.names=F, sep="\t")

idx = which(tab$P.Val < 0.9)
heatmap.2(exprs(tab)[idx,],trace='none',scale='row')

ADD REPLY • link 9.0 years ago Biologist ▴ 120


You should use:

heatmap.2(exprs(celfiles.rma)[idx,],trace='none',scale='row') # topTable output has nothing relevant here
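The errors earlier in this thread come from passing something other than a numeric matrix with at least two rows and two columns. A row index taken from the topTable result also only lines up with the expression matrix when both are in the same row order (e.g. sort.by="none"). The following sketch, using made-up stand-ins named expr and tab and base R's heatmap, selects rows by name so the order does not matter:

```r
set.seed(1)
# Made-up log2 expression matrix: 5 probesets x 6 arrays.
expr <- matrix(rnorm(30, mean = 8), nrow = 5,
               dimnames = list(paste0("ps", 1:5), paste0("array", 1:6)))

# Made-up stand-in for a topTable() result, deliberately in a different row order.
tab <- data.frame(logFC   = c(2.1, -1.8, 0.1, 1.7, 0.0),
                  P.Value = c(0.001, 0.004, 0.60, 0.01, 0.90),
                  row.names = c("ps4", "ps2", "ps5", "ps1", "ps3"))

# Select by row name so the row order of 'tab' does not matter.
sel <- rownames(tab)[tab$P.Value < 0.05]
sub <- expr[sel, , drop = FALSE]   # drop = FALSE keeps a matrix even for one row

# heatmap() needs at least 2 rows and 2 columns.
if (nrow(sub) >= 2 && ncol(sub) >= 2) {
  heatmap(sub, scale = "row")
} else {
  message("Too few rows selected for a heatmap: ", nrow(sub))
}
```

Checking nrow(sub) before plotting makes it obvious when a cutoff has left too few rows, which is one way to end up with the "'x' must have at least 2 rows and 2 columns" error.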

ADD REPLY • link 9.0 years ago svlachavas ▴ 840


Hi christian,

Using "xps" package

data.rma <- rma(data.exon, "tmp_exonRMA", filedir=rootdir, verbose=FALSE, exonlevel="affx+core", option="transcript", background="antigenomic")

expr.rma <- export.expr(data.rma, treename = "*", treetype = "mdp", varlist = "fUnitName:fSymbol:fLevel", outfile = "MyName.txt", sep = "\t", as.dataframe = TRUE, verbose = TRUE)

tmp <- as.matrix(log2(expr.rma[,4:9]))

library(limma)

design <- model.matrix(~ 0+factor(c(1,1,1,2,2,2)))

colnames(design) <- c("group1", "group2")
fit <- lmFit(tmp, design)

contrast.matrix <- makeContrasts(group2-group1, group1-group2, levels=design)

fit2 <- contrasts.fit(fit, contrast.matrix)

fit2 <- eBayes(fit2)

tab <- topTable(fit2, coef=2, adjust="fdr", sort.by="none", number=Inf)
write.table(tab, file="DEG.xls",row.names=F, sep="\t")

results <- tab[which(tab$logFC >= 1.5 & tab$P.Value <= 0.05),]
write.table(results, file="DEGp.xls", row.names=T, sep="\t")

idx = which(tab$P.Value < 0.05 & tab$logFC > 1.5)

heatmap(tmp[idx],trace='none',scale='row')
Error in heatmap(tmp[idx], trace = "none", scale = "row") :
'x' must be a numeric matrix

Could you please help me in this?

ADD REPLY • link 9.0 years ago Biologist ▴ 120


Dear ghk,

You missed a comma above. It should be:

heatmap(tmp[idx,],trace='none',scale='row') # (I hope you do not feed the troll :))

ADD REPLY • link 9.0 years ago svlachavas ▴ 840


Oh yes, I didn't see that. Very sorry, it's my fault. Thank you very much @svlachavas

ADD REPLY • link 9.0 years ago Biologist ▴ 120


And a small question: how can I add annotation to this? I get the unit numbers instead of gene names.

head(tab)
logFC AveExpr t P.Value adj.P.Val B
16650001 -0.67713091 0.7013795 -2.11246203 0.07463104 0.8414740 -3.923065
16650003 0.01775323 0.8173575 0.05752413 0.95581904 0.9922585 -5.247542
16650005 0.20727356 1.2926154 0.28654513 0.78318782 0.9636612 -5.214740
16650007 0.13198594 0.8028477 0.37395467 0.72008320 0.9502862 -5.191016
16650009 -0.31117759 0.7187522 -0.92432176 0.38765340 0.8815813 -4.917247
16650011 -0.18142907 0.7450577 -0.46483921 0.65689248 0.9382330 -5.160082
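One possible way to attach gene symbols, sketched in base R, is to look up each row name of the result table in the annotation columns of the exported data.frame. This assumes that the row names of tab are the UnitName values, which holds only if the expression matrix given to lmFit had them as row names. The objects tab and annot and the symbols GENE_A and GENE_C are made-up stand-ins:

```r
# Made-up stand-in for a topTable() result whose row names are unit names.
tab <- data.frame(logFC   = c(-0.68, 0.02, 0.21),
                  P.Value = c(0.075, 0.956, 0.783),
                  row.names = c("16650001", "16650003", "16650005"))

# Made-up stand-in for the annotation columns of the export.expr() data.frame.
annot <- data.frame(UnitName   = c("16650005", "16650001", "16650003"),
                    GeneSymbol = c("GENE_C", "GENE_A", NA),
                    stringsAsFactors = FALSE)

# match() returns, for each row of 'tab', its position in the annotation,
# so the row order of 'tab' is preserved.
tab$GeneSymbol <- annot$GeneSymbol[match(rownames(tab), annot$UnitName)]
print(tab)
```

Units without a gene symbol, such as control probesets, simply keep NA in the new column.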

ADD REPLY • link 9.0 years ago Biologist ▴ 120


When I run the above code I get log fold-change values, but I need linear fold-change values. How can I get those? I want to keep P.Value <= 0.05 and FoldChange >= 1.5. Looking forward to your response.
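Since the logFC column reported by topTable is on the log2 scale when the input was log2-transformed, the linear fold change is 2^logFC, and a linear cutoff of 1.5 corresponds to log2(1.5) on the logFC scale rather than to 1.5. A minimal base R sketch with made-up logFC values:

```r
# Made-up log2 fold changes, as found in the logFC column of topTable().
logFC <- c(1.0, -1.0, 0.585, 0.2)

# Linear fold change; values below 1 indicate down-regulation.
FC <- 2^logFC

# Signed convention sometimes used in reports: -2 means two-fold down.
signedFC <- ifelse(logFC >= 0, FC, -1 / FC)

# A linear fold change of at least 1.5 in either direction corresponds to
# |logFC| >= log2(1.5).
cutoff <- log2(1.5)
keep   <- abs(logFC) >= cutoff

print(data.frame(logFC, FC, signedFC, keep))
```

Note that a filter such as logFC >= 1.5 keeps only genes that are up-regulated by a factor of 2^1.5 or more, which is stricter than a 1.5-fold change.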

ADD REPLY • link 9.0 years ago Biologist ▴ 120


@svlachavas Could you please help me get a heatmap of differentially expressed exons? I'm working with the Human Gene 2.0 ST array and have used the "xps" package for the analysis.

scheme.exon <- import.exon.scheme("Scheme",filedir=scmdir,layoutfile=paste(libdir,"HuGene-2_0-st.clf",sep="/"),schemefile=paste(libdir,"HuGene-2_0-st.pgf",sep="/"),probeset=paste(anndir,"HuGene-2_0-st-v1.na35.hg19.probeset.csv",sep="/"),transcript=paste(anndir,"HuGene-2_0-st-v1.na35.hg19.transcript.csv",sep="/"),verbose=TRUE)

##Reading CEL Info
#Import CEL-files by giving the path where all .CEL files are present

celfiles <- dir(path = "D:/GeneST_Analyse/ProjectName/CELfiles", pattern = "*.CEL$", all.files = FALSE, full.names = FALSE, recursive = FALSE, ignore.case = FALSE)
data.exon <- import.data(scheme.exon,"tmp_data_exon",filedir=rootdir,celfiles=celfiles,celdir=celdir,verbose=FALSE)

unlist(treeNames(data.exon))

[1] "X_8__HuGene_2_0_st_.cel" "E__HuGene_2_0_st_.cel"  
[3] "GG7__HuGene_2_0_st_.cel" "J__HuGene_2_0_st_.cel"  
[5] "O__HuGene_2_0_st_.cel"   "T__HuGene_2_0_st_.cel"

data.rma <- rma(data.exon, "tmp_exonRMA", filedir=rootdir, verbose=FALSE, exonlevel="affx+core", option="transcript", background="antigenomic")

>data.rma

UNIT_ID   UN       X       E       GG7     J     O
0      17127159 353.568 335.295 221.717 815.654 684.85
1      17127161 11.0842 7.01459 7.33511 11.2121 12.6268
2      17127163 221.647 226.774 136.274 431.32  392.533
3      17127165 5.02182 3.41172 4.12834 6.90306 4.91183

unifltr <- UniFilter(foldchange=c(1.5,"both"), unifilter=c(0.05,"pval"))

rma.ufr <- unifilter(data.rma, "tmp_exonUnifilter", filedir=rootdir , filter=unifltr, group=c("GrpA","GrpA","GrpA","GrpB","GrpB","GrpB"), xps.fltr=rma.pfr, verbose=FALSE)

tmp2 <- validData(rma.ufr)
tmp2 contains the differentially expressed genes.

UN          ID    St      M1    M2       SE    DOF  PV        PA            FC
17127159    0   -5.9    297.3   765.7   0.22    4   0.003   0.00389231  2.57536
17127163    2   -3.87   189.914 492.307 0.3548  4   0.0179  0.01795     2.59226
17127167    4   -3.8908 339.136 855.276 0.3429  4   0.0176  0.017       2.52192
17127171    6   -3.922  390.44  986.365 0.340   4   0.0172179   0.01721 2.52627
17127175    8   -4.715  536.072 1210.65 0.2492  4   0.00920158  0.00920 2.258

How can I get a heatmap for this?

ADD REPLY • link 9.0 years ago Biologist ▴ 120