Why can't I use the same R package to get the same conclusions as the author?
3
0
Entering edit mode
llkxiaolan ▴ 10
@llkxiaolan-13767
Last seen 6.0 years ago

One question that puzzles me is that I use the same R package as the author. Why don't you get the same conclusion as the author?. Specifically, in the literature, the authors used Affy and limma packages to select differential genes, |logFC| > 1, P < 0.05 as threshold, and screened 752 differentially expressed genes. I used the same R package and threshold, but I screened only 625 differential genes. What causes this?. Why do the same data use the same analysis method, and the result will be different? 

This is what the author mentioned in the literature about data processing :All the raw data and annotation files were downloaded for further analysis.The raw data were firstly preprocessed using the Affy package in R language [ 12 , 13 ] . Then, the DEGs in glomeruli tissues between patients with DN and healthy controls were analyzed by limma package in R [ 14 ] . The multiple testing correction was performed using a beniamini-Hochberg false discovery rate (HB FDR) [ 15 ] . Fold change (FC) of the expression of individual gene was also observed for diff erential expression test. The DEGs with FDR < 0.01 and |log FC| > 1 were considered to be signifi cant.The set of data used is GSE1009.

This is the code that I use :

library(affy)

library(tcltk)

dir <- tk_choose.dir(caption = "Select folder")

cel.files <- list.files(path = dir, pattern = ".+\\.cel$", ignore.case = TRUE,full.names = TRUE, recursive = TRUE)

basename(cel.files)

data.raw <- ReadAffy(filenames = cel.files)

sampleNames(data.raw) <- paste("CHIP", 1:length(cel.files), sep = "-")

sampleNames(data.raw)

pData(data.raw)$treatment <- rep(c("N", "T"), each=3)

pData(data.raw)

eset.rma <- rma(data.raw)

emat.rma.log2 <- exprs(eset.rma)
head(emat.rma.log2, 1)

results.rma <- data.frame((emat.rma.log2[,c(1,4)] + emat.rma.log2[,c(2,5)] + emat.rma.log2[,c(3,6)])/3)  

results.rma$fc <- results.rma[,2] - results.rma[,1] 
head(results.rma, 2)

data.mas5calls <- mas5calls(data.raw)

eset.mas5calls <- exprs(data.mas5calls)

head(eset.mas5calls)

AP <- apply(eset.mas5calls, 1, function(x)any(x=="P"))

present.probes <- names(AP[AP])

paste(length(present.probes),"/",length(AP))

results.present <- results.rma[present.probes,]

library(limma)

treatment <- factor(pData(eset.rma)$treatment)

design <- model.matrix(~ 0 + treatment)

colnames(design) <- c("N", "T")

design

contrast.matrix <- makeContrasts(T-N, levels=design)

contrast.matrix

fit <- lmFit(eset.rma[present.probes,], design)

fit2 <- contrasts.fit(fit, contrast.matrix)

fit2 <- eBayes(fit2)

results.lim <- topTable(fit2, coef=1, adjust.method="fdr", p.value=0.05, lfc=1, number=30000)

Thangk you 

 

limma affy • 1.3k views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 8 hours ago
United States

You aren't using the same package as the authors. If you read the paper they link to in their GEO submission, you will find that they used Affy's MAS5.0 software, not the Bioconductor affy package, and certainly not RMA, as you have. If you use RMA and they use MAS5.0, is it surprising that you get different results?

ADD COMMENT
0
Entering edit mode

After re-reading your post I see that you are not trying to reproduce the original author's results, but somebody else's result who used those data. Unless the author states exactly what was done, or supplies the code (and you use the same exact versions of the packages), it isn't surprising that you get different results. For instance, you are filtering using MAS5.0 calls. Is that consistent with what the other person did?
 

ADD REPLY
0
Entering edit mode

Thank you very much for your reply, and I checked the data later. I think the reason for the different results may be different ways of data preprocessing. Although the number of differential genes is different, I feel that my approach seems to be all right. What do you think?  

ADD REPLY
0
Entering edit mode

It seems OK, except for the part where you use lfc in your call to topTable. If you want to include a fold-change criterion you are better off using the treat function.

ADD REPLY
0
Entering edit mode
Aleksandra • 0
@aleksandra-13756
Last seen 6.7 years ago

Did the Author describe which method of data preprocessing was used? If he only wrote that "The raw data were firstly preprocessed using the Affy package in R language [ 12 , 13 ]", it is very imprecise. Did he use rma or mas? Meybe he just used a different preprocessing method than you.

ADD COMMENT
0
Entering edit mode
Axel Klenk ★ 1.0k
@axel-klenk-3224
Last seen 19 hours ago
UPF, Barcelona, Spain

 

The package version(s) are important, too, as the software evolves.

System libraries such as BLAS and LAPACK may cause differences.

Exactly reproducing such an analysis is not easy.

ADD COMMENT

Login before adding your answer.

Traffic: 683 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6