Question

Log2FC values very small with SCAN

Vani ▴ 20

@vani-8145

Last seen 8.3 years ago

United States

Hi,

I am using the SCAN method to normalize several GEO datasets. The resulting log2FC values from the normalized eset are very small (between -0.5 and 0.5). Is this normal? I am not sure why the values are so small.

Please advise.

scan Log2FC • 1.8k views

updated 8.7 years ago by Gordon Smyth 50k • written 8.8 years ago by Vani ▴ 20

score 1 · Answer 1 · 2015-07-23

I downloaded the series data matrix for GSE21610 (which is already normalized using MAS5) and did a quick analysis using limma. There are plenty of large fold changes, some with log2FC > 3.

Even using SCAN normalization, there are a number of large fold changes, so your apparent claim that the log2FC are all between -0.5 and 0.5 is not actually true.

Whatever normalization method you use, the data analysis seems to me to require more attention than this. This study has three possible values for the disease status: "none", "dilated cardiomyopathy" and "ischemic cardiomyopathy". There are other variables that should be adjusted for in the limma linear model (particularly gender and age). There are other important analysis steps that should be done to address data quality, especially filtering out unexpressed probes. I would rather see you give attention to these fundamental analysis issues than worry so much about the size of the fold changes.
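As a rough illustration of the two suggestions above (adjusting for covariates and filtering unexpressed probes), here is a minimal base R sketch. The annotation columns `disease`, `gender` and `age`, the simulated expression values and the cutoff of 4 are all assumptions made up for this example; they are not taken from GSE21610.

```r
# Hypothetical sample annotation; column names and values are invented.
pheno <- data.frame(
  disease = factor(c("none", "none", "dilated", "dilated", "ischemic", "ischemic"),
                   levels = c("none", "dilated", "ischemic")),
  gender  = factor(c("F", "M", "M", "F", "M", "F")),
  age     = c(45, 52, 60, 38, 57, 63)
)

# Design matrix with disease status plus gender and age as covariates.
design <- model.matrix(~ disease + gender + age, data = pheno)

# Simulated log2 expression matrix: 5 probes x 6 samples.
set.seed(1)
expr <- matrix(rnorm(30, mean = 8, sd = 0.5), nrow = 5,
               dimnames = list(paste0("probe", 1:5), NULL))
expr[2, ] <- expr[2, ] - 6   # make probe2 look unexpressed

# Keep probes whose mean log2 intensity exceeds an arbitrary cutoff.
cutoff   <- 4
keep     <- rowMeans(expr) > cutoff
filtered <- expr[keep, , drop = FALSE]
```

A design matrix built this way could then be passed to lmFit in place of a single-covariate design. A sensible cutoff depends on the normalization method, since different methods put the data on different scales.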

Gordon Smyth · Answer 2 · 2015-07-17

Stephen Piccolo ▴ 590

@stephen-piccolo-6761

Last seen 3.6 years ago

United States

Hi Vani,

It's hard to know what could cause this without knowing more about the data set and analysis you are doing. Can you provide a few more details (array type, sample size, method of calculating Log2FC, etc.)? Also, have you tried it with any other normalization methods?

Thanks,

-Steve

written 8.8 years ago by Stephen Piccolo ▴ 590

I am getting small values for the Affymetrix Human Genome U133 Plus 2.0 Array. The sample size is around 68. I am using limma's lmFit and topTable to generate the log2FC. I also tried fRMA, and with it the values ranged from -1.3 to 1.3.

Here is my code:

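# Packages used below
library(inSilicoDb)  # getDataset()
library(Biobase)     # pData()
library(limma)       # lmFit(), eBayes(), topTable()

# Load the SCAN-normalized data using inSilicoDb
eset21610 <- getDataset("GSE21610", "GPL570", format = "CURESET", norm = "SCAN", features = "GENE")

# Design matrix with heart failure status as the only covariate
design1 <- model.matrix(~ Heart_Failure, pData(eset21610))

# Fit the linear model and compute moderated statistics
afterLimma <- lmFit(eset21610, design = design1)
e4 <- eBayes(afterLimma)

# Full results table, sorted by log fold change
impdata <- topTable(e4, number = 19528, sort.by = "logFC")

# Volcano plot of log2 fold change against -log10 p-value
plot(impdata$logFC, -log10(impdata$P.Value),
     xlim = c(-.6, .6), ylim = c(-1, 10),
     xlab = "log2 fold change", ylab = "-log10 p-value")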
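One thing worth checking with code like the above: the plot's x-axis is fixed at -0.6 to 0.6, so any probe with a larger fold change would not be drawn. A small sketch of how to inspect the range directly, using made-up values in place of impdata$logFC (the numbers below are invented, not results from this dataset):

```r
# Simulated logFC values; in the thread these would come from impdata$logFC.
logfc <- c(-2.4, -0.45, -0.1, 0.02, 0.3, 0.55, 1.8, 3.1)

# Full range of the estimates, independent of any plot limits.
full_range <- range(logfc)

# Number of probes that a plot restricted to [-0.6, 0.6] would not show.
n_hidden <- sum(abs(logfc) > 0.6)

print(full_range)
print(n_hidden)
```

Running range() on the real logFC column would show whether the small values are a property of the data or only of the plotting window.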

updated 8.7 years ago by Gordon Smyth 50k • written 8.8 years ago by Vani ▴ 20

Vani,

Sorry for the late reply. I looked at the data and did some simple simulations to make sure I understand what is going on. It appears this is because the variance is larger for the fRMA data than for the SCAN data. I don't know enough about how limma works to know how this affects the logFC values. Perhaps the authors of that tool could shed some light on this...

written 8.7 years ago by Stephen Piccolo ▴ 590

limma just computes fold changes for the data it is given, and variances do not enter into the calculation. If there is a problem here, it is at the normalization stage.
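To make this concrete, here is a minimal sketch for a single probe with a two-group design. It uses base R's lm as a stand-in for lmFit, on the assumption that both give the same least-squares coefficient in this simple unweighted case; the expression values and the 0.4 scaling factor are invented for illustration.

```r
# Simulated log2 expression for one probe: 4 control, 4 failing samples.
# All values are made up for illustration.
group <- factor(rep(c("control", "failure"), each = 4),
                levels = c("control", "failure"))
expr  <- c(7.1, 6.9, 7.3, 7.0, 8.2, 8.0, 8.4, 8.1)

# Least-squares fit with a two-group design; the second coefficient
# is the failure-vs-control log2 fold change.
fit   <- lm(expr ~ group)
logfc <- unname(coef(fit)[2])

# The same number falls out of a plain difference of group means.
mean_diff <- mean(expr[group == "failure"]) - mean(expr[group == "control"])

# Hypothetical rescaling of the input: compressing the scale of the data
# shrinks the fold change by the same factor.
expr_small  <- 0.4 * expr
logfc_small <- unname(coef(lm(expr_small ~ group))[2])

print(c(logfc = logfc, mean_diff = mean_diff, logfc_small = logfc_small))
```

So if one normalization method puts the data on a narrower scale than another, the log fold changes will be correspondingly smaller, whatever is done downstream.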

written 8.7 years ago by Gordon Smyth 50k