Question: Foldchange values Limma
1
gravatar for susana.amaral.teixeira
4 months ago by
susana.amaral.teixeira10 wrote:

Hi, everyone. I have performed my in Limma package because there are nested effect interaction (random effect) in statistical model. Looking to my results the log foldchange seems to be wrong to me. Some genes have too high or too low LogFC values such as -14997.66667 with p-adjust 0.02518022, for example. We don't have experience with this package, we would like to know if this is normal.

Thanks.

rnaseq limma foldchange • 231 views
ADD COMMENTlink modified 3 months ago • written 4 months ago by susana.amaral.teixeira10
1

What is the experimental context? What is the structure of the data? What limma code did you use? You need to add more detail to your post, read the posting guide.

ADD REPLYlink written 4 months ago by Aaron Lun24k

In my project, we performed a RNAseq of pig embryos, whose mothers were supplemented (ARG) or not with arginine (CONT) on diet. We used the follow Limma code in R:

samples <- read.table("25d.txt",header=T, as.is=T)
#samples: 
#SAMPLE_ID  Mother  Tissue  Sex Treatment
#P8175-25DA-E1  P-8175  embryos M   ARG
#P8191-25DA-E5  P-8191  embryos M   ARG
#P8192-25DC-E1  P-8192  embryos M   CONT
#P8192-25DC-E5  P-8192  embryos  M  CONT
#P8206-25DC-E1  P-8206  embryos  M  CONT
#P8285-25DC-E2  P-8285  embryos M   CONT
#P8285-25DC-E5  P-8285  embryos M   CONT
#P8175-25DA-E2  P-8175  embryos F   ARG
#P8175-25DA-E5  P-8175  embryos F   ARG
#P8191-25DA-E1  P-8191  embryos F   ARG
#P8191-25DA-E2  P-8191  embryos F   ARG
#P8192-25DC-E2  P-8192  embryos F   CONT
#P8193-25DA-E1  P-8193  embryos F   ARG
#P8193-25DA-E2  P-8193  embryos F   ARG
#P8193-25DA-E5  P-8193  embryos F   ARG
#P8206-25DC-E2  P-8206  embryos  F  CONT
#P8206-25DC-E5  P-8206  embryos F   CONT
#P8285-25DC-E1  P-8285  embryos F   CONT

countf <- file.path(folder, paste0(samples$SAMPLE_ID, '_ReadsPerGene.counts'))

counts = readDGE(countf)$counts #Edge function to load the counts data

# 2. Filter weakly expressed and noninformative (e.g., non-aligned) features:
noint = rownames(counts) %in% c("N_multimapping","N_noFeature","N_ambiguous")
LogCPM <- cpm(counts, log=TRUE, prior.count=3)

write.csv(LogCPM, "log_cpm_25d.csv", quote=FALSE, row.names = TRUE)
mean(colSums(counts[!noint,])/colSums(counts))

# In edgeR, it is recommended to remove features without at least 1 read per million in n of the samples, where n is the size of the smallest group of replicates, 

keep = rowSums(LogCPM >1) >=3 #& !noint
counts = counts[keep,]

sample_id = apply(samples[,c("SAMPLE_ID", "Treatment")], 
 1, function(x) paste(na.exclude(x), collapse = ".")) 
condition = apply(samples[,c("Tissue", "Treatment")], 
 1, function(x) paste(na.exclude(x), collapse = ".")) 

dge.list <- DGEList(counts=counts)

dge.norm <- calcNormFactors(dge.list)

samples <- data.frame(SAMPLE_ID = samples$SAMPLE_ID, Mother = as.factor(samples$Mother), Tissue = as.factor(samples$Tissue), Sex = as.factor(samples$Sex), Treatment = as.factor(samples$Treatment))

design3 <- model.matrix(~0+ Treatment + Treatment:Mother , samples)
colnames(design3) <- c("TreatmentARG", "TreatmentCONT" ,  
 "TreatmentARG.P.8191", "TreatmentCONT.P.8191", "TreatmentARG.P.8192", "TreatmentCONT.P.8192",
 "TreatmentARG.P.8193", "TreatmentCONT.P.8193", "TreatmentARG.P.8206", "TreatmentCONT.P.8206", 
 "TreatmentARG.P.8285", "TreatmentCONT.P.8285")

keep.filtered <- filterByExpr(dge.list, design3)
dge <- dge.norm[keep.filtered,keep.lib.sizes=FALSE]

dge <- as.matrix(dge)

dupC <- duplicateCorrelation(object = dge, design = design3, block = samples$Mother)

fit <- lmFit(dge, design3, correlation = dupC$consensus)

cm <- makeContrasts(Treatment=TreatmentARG-TreatmentCONT,levels = design3)

fit2 <- contrasts.fit(fit, cm)
fit2.bayes <- eBayes(fit2)

tab3 <- topTable(fit2.bayes, coef = 'Treatment')

                           logFC        AveExpr     t       P.Value     adj.P.Val   B
 #ENSSSCG00000018065    -14997.66667    16273.0556  -8.162678   1.84E-06    0.02518022  -4.336885   
 #ENSSSCG00000007849    137 206.4444    6.520178    1.98E-05    0.11455024  -4.361206
 #ENSSSCG00000010817    48.66667    100.5       6.36721     2.52E-05    0.11455024  -4.364157
 #ENSSSCG00000034356    64.33333    20      6.071297    4.03E-05    0.13766863  -4.37029
 #ENSSSCG00000002620    68.66667    152.1111    5.67904     7.67E-05    0.19861386  -4.379368
 #ENSSSCG00000007914    110 237.0556    5.60247     8.72E-05    0.19861386  -4.381278
 #ENSSSCG00000018080    -2619.33333 1400.6667   -5.438972   1.15E-04    0.22444833  -4.385518
 #ENSSSCG00000039715    48.33333    112.3889    5.002999    2.45E-04    0.41790153  -4.397986
 #ENSSSCG00000003660    437.33333   1527        4.799328    3.51E-04    0.50011588  -4.40444
 #ENSSSCG00000028646    353.33333   822.1111    4.776304    3.66E-04    0.50011588  -4.405197
ADD REPLYlink written 4 months ago by susana.amaral.teixeira10
Answer: Foldchange values Limma
3
gravatar for Gordon Smyth
4 months ago by
Gordon Smyth37k
Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia
Gordon Smyth37k wrote:

You're right, the very large logFC values in your analysis are not at all normal. They have arisen because you have input raw RNA-seq counts to lmFit instead of normalized log-intensity values.

It is not correct to input a matrix of RNA-seq read counts directly to lmFit. You have to first transform the counts to logCPM values using either the voom function or the cpm function. See Chapter 15 "RNA-seq Data" of the limma User's Guide.

There so many issues with your code that it would be better and quicker if I gave you a correct sequence code sequence rather than trying to comment on lots of individual things. The correct analysis is actually simpler than your code so far.

Read in data:

samples <- read.delim("25d.txt", stringsAsFactors=FALSE)
countf <- file.path(folder, paste0(samples$SAMPLE_ID, '_ReadsPerGene.counts'))
dge <-   readDGE(countf, group=samples$Treatment)
noint <- rownames(dge) %in% c("N_multimapping","N_noFeature","N_ambiguous")
dge <- dge[!noint,]

Filter and convert to logCPM:

keep <- filterByExpr(dge)
dge <- dge[keep,,keep.lib.sizes=FALSE]
dge <- calcNormFactors(dge)
logCPM <- cpm(dge, log=TRUE, prior.count=2)

Random effect correlation:

design <- model.matrix(~Treatment, data=samples)
dupC <- duplicateCorrelation(logCPM, design, block=samples$Mother)
dupC$consensus

Differential expression analysis:

fit <- lmFit(logCPM, design, block=samples$Mother, correlation=dupC$consensus)
fit <- eBayes(fit, trend=TRUE, robust=TRUE)
topTable(fit, coef=2)
ADD COMMENTlink modified 4 months ago • written 4 months ago by Gordon Smyth37k
Answer: Foldchange values Limma
0
gravatar for susana.amaral.teixeira
3 months ago by
susana.amaral.teixeira10 wrote:

Thanks a lot for your attention and tips. I did your changes that you suggested and folds changes were corrects. Your support is very importante in our analises, main when used the programas for complex models. I'm very grateful!

ADD COMMENTlink written 3 months ago by susana.amaral.teixeira10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 284 users visited in the last hour