Question

Want help in limma analysis

0

Entering edit mode

anjan.payra • 0

@anjanpayra-13322

Last seen 4.7 years ago

library(edgeR)
library(limma)
seqdata <- read.delim("D:\\lung.txt", stringsAsFactors = TRUE)
sampleinfo <- read.delim("D:\\SampleInfo.txt")
countdata <- seqdata[,-(1:2)]
rownames(countdata) <- seqdata[,1]
group <- paste(sampleinfo$Status)
group <- factor(group)
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
myCPM <- cpm(countdata)
thresh <- myCPM > 0.5
keep <- rowSums(thresh) >= 2
counts.keep <- countdata[keep,]
summary(keep)
y <- DGEList(counts.keep)
v <- voom(y,design,plot = TRUE)
fit <- lmFit(v)
cont.matrix <- makeContrasts(B.NorVDis=groupNormal - groupDisease,levels=design)
fit.cont <- contrasts.fit(fit, cont.matrix)
fit.cont <- eBayes(fit.cont)
summa.fit <- decideTests(fit.cont)
topTable(fit.cont,coef=1,sort.by="p")

But in the topTable result genes names are not appeared in left col and I want to write the result in txt file. My files are lung.txt

Gene     LN64     LN66     LN67     LN69     LN70     LN71     LN72B    LN73     LN74     LN75
GABRA3   108      108      170.8    88       97       89.1     77.2     198.4    180.6    106.2
OMD      93       103.5    100.5    72.8     118      118      54.9     110.7    243.9    164
GS3686   152.8    161.4    292.1    109.8    215.8    325.9    173.1    282.2    263.2    264
SEMA3C   387.3    347.2    702.8    643.8    368.9    429.4    622.7    507.9    896.6    529
GML      32       3.9      -1.7     19.3     39       -4       -1.8     -16.4    12.5     3
MKNK1    237.4    248.7    377.2    363      390.9    318.8    502.8    323.6    295      381.6
OGG1     -57.3    40       -34.4    13.7     2.1      57       45       23.5     43.3     70.4
VRK1     119      35.5     132.2    151.9    110.5    112.6    69.4     180.4    188.1    118.9
VRK2     214      186      273      193.6    237      233.4    153.6    219.4    294      223
RES4-22  642.8    424.1    1070.3   599.3    552.4    578.8    515.9    634      930.5    873.9
SH3BP2   152.8    110.5    263      118.9    158      135.4    112.2    150.5    133.2    156.9
RES4-25  24.6     55.1     -10      -0.4     106.2    22.7     52.7     10       -7.8     23.9
RNF4     587.8    664.6    767.4    570.2    610.7    531.6    448.8    473.7    461.9    571.5

SampleInfo.txt

SampleName            Status
LN64       Normal
LN66       Normal
LN67       Normal
LN69       Normal
LN70       Normal
LN71       Disease
LN72B     Disease
LN73       Disease
LN74       Disease
LN75       Disease

Request to kindly help me.Thanks.

limma • 908 views

ADD COMMENT • link updated 6.8 years ago by Gordon Smyth 50k • written 6.8 years ago by anjan.payra • 0

score 0 · Answer 1 · 2017-06-23

limma and edgeR will preserve row names. If you set the rownames at the start, there shouldn't be a problem.

y <- matrix(rnbinom(10000,mu=5,size=2),ncol=4)
rownames(y) <- paste0("Gene", seq_len(nrow(y))) # Setting row names here.
design <- model.matrix(~0 + factor(rep(1:2,each=2)))

We then apply the standard pipeline to this (mostly copying from your post):

d <- DGEList(y)
d <- calcNormFactors(d)
v <- voom(d, design)
fit <- lmFit(v)
cont.matrix <- c(1, -1)
fit.cont <- contrasts.fit(fit, cont.matrix)
fit.cont <- eBayes(fit.cont)
topTable(fit.cont,coef=1,sort.by="p")

The final topTable call gives me a table with row names from the original count matrix. If that isn't the case for you, then you're not assigning the row names to your count matrix properly.

On another note, lung.txt doesn't seem to contain count values, given that some values are negative. voom requires counts, see A: Differential expression of RNA-seq data using limma and voom().

score 0 · Answer 2 · 2017-06-24

The first step to analysing gene expression data is to understand a little bit about how it was produced. You are trying to analyse this data as if it was RNA-seq data. The data you show is however from Beer et al (Nature Medicine, 2002). It was produced using Affymetrix microarrays and it was deposited to the GEO repository as series GSE68571.

The expression values are actually unlogged intensites so analysing using cpm() and voom() is not at all appropriate. The values need to be background corrected and then logged before an analysis can be done.

Another serious issue is that the 10 samples you are analyzing are all normal tissue samples. None of them are disease samples, so the comparison you are doing doesn't seem to be of any biological interest. The full dataset contains 86 disease samples and 10 normal tissue samples, but you seem to be analysing only the normal samples.

I wonder where you got this data from and what you are trying to achieve. Can you tell us more about what your aims are if you want some more advice about how a correct analysis would be done?