Question

EdgeR for differential analysis between two cell lines without replication

Biologist ▴ 110

@biologist-9801

I have count data for 56,318 genes in two cell lines, like the following:

head(counts)[1:5,]

             Name Description       Cell-line1     Cell-line2
1 ENSG00000223972     DDX11L1            1               2
2 ENSG00000227232      WASH7P         1639            1138
3 ENSG00000243485  MIR1302-11            7               1
4 ENSG00000237613     FAM138A            0               2
5 ENSG00000268020      OR4G4P            0               0

library(edgeR)
y <- DGEList(counts = counts[,3:4], genes = counts[,2])

o <- order(rowSums(y$counts), decreasing=TRUE)
y <- y[o,]
d <- duplicated(y$genes$genes)
y <- y[!d,]
nrow(y)
[1] 54354

y$samples$lib.size <- colSums(y$counts)
y <- calcNormFactors(y)
y$samples
           group  lib.size norm.factors
Cell-line1     1 153195968     0.969847
Cell-line2     1  96981415     1.031090

Patient <- factor(c("Cell-line1", "Cell-line2"))
Tissue <- factor(c("BREAST1","BREAST2"))
data.frame(Sample=colnames(y),Patient,Tissue)

       Sample     Patient   Tissue
1    Cell-line1  Cell-line1 BREAST1
2    Cell-line2  Cell-line2 BREAST2

design <- model.matrix(~Patient+Tissue)

rownames(design) <- colnames(y)
design

y <- estimateDisp(y, design)
Warning message:
In estimateDisp.default(y = y$counts, design = design, group = group,  :
  No residual df: setting dispersion to NA

Can anyone please help me work out what's wrong with the data or the code?

edger differential gene expression rnaseq • 2.7k views

ADD COMMENT • link updated 5.8 years ago by Gordon Smyth 50k • written 5.8 years ago by Biologist ▴ 110

This is not a DESeq2 question so I’ve removed the tag.

ADD REPLY • link 5.8 years ago Michael Love 41k

Hi Michael,

I would like to know whether I can do a differential analysis between two cell lines with DESeq2.

ADD REPLY • link 5.8 years ago Biologist ▴ 110

DESeq2 needs replicates for performing differential analysis. It will give you a warning/error if you try to analyze data without replicates.
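
To illustrate why replicates matter, and why edgeR gives the "No residual df" warning in the question: the dispersion is estimated from the residual degrees of freedom, i.e. the number of samples minus the rank of the design matrix. Here is a minimal base-R sketch that reuses the Patient and Tissue factors from the question (the names n_samples, design_rank and residual_df are only for illustration):

```r
# Two unreplicated samples, mirroring the design in the question
Patient <- factor(c("Cell-line1", "Cell-line2"))
Tissue  <- factor(c("BREAST1", "BREAST2"))
design  <- model.matrix(~ Patient + Tissue)

# Residual df = number of samples - rank of the design matrix
n_samples   <- nrow(design)
design_rank <- qr(design)$rank
residual_df <- n_samples - design_rank

n_samples
ncol(design)
design_rank
residual_df
```

With only two samples, any design that separates them leaves no residual degrees of freedom to estimate variability from. Patient and Tissue are also completely confounded here, so the design has more columns than its rank.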

ADD REPLY • link 5.8 years ago Michael Love 41k

score 1 · Answer 1 · 2018-07-10

Gordon Smyth 50k

@gordon-smyth

WEHI, Melbourne, Australia

See Section 2.11 of the edgeR User's Guide "What to do if you have no replicates".

You asked the same question a few months ago and got the same answer: Differential analysis between single sample vs single sample (control vs treatment) with no replicates
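
As a rough sketch of that approach, assuming simulated counts rather than the poster's data, you can pick a plausible BCV and check how sensitive the number of DE genes is to that choice. The User's Guide suggests typical BCV values of about 0.4 for human data, 0.1 for genetically identical model organisms and 0.01 for technical replicates. The counts matrix, the bcvs vector and n_de below are hypothetical names:

```r
library(edgeR)

set.seed(1)
# Hypothetical data: 200 genes, two unreplicated samples
counts <- matrix(rnbinom(400, mu = 100, size = 10), ncol = 2,
                 dimnames = list(paste0("gene", 1:200), c("A", "B")))
y <- DGEList(counts = counts, group = 1:2)
y <- calcNormFactors(y)

# Try several assumed BCV values; the dispersion is the BCV squared
bcvs <- c(0.01, 0.1, 0.4)
n_de <- sapply(bcvs, function(bcv) {
  et <- exactTest(y, dispersion = bcv^2)
  sum(topTags(et, n = Inf)$table$FDR <= 0.05)
})
names(n_de) <- paste0("BCV=", bcvs)
n_de
```

The p-values depend entirely on the assumed BCV, so results like these are best treated as exploratory rather than as a substitute for replication.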

ADD COMMENT • link 5.8 years ago Gordon Smyth 50k

Hi Gordon,

Thank you. I followed the tutorial and did the analysis.

I have raw count data for 72 genes in two cell lines in a data frame "df". It has three columns: the first column holds the gene names and the other two hold the counts for each cell line.

df <- data.frame(df[,-1], row.names=df[,1])

library(edgeR)
# df now holds only the two count columns; the gene names are its row names
y <- DGEList(counts=df, genes=rownames(df), group = 1:2)
y <- calcNormFactors(y,method = "TMM")

y$samples

                group lib.size norm.factors
AU565_BREAST        1     5101    0.8226359
MDAMB468_BREAST     2     6144    1.2156047

bcv <- 0.1

et <- exactTest(y, dispersion=bcv^2)
topTags(et,n=100)
tab <- topTags(et,n=Inf)
summary(decideTestsDGE(et))

       1+2
Down     8
NotSig  54
Up      10

keep <- tab$table$FDR <= 0.05
tab$table[keep,]

The summary column is headed "1+2". How can I tell in which cell line the 10 "Up" genes are upregulated?

ADD REPLY • link 5.8 years ago Biologist ▴ 110

The column heading is supposed to be "2-1" meaning group2 vs group1. So the 10 DE genes are up in the MDAMB468 cell line.

You can find out what exactTest() does by reading the help page ?exactTest. By default it compares the 2nd group to the 1st.
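
To check the direction yourself, here is a small sketch on made-up counts (the gene names and the counts matrix are hypothetical, and the BCV of 0.1 is only an assumption carried over from the code above). The pair argument of exactTest() sets which groups are compared, and a positive logFC means higher expression in the second group of the pair:

```r
library(edgeR)

# Hypothetical counts for four genes in two unreplicated samples
counts <- matrix(c( 10, 200,
                   200,  10,
                   100, 100,
                    50,  50),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(c("upIn2", "downIn2", "flat1", "flat2"),
                                 c("line1", "line2")))
y <- DGEList(counts = counts, group = 1:2)

bcv <- 0.1  # assumed value, not estimated from the data

# Default comparison: group 2 versus group 1
et <- exactTest(y, pair = c(1, 2), dispersion = bcv^2)
et$comparison
et$table["upIn2", "logFC"]      # positive: higher in group 2

# Reversing the pair flips the sign of the log fold change
et_rev <- exactTest(y, pair = c(2, 1), dispersion = bcv^2)
et_rev$table["upIn2", "logFC"]  # negative
```

Printing et$comparison, or the header line of topTags(et), is a quick way to confirm which group is the baseline before reading "Up" and "Down".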