Hi all,
I have RNA-seq data from twins of which one of the pair is sick (n=5) and the other one is healthy (n=5). In addition, I have six healthy controls that are not related to the twins.
I wish to look for differential expression (a) between sick twins compared to their healthy twin pairs, and (b) between sick twins and healthy controls including the healthy twins and unrelated healthy controls.
For part (a) I take that blocking might be most suitable as most twin pairs cluster tightly in an MDS plot. The code I wrote based on limma's manual is the following, please correct if you point out something peculiar:
pairs <- factor(expDesign$Pair) #Levels "P1".."P5" or "unpaired"
disease <- factor(expDesign$Disease) #Levels "Sick" and "Healthy"
des <- model.matrix(~pairs+disease) #Pair info to block
y <- DGEList(counts=data)
keep<- rowSums(cpm(y)>1) >= 6
y <- y[keep, ]
y <- calcNormFactors(y, method="TMM")
v <- voom(y, des, plot=TRUE)
topTable(fit, coef=2, sort.by = "p")
For aim (b) I guess limma's duplicateCorrelation() could be appropriate. However I have doubts in my skills writing the code (below) and would appreciate any advice on it. Also all comments for what would be best practise to analyse this kind of data structure are warmly welcome!
dge <- DGEList(counts=data)
keep<- rowSums(cpm(dge)>1) >= 6
dge <- dge[keep, ]
dge$samples$lib.size <- colSums(dge$counts)
dge <- calcNormFactors(dge)
design <- model.matrix(~expDesign$Disease)
v <- voom(dge, design, plot=TRUE)
corfit <- duplicateCorrelation(v, design, block=expDesign$Pair)
corfit$consensus #Gives 0.2208816
fit <- lmFit(v, design, block=expDesign$Pair, correlation=corfit$consensus)
fit <- eBayes(fit)
topTable(fit, coef=2, sort.by = "p")
Thanks a lot!
Many thanks Aaron. Indeed there's a part of code missing between voom and topTable, copy-paste ...
For the first analysis I forgot to mention that actually the unpaired samples were removed prior analysis, sorry for that.
I'll try the iteration for the second part asap, thanks for the tip!
Kalle