Please keep the discussion on the Bioconductor mailing list.
On Thu, 24 Oct 2013, Sindre Lee wrote:
> On 2013-10-24 00:40, Gordon K Smyth wrote:
>> Dear Sindre,
>>
>> I think that you want to find genes for which the change from time1
>> to time2 is different in the diabetic patients vs the healthy
>> patients.
>>
>> In your experiment, you have two types of factors. The time points
>> are within patients whereas the desease status is between patients.
>> See section 3.5 of the edgeR User's Guide.
>>
>> Setting up the design matrix for such an experiment is a little
>> tricky, not because of the edgeR but because of the limitations of
>> model.matrix() in R.
>>
>> I suggest you do it like this. Set up a variable for timepoint 2
for
>> healthy patients only:
>>
>> time2.healthy <- as.numeric(status=="Healthy" & timepoints==2)
>>
>> Same for diabetics:
>>
>> time2.diabetic <- as.numeric(status=="Diabetic" & timepoints==2)
>>
>> Then
>>
>> design <- model.matrix(~patients+time2.healthy+time2.diabetic)
>>
>> The patients term will pick up the baseline expression level for
each
>> patient at time1. time2.healthy will pick up the time2 (treatment)
>> effect for healthy patients and time2.diabetic will do the same for
>> diabetics.
>>
>> Finally, you will want to test the time2.diabetic-time2.healthy
contrast.
>>
>> Best wishes
>> Gordon
>>
>>> Date: Tue, 22 Oct 2013 20:03:32 +0200
>>> From: Sindre Lee <sindre.lee at="" studmed.uio.no="">
>>> To: <bioconductor at="" r-project.org="">
>>> Subject: [BioC] edgeR with two different groups (unpaired) at two
time
>>> points (paired): Need some input
>>>
>>> Hi!
>>> The edgeR manual is quite nice, but I'm not quite sure if I'm on
the
>>> right track..
>>>
>>> The question to answer is: "Is there any significantly changed
genes in
>>> diabetics at time point 2, compared to healthy?"
>>>
>>> Heres my codes:
>>>
>>> #Making the design matrix
>>>
>>> status <- factor(c(rep("Healthy",26), rep("Diabetic",22)),
>>> levels=c("Healthy","Diabetic"))
>>> patients <-
>>> factor(c(88,87,86,83,82,81,75,72,71,70,13,08,01,88,87,86,83,82,81,
75,72,71,70,13,08,01,79,77,76,73,67,62,61,55,53,21,04,79,77,76,73,67,6
2,61,55,53,21,04))
>>> timepoints =
>>> as.factor(c(1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2,2,1,
1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2))
>>> design <- model.matrix(~patients + timepoints*status)
>>>> colnames(design)
>>> [1] "(Intercept)" "patients4"
>>> [3] "patients8" "patients13"
>>> [5] "patients21" "patients53"
>>> [7] "patients55" "patients61"
>>> [9] "patients62" "patients67"
>>> [11] "patients70" "patients71"
>>> [13] "patients72" "patients73"
>>> [15] "patients75" "patients76"
>>> [17] "patients77" "patients79"
>>> [19] "patients81" "patients82"
>>> [21] "patients83" "patients86"
>>> [23] "patients87" "patients88"
>>> [25] "timepoints2" "statusDiabetic"
>>> [27] "timepoints2:statusDiabetic"
>>>
>>> #Reading in HTSeq data
>>>
>>> description <- c("test");
>>> fileNames <-
>>>
list.files(path="/Volumes/timemachine/HTseq_DEseq2",pattern="*.txt");
>>> files <- sort(fileNames,decreasing=TRUE);
>>> targets <- data.frame(files=files, group=status,
>>> description=description);
>>> library(edgeR);
>>> d <-
>>> readDGE(targets,path="/Volumes/timemachine/HTseq_DEseq2",skip=5,co
mment.char="!");
>>> colnames(d)
>>> <-(c("N288","N287","N286","N283","N282","N281","N275","N272","N271
","N270","N213","N208","N201","N188","187","N186","N183","N182","N181"
,"N175","N172","N171","N170","N113","N108","N101","D279","D277","D276"
,"D273","D267","D262","D261","D255","D253","D221","D204","D179","D177"
,"D176","D173","D167","D162","D161","D155","D153","D121","D104"));
>>> d$samples;
>>> dim(d)
>>>
>>> #Normalization
>>>
>>> d <- (calcNormFactors(d,method="RLE"));
>>>
>>> #Estimating the dispersion:
>>>
>>> d <- estimateCommonDisp(d,verbose=TRUE)
>>> Disp = 0.05823 , BCV = 0.2413
>>> d <- estimateTagwiseDisp(d,trend="none")
>>>
>>> #diffeksp
>>>
>>>> fit <- glmFit(d,design)
>>> Error in glmFit.default(y = y$counts, design = design, dispersion
=
>>> dispersion, :
>>> Design matrix not of full rank. The following coefficients not
>>> estimable:
>>> statusDiabetic
>>>
>>> Can someone tell me why I get this error?
>>>
>>> Thanks alot!
>>>
>>
>
> Will this do?
>
> lrt <- glmLRT(fit,
> contrast=c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,1))
>
> Coefficient: -1*time2.healthy 1*time2.diabetic
>
Yes, that is correct.
To find genes DE at time2 in healthy patients you can use
lrt <- glmLRT(fit, coef=25)
To find genes DE at time2 in diabetic patients you can use
lrt <- glmLRT(fit, coef=26)
To find genes whose response differs in diabetic patients vs healthy,
the
call that you give is correct. You could construct this by:
con <- makeContrasts(time2.diabetic-time2.healthy, levels=design)
lrt <- glmLRT(fit, contrast=con)
Or alternatively
con <- rep(0,26)
con[26] <- 1
con[25] <- -1
Best wishes
Gordon
> design <- model.matrix(~patients+time2.healthy+time2.diabetic)
> colnames(design)
[1] "(Intercept)" "patients4" "patients8" "patients13"
[5] "patients21" "patients53" "patients55" "patients61"
[9] "patients62" "patients67" "patients70" "patients71"
[13] "patients72" "patients73" "patients75" "patients76"
[17] "patients77" "patients79" "patients81" "patients82"
[21] "patients83" "patients86" "patients87" "patients88"
[25] "time2.healthy" "time2.diabetic"
>
______________________________________________________________________
The information in this email is confidential and
intend...{{dropped:4}}