Question about extracting co-efficients for pairwise comparisons in edgeR
1
0
Entering edit mode
Sabiha ▴ 20
@7f93ecd8
Last seen 7 months ago
United States

Hi,

I am analyzing RNA-Seq dataset using EdgeR package, I have question while extracting co-efficient for pairwise comparisons to detect genes that are differentially expressed between group after performing dispersion estimation, fitting the model (steps below). Just checking if I am I doing it rightly? Alternatively, is there another way compare the contrasts?


#>    Samples Patient Condition_Timepoint
#> 1       S1      P1              mo_6hr
#> 2       S2      P2              mo_6hr
#> 3       S3      P3              mo_6hr
#> 4       S4      P4              mo_6hr
#> 5       S5      P5              mo_6hr
#> 6       S6      P6              mo_6hr
#> 7       S7      P1             Inf_6hr
#> 8       S8      P2             Inf_6hr
#> 9       S9      P3             Inf_6hr
#> 10     S10      P4             Inf_6hr
#> 11     S11      P5             Inf_6hr
#> 12     S12      P6             Inf_6hr
#> 13     S13      P1             mo_24hr
#> 14     S14      P2             mo_24hr
#> 15     S15      P3             mo_24hr
#> 16     S16      P4             mo_24hr
#> 17     S17      P5             mo_24hr
#> 18     S18      P6             mo_24hr
#> 19     S19      P1            Inf_24hr
#> 20     S20      P2            Inf_24hr
#> 21     S21      P3            Inf_24hr
#> 22     S22      P4            Inf_24hr
#> 23     S23      P5            Inf_24hr
#> 24     S24      P6            Inf_24hr
#> 25     S25      P1             mo_48hr
#> 26     S26      P2             mo_48hr
#> 27     S27      P3             mo_48hr
#> 28     S28      P4             mo_48hr
#> 29     S29      P5             mo_48hr
#> 30     S30      P6             mo_48hr
#> 31     S31      P1            Inf_48hr
#> 32     S32      P2            Inf_48hr
#> 33     S33      P3            Inf_48hr
#> 34     S34      P4            Inf_48hr
#> 35     S35      P5            Inf_48hr
#> 36     S36      P6            Inf_48hr

Create model matrix

Patient_ID_v1 <- factor(sample_metadata$Patient)
Condition.Treatment <- factor(sample_metadata$Condition_Timepoint, levels=c("mo_6hr",   "Inf_6hr",  "mo_24hr",  "Inf_24hr", "mo_48hr",  "Inf_48hr"))
design.Condition.Treatment <- model.matrix(~Patient_ID_v1+Condition.Treatment)

Dispersion estimation

y.Treatment.v1 <- estimateDisp(y.Treatment,design.Condition.Treatment, robust=TRUE)

Fit the model

fit.Treatment <- glmQLFit(y.Treatment.v1,design.Condition.Treatment, robust = TRUE)
colnames(fit.Treatment$coefficients)
 [1] "(Intercept)"                 "Patient_ID_v1P2"            
 [3] "Patient_ID_v1P3"             "Patient_ID_v1P4"            
 [5] "Patient_ID_v1P5"             "Patient_ID_v1P6"            
 [7] "Condition.TreatmentInf_6hr"  "Condition.Treatmentmo_24hr" 
 [9] "Condition.TreatmentInf_24hr" "Condition.Treatmentmo_48hr" 
[11] "Condition.TreatmentInf_48hr"

To detect genes that are differentially expressed in Inf6_vs_mo6

qlf.Inf6_vs_mo6 <- glmQLFTest(fit.Treatment, contrast=c(0,0,0,0,0,0,1,0,0,0,0))
topTags(qlf.Inf6_vs_mo6, n=10, adjust.method = "BH", sort.by = "PValue")

To detect genes that are differentially expressed in Inf24_vs_mo24

qlf.Inf24_vs_mo24 <- glmQLFTest(fit.Treatment, contrast=c(0,0,0,0,0,0,0,-1,1,0,0))
topTags(qlf.Inf24_vs_mo24, n=10, adjust.method = "BH", sort.by = "PValue")

To detect genes that are differentially expressed in Inf48_vs_mo48

qlf.Inf48_vs_mo48 <- glmQLFTest(fit.Treatment, contrast=c(0,0,0,0,0,0,0,0,0,-1,1))
topTags(qlf.Inf48_vs_mo48, n=10, adjust.method = "BH", sort.by = "PValue")

In addition to the above to detect genes that are differentially expressed in Inf6_vs_mo6, can I also make like below?

qlf.Inf6_vs_mo6 <- glmQLFTest(fit.Treatment, coef="Condition.TreatmentInf_6hr")
topTags(qlf.Inf6_vs_mo6, n=10, adjust.method = "BH", sort.by = "PValue")
qlf.Inf6_vs_mo6 <- glmQLFTest(fit.Treatment, coef=7)
topTags(qlf.Inf6_vs_mo6, n=10, adjust.method = "BH", sort.by = "PValue")

Best Regards,

Sabiha

limma fit design RNASeq edgeR • 486 views
ADD COMMENT
2
Entering edit mode
@gordon-smyth
Last seen 1 hour ago
WEHI, Melbourne, Australia

It all looks correct. Yes, the three ways you have used to specify the Inf6 vs mo6 comparison are all equivalent.

ADD COMMENT
0
Entering edit mode

Gordon Smyth thank you very much. "Condition.Treatmentmo_6hr" is missing in the number of coefficients, so I was bit concerned if I end up picking wrong one because pairwise coefficients were available for Inf24_vs_mo24 and Inf48_vs_mo48 comparisons but missing for Inf6_vs_mo6.

Another question about pairwise comparisons, simply by exporting logcpm values from edgeR, and using it in limma is also a feasible and flexible approach. In the limma-trend approach, the counts are converted to logCPM values using edgeR's cpm function: Does the below makes sense?

As per limma manual: In the limma approach to RNA-seq, read counts are converted to log2-counts-per-million [logCPM] and the mean-variance relationship is modelled either with precision weights or with an empirical Bayes prior trend. The precision weights approach is called "voom" and the prior trend approach is called "limma-trend".

y.Treatment <- calcNormFactors(y.Treatment, method = "TMM")
cpm_with_log2 = cpm(y.Treatment, prior.count=1, log=TRUE)


library(limma)
Condition.Treatment <- factor(sample_metadata$Condition_Timepoint, levels=c("mo_6hr",   "Inf_6hr",  "mo_24hr",  "Inf_24hr", "mo_48hr",  "Inf_48hr"))
design_Condition_Timepoint <- model.matrix(~0+Condition_Timepoint)

## Then we estimate the correlation between measurements made on the same subject:
corfit <- duplicateCorrelation(cpm_with_log2, design_Condition_Timepoint, block= sample_metadata$Patient) 

## Then this inter-subject correlation is input into the linear model fit:
fit_Condition_Timepoint <- lmFit(cpm_with_log2, design_Condition_Timepoint, block=sample_metadata$Donor, correlation=corfit$consensus)  

## Now we can make any comparisons between the experimental conditions in the usual way:

Cont.matrix_Condition_Timepoint <- makeContrasts(
  "Inf_6hr_vs_mo_6hr" = Inf_6hr - mo_6hr,
  "Inf_24hr_vs_mo_24hr" = Inf_24hr - mo_24hr,
  "Inf_48hr_vs_mo_48hr" = Inf_48hr - mo_48hr,
  levels=design_Condition_Timepoint)

## Then compute these contrasts and moderated t-tests:
fit_2_Condition_Timepoint <- contrasts.fit(fit_Condition_Timepoint, Cont.matrix_Condition_Timepoint)


## Differential expression: limma-trend for RNA-Seq data
fit_2_Condition_Timepoint <- eBayes(fit_2_Condition_Timepoint, trend = TRUE)     

## Extract a table of the top-ranked genes from a linear model fit.
tt_Condition_Timepoint <- topTable(fit_2_Condition_Timepoint, number=Inf, adjust="BH") 

# Multiple Testing Across Genes and Contrasts
decideTests(fit_2_Condition_Timepoint, method = "separate", adjust.method = "BH")
ADD REPLY

Login before adding your answer.

Traffic: 538 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6