Question

Question about extracting co-efficients for pairwise comparisons in edgeR

0

Entering edit mode

Sabiha ▴ 20

@7f93ecd8

Last seen 22 months ago

United States

Hi,

I am analyzing RNA-Seq dataset using EdgeR package, I have question while extracting co-efficient for pairwise comparisons to detect genes that are differentially expressed between group after performing dispersion estimation, fitting the model (steps below). Just checking if I am I doing it rightly? Alternatively, is there another way compare the contrasts?


#>    Samples Patient Condition_Timepoint
#> 1       S1      P1              mo_6hr
#> 2       S2      P2              mo_6hr
#> 3       S3      P3              mo_6hr
#> 4       S4      P4              mo_6hr
#> 5       S5      P5              mo_6hr
#> 6       S6      P6              mo_6hr
#> 7       S7      P1             Inf_6hr
#> 8       S8      P2             Inf_6hr
#> 9       S9      P3             Inf_6hr
#> 10     S10      P4             Inf_6hr
#> 11     S11      P5             Inf_6hr
#> 12     S12      P6             Inf_6hr
#> 13     S13      P1             mo_24hr
#> 14     S14      P2             mo_24hr
#> 15     S15      P3             mo_24hr
#> 16     S16      P4             mo_24hr
#> 17     S17      P5             mo_24hr
#> 18     S18      P6             mo_24hr
#> 19     S19      P1            Inf_24hr
#> 20     S20      P2            Inf_24hr
#> 21     S21      P3            Inf_24hr
#> 22     S22      P4            Inf_24hr
#> 23     S23      P5            Inf_24hr
#> 24     S24      P6            Inf_24hr
#> 25     S25      P1             mo_48hr
#> 26     S26      P2             mo_48hr
#> 27     S27      P3             mo_48hr
#> 28     S28      P4             mo_48hr
#> 29     S29      P5             mo_48hr
#> 30     S30      P6             mo_48hr
#> 31     S31      P1            Inf_48hr
#> 32     S32      P2            Inf_48hr
#> 33     S33      P3            Inf_48hr
#> 34     S34      P4            Inf_48hr
#> 35     S35      P5            Inf_48hr
#> 36     S36      P6            Inf_48hr

Create model matrix

Patient_ID_v1 <- factor(sample_metadata$Patient)
Condition.Treatment <- factor(sample_metadata$Condition_Timepoint, levels=c("mo_6hr",   "Inf_6hr",  "mo_24hr",  "Inf_24hr", "mo_48hr",  "Inf_48hr"))
design.Condition.Treatment <- model.matrix(~Patient_ID_v1+Condition.Treatment)

Dispersion estimation

y.Treatment.v1 <- estimateDisp(y.Treatment,design.Condition.Treatment, robust=TRUE)

Fit the model

fit.Treatment <- glmQLFit(y.Treatment.v1,design.Condition.Treatment, robust = TRUE)
colnames(fit.Treatment$coefficients)
 [1] "(Intercept)"                 "Patient_ID_v1P2"            
 [3] "Patient_ID_v1P3"             "Patient_ID_v1P4"            
 [5] "Patient_ID_v1P5"             "Patient_ID_v1P6"            
 [7] "Condition.TreatmentInf_6hr"  "Condition.Treatmentmo_24hr" 
 [9] "Condition.TreatmentInf_24hr" "Condition.Treatmentmo_48hr" 
[11] "Condition.TreatmentInf_48hr"

To detect genes that are differentially expressed in Inf6_vs_mo6

qlf.Inf6_vs_mo6 <- glmQLFTest(fit.Treatment, contrast=c(0,0,0,0,0,0,1,0,0,0,0))
topTags(qlf.Inf6_vs_mo6, n=10, adjust.method = "BH", sort.by = "PValue")

To detect genes that are differentially expressed in Inf24_vs_mo24

qlf.Inf24_vs_mo24 <- glmQLFTest(fit.Treatment, contrast=c(0,0,0,0,0,0,0,-1,1,0,0))
topTags(qlf.Inf24_vs_mo24, n=10, adjust.method = "BH", sort.by = "PValue")

To detect genes that are differentially expressed in Inf48_vs_mo48

qlf.Inf48_vs_mo48 <- glmQLFTest(fit.Treatment, contrast=c(0,0,0,0,0,0,0,0,0,-1,1))
topTags(qlf.Inf48_vs_mo48, n=10, adjust.method = "BH", sort.by = "PValue")

In addition to the above to detect genes that are differentially expressed in Inf6_vs_mo6, can I also make like below?

qlf.Inf6_vs_mo6 <- glmQLFTest(fit.Treatment, coef="Condition.TreatmentInf_6hr")
topTags(qlf.Inf6_vs_mo6, n=10, adjust.method = "BH", sort.by = "PValue")

qlf.Inf6_vs_mo6 <- glmQLFTest(fit.Treatment, coef=7)
topTags(qlf.Inf6_vs_mo6, n=10, adjust.method = "BH", sort.by = "PValue")

Best Regards,

Sabiha

limma fit design RNASeq edgeR • 918 views

ADD COMMENT • link 22 months ago Sabiha ▴ 20

score 2 · Accepted Answer · 2024-03-14

2

Entering edit mode

Gordon Smyth 53k

@gordon-smyth

Last seen 2 hours ago

WEHI, Melbourne, Australia

It all looks correct. Yes, the three ways you have used to specify the Inf6 vs mo6 comparison are all equivalent.

ADD COMMENT • link 22 months ago Gordon Smyth 53k

0

Entering edit mode

Gordon Smyth thank you very much. "Condition.Treatmentmo_6hr" is missing in the number of coefficients, so I was bit concerned if I end up picking wrong one because pairwise coefficients were available for Inf24_vs_mo24 and Inf48_vs_mo48 comparisons but missing for Inf6_vs_mo6.

Another question about pairwise comparisons, simply by exporting logcpm values from edgeR, and using it in limma is also a feasible and flexible approach. In the limma-trend approach, the counts are converted to logCPM values using edgeR's cpm function: Does the below makes sense?

As per limma manual: In the limma approach to RNA-seq, read counts are converted to log2-counts-per-million [logCPM] and the mean-variance relationship is modelled either with precision weights or with an empirical Bayes prior trend. The precision weights approach is called "voom" and the prior trend approach is called "limma-trend".

y.Treatment <- calcNormFactors(y.Treatment, method = "TMM")
cpm_with_log2 = cpm(y.Treatment, prior.count=1, log=TRUE)


library(limma)
Condition.Treatment <- factor(sample_metadata$Condition_Timepoint, levels=c("mo_6hr",   "Inf_6hr",  "mo_24hr",  "Inf_24hr", "mo_48hr",  "Inf_48hr"))
design_Condition_Timepoint <- model.matrix(~0+Condition_Timepoint)

## Then we estimate the correlation between measurements made on the same subject:
corfit <- duplicateCorrelation(cpm_with_log2, design_Condition_Timepoint, block= sample_metadata$Patient) 

## Then this inter-subject correlation is input into the linear model fit:
fit_Condition_Timepoint <- lmFit(cpm_with_log2, design_Condition_Timepoint, block=sample_metadata$Donor, correlation=corfit$consensus)  

## Now we can make any comparisons between the experimental conditions in the usual way:

Cont.matrix_Condition_Timepoint <- makeContrasts(
  "Inf_6hr_vs_mo_6hr" = Inf_6hr - mo_6hr,
  "Inf_24hr_vs_mo_24hr" = Inf_24hr - mo_24hr,
  "Inf_48hr_vs_mo_48hr" = Inf_48hr - mo_48hr,
  levels=design_Condition_Timepoint)

## Then compute these contrasts and moderated t-tests:
fit_2_Condition_Timepoint <- contrasts.fit(fit_Condition_Timepoint, Cont.matrix_Condition_Timepoint)


## Differential expression: limma-trend for RNA-Seq data
fit_2_Condition_Timepoint <- eBayes(fit_2_Condition_Timepoint, trend = TRUE)     

## Extract a table of the top-ranked genes from a linear model fit.
tt_Condition_Timepoint <- topTable(fit_2_Condition_Timepoint, number=Inf, adjust="BH") 

# Multiple Testing Across Genes and Contrasts
decideTests(fit_2_Condition_Timepoint, method = "separate", adjust.method = "BH")

ADD REPLY • link 22 months ago Sabiha ▴ 20