Question about RNAseq analysis in EdgeR to identify common and donor specific differentially expressed genes

mohammedtoufiq91 ▴ 10

@mohammedtoufiq91-17679

Last seen 10 weeks ago

United States

Hi,

This question is about RNA-seq analysis in edgeR to identify common and donor-specific differentially expressed genes (DEGs). I have in-vitro cultured tissue from 3 different donors, each with two infection states: infected with virus (high dose, 6hr) and uninfected (baseline, 0hr). The samples were sequenced by RNA-seq, then aligned and quantified, and I now have a gene counts file ready to import into the edgeR analysis pipeline. Using this data, I would like to know which DEGs in response to virus are common to all donors and which are specific to each donor.

1. Identifying common DEGs in response to virus between infected vs. un-infected samples from 3 different donors?
2. Following this, how do I perform a donor-specific analysis to pull out the specific response to virus per donor?

Sample info

#>   Samples Donor Time     Status
#> 1      S1    D1  0hr Uninfected
#> 2      S2    D1  6hr   Infected
#> 3      S3    D2  0hr Uninfected
#> 4      S4    D2  6hr   Infected
#> 5      S5    D3  0hr Uninfected
#> 6      S6    D3  6hr   Infected

1. Identifying common DEGs in response to virus between infected vs. un-infected samples from 3 different donors?


library(edgeR)
group.Status <-  factor(Sample_info$Status)
y <- DGEList(counts = gene_counts, group = group.Status, remove.zeros = TRUE)
keep <- filterByExpr(y)
y <- y[keep, , keep.lib.sizes=FALSE]
y <- calcNormFactors(y, method = "TMM")
cpm_with_log2 = cpm(y, prior.count=2, log=TRUE)


## Create design matrix

Donor_ID <- factor(Sample_info$Donor)
Status <- factor(Sample_info$Status, levels=c("Uninfected", "Infected"))
design <- model.matrix(~Donor_ID+Status)


## Dispersion estimation
y <- estimateDisp(y,design, robust=TRUE) 


# Fit the model
fit <- glmQLFit(y,design, robust = TRUE)


## To detect genes that are differentially expressed in Infected vs Uninfected:
qlf.Infected_vs_Uninfected <- glmQLFTest(fit, coef=4)
topTags(qlf.Infected_vs_Uninfected, n=10, adjust.method = "BH", sort.by = "PValue", p.value = 1)
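
The `coef=4` above depends on the column order of the design matrix, so it is worth checking. Here is a minimal base-R sketch that rebuilds the sample table from this post (no count data needed) and prints the column names. The `Sample_info` data frame is reconstructed by hand from the table above and is an assumption about how the real object looks.

```r
# Sample table rebuilt by hand from the post (assumed layout of Sample_info)
Sample_info <- data.frame(
  Samples = paste0("S", 1:6),
  Donor   = rep(c("D1", "D2", "D3"), each = 2),
  Time    = rep(c("0hr", "6hr"), times = 3),
  Status  = rep(c("Uninfected", "Infected"), times = 3)
)

Donor_ID <- factor(Sample_info$Donor)
Status   <- factor(Sample_info$Status, levels = c("Uninfected", "Infected"))
design   <- model.matrix(~ Donor_ID + Status)

# Columns of the additive (paired) design
colnames(design)
# "(Intercept)" "Donor_IDD2" "Donor_IDD3" "StatusInfected"

# Position of the infection coefficient, i.e. what coef= should point at
coef_index <- which(colnames(design) == "StatusInfected")

# Residual degrees of freedom available for estimating the dispersion
resid_df <- nrow(design) - qr(design)$rank
```

With this layout the infection effect is column 4 and there are 2 residual degrees of freedom. Passing the name instead, `glmQLFTest(fit, coef="StatusInfected")`, avoids depending on the column order.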

2. Following this, how do I perform a donor-specific analysis to pull out the specific response to virus per donor?

To perform donor specific analysis, is there a way to extract DEGs specific to donor from the above glmQLFTest?

OR, probably, just change the design formula to: design <- model.matrix(~0 + Donor_ID + Status + Donor_ID:Status)

Best Regards,

Toufiq

R edgeR RNASeq • 2.0k views

ADD COMMENT • link 24 months ago • updated 23 months ago mohammedtoufiq91 ▴ 10

swbarnes2 ★ 1.4k

@swbarnes2-14086

Last seen 6 hours ago

San Diego

You have 6 samples total?

You don't have enough power to do anything complex. Compare infected to uninfected. That's it.

ADD COMMENT • link 24 months ago swbarnes2 ★ 1.4k

swbarnes2, thank you for the response. Yes, I have 6 samples in total. In addition to the Infected vs. Uninfected comparison, I also wanted to look at each donor's specific response. I see a method in Section 2.12 of the edgeR User's Guide, "What to do if you have no replicates". Does it make sense to use it here?

ADD REPLY • link 24 months ago mohammedtoufiq91 ▴ 10

I understand what you want, but you have the bare minimum number of samples to do the simple comparison. I don't think you can do more than that.

ADD REPLY • link 23 months ago swbarnes2 ★ 1.4k

score 2 · Accepted Answer · 2023-07-23

Trying to determine donor-specific DE genes is a non-standard thing to do in terms of statistical testing, but you can use a method similar to the one we used for the oral squamous cell carcinoma data in our paper McCarthy et al (2012).

Starting with the code you already have, you fit a new model with donor-specific effects:

design.donorspecific <- model.matrix(~Donor_ID + Donor_ID:Status)
fit.donorspecific <- glmFit(y, design.donorspecific, dispersion=y$trended.dispersion)

The donor-specific model has no residual df, but the code will reuse the trended dispersions you estimated previously from the non-donor-specific model to achieve an approximate test. Now you can test for DE genes for each donor individually. To get DE genes for donor D1:

lrt.D1 <- glmLRT(fit.donorspecific, coef="Donor_IDD1:StatusInfected")
topTags(lrt.D1)
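
The coefficient names and the lack of residual df can be checked in base R without any count data. This is a small sketch in which the two factors are rebuilt by hand from the sample table in the question, which is an assumption about the real objects. For comparison it also builds the formula proposed in the question.

```r
# Factors rebuilt by hand from the sample table in the question (assumed)
Donor_ID <- factor(rep(c("D1", "D2", "D3"), each = 2))
Status   <- factor(rep(c("Uninfected", "Infected"), times = 3),
                   levels = c("Uninfected", "Infected"))

# Donor-specific design: one infection effect nested within each donor
design.donorspecific <- model.matrix(~ Donor_ID + Donor_ID:Status)
colnames(design.donorspecific)
# "(Intercept)" "Donor_IDD2" "Donor_IDD3"
# "Donor_IDD1:StatusInfected" "Donor_IDD2:StatusInfected" "Donor_IDD3:StatusInfected"

# 6 samples and 6 independent columns leave no residual degrees of freedom
resid_df <- nrow(design.donorspecific) - qr(design.donorspecific)$rank

# Formula proposed in the question, for comparison
design.alt <- model.matrix(~ 0 + Donor_ID + Status + Donor_ID:Status)
colnames(design.alt)
# "Donor_IDD1" "Donor_IDD2" "Donor_IDD3" "StatusInfected"
# "Donor_IDD2:StatusInfected" "Donor_IDD3:StatusInfected"
```

Both designs fit the same saturated model, but the coefficients mean different things. In the nested design each `Donor_IDDx:StatusInfected` column is that donor's own infected vs uninfected log-fold-change. In the question's formula, `StatusInfected` is the response in D1 and the two interaction columns are the differences of D2 and D3 from D1.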

Note that you must use glmFit and glmLRT rather than glmQLFit and glmQLFTest for this approximate method to work.

Despite the small sample numbers, the donor-specific test should be conservative rather than liberal on average, because it uses the dispersion estimated from the donor by status interaction in place of a donor-specific repeated measures dispersion, which would almost certainly be smaller.

This is an application of Method 3 from Section 2.12 of the edgeR User's Guide.
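
To run the same test for all three donors and compare the calls, something along these lines should work. This is only a sketch: `gene_counts` here is simulated negative binomial data standing in for the real counts, the 5% FDR cutoff is arbitrary, and "DE in exactly one donor" is just one possible working definition of donor-specific, not something prescribed by edgeR.

```r
library(edgeR)

set.seed(1)
# Hypothetical stand-in data: 2000 genes x 6 samples, gene-wise means vary
n_genes <- 2000
gene_mu <- 2^runif(n_genes, min = 4, max = 10)
gene_counts <- matrix(
  rnbinom(n_genes * 6, mu = rep(gene_mu, times = 6), size = 10),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), paste0("S", 1:6))
)

Donor_ID <- factor(rep(c("D1", "D2", "D3"), each = 2))
Status   <- factor(rep(c("Uninfected", "Infected"), times = 3),
                   levels = c("Uninfected", "Infected"))

y <- DGEList(counts = gene_counts, group = Status)
y <- calcNormFactors(y, method = "TMM")

# Dispersions are estimated from the additive model, which has 2 residual df
design <- model.matrix(~ Donor_ID + Status)
y <- estimateDisp(y, design, robust = TRUE)

# Saturated donor-specific model, reusing the trended dispersions
design.donorspecific <- model.matrix(~ Donor_ID + Donor_ID:Status)
fit.donorspecific <- glmFit(y, design.donorspecific,
                            dispersion = y$trended.dispersion)

# One likelihood ratio test per donor; calls are -1 (down), 0, or 1 (up)
coefs <- paste0("Donor_ID", levels(Donor_ID), ":StatusInfected")
calls <- sapply(coefs, function(cf) {
  lrt <- glmLRT(fit.donorspecific, coef = cf)
  as.vector(decideTests(lrt, adjust.method = "BH", p.value = 0.05))
})
rownames(calls) <- rownames(y)
colnames(calls) <- levels(Donor_ID)

# Assumed working definition: called DE in exactly one donor
donor_specific <- rowSums(calls != 0) == 1
table(donor_specific)
```

With real data you would replace the simulated `gene_counts` with your own matrix and keep the `filterByExpr` step from the question. Keep in mind that a gene missing the cutoff in two donors is not evidence that those donors do not respond, so the "exactly one donor" list is best treated as a set of candidates.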