Help creating a paired sample design matrix in edgeR
Entering edit mode
sfidemw • 0
Last seen 2.3 years ago

We are performing RNA-seq analysis in R using Rsubread and edgeR. We would like to be able to create a design matrix that allows our samples to be paired together.

We have been following the edgeR quasilikelihood pipeline (Link: Chen, Lu and Smith, 2020. We modified this pipeline to use glmLRT as opposed to the glmQLRT and then attempted to include multiple parameters in our design matrix, both our treatment condition as well as the pair identifier.

Our experimental design included one phytoplankton strain that was diluted to extinction for single cell isolation (Luria-Delbruck protocol) and from that, each of six clonal isolates were used to seed an individual paired treatment sample (400 ppm/800 ppm CO2).

In doing our analysis, we want to be able to compare between present (400ppm) and future(800ppm) conditions. However, we also want to be able to account for the within clonal variability that we may see within each of the paired treatment samples. Our y object has a group column (see below) where we labeled each sample with its treatment and pair ID. Our pairs should be Present.45--Future.45, etc. We understand that the difference in name may be causing an issue but are not sure how to troubleshoot this.

Additionally, our code is working but the design matrix is not structured in a way to allow us to make the desired comparisons. Looking at the output of our design matrix code, the only comparisons we can make are between the columns (TxFuture, TxPresent, Pair46, etc) We are unable to make a comparison between the two samples that are found in Pair 46, or Pair 47. We also are unsure of how to incorporate the within clonal variability in the overall Present-Future comparison.

As a final note, we are not posting any sessionInfo() information because we aren't technically getting any type of error message.


#load library

#call library--this is an organismal database that we curated ourselves

#modifying our DGEList object to include info from organismal database
y$genes$symbol <- mapIds(, rownames(y),
                         keytype = "GID", column = "GENENAME")
y$genes$product <- mapIds(, rownames(y),
                          keytype = "GID", column = "GENEPRODUCT")

#look at the samples included in our y object.

                                                                                         group lib.size norm.factors
SS0029.S66.L008.R1.001.fastq.gz.subread.BAM   Future.45  2723395    0.7895361
SS0030.S67.L008.R1.001.fastq.gz.subread.BAM   Future.50  1349356    0.9642628
SS0031.S68.L008.R1.001.fastq.gz.subread.BAM   Future.46  1931205    0.8817554
SS0032.S69.L008.R1b.001.fastq.gz.subread.BAM  Future.47  2371530    0.9668457
SS0033.S70.L008.R1.001.fastq.gz.subread.BAM   Future.48  1662338    1.0713176
SS0034.S71.L008.R1.001.fastq.gz.subread.BAM  Present.48  2485131    0.9618421
SS0036.S72.L008.R1.001.fastq.gz.subread.BAM  Present.49  2397884    1.1038003
SS0037.S73.L008.R1.001.fastq.gz.subread.BAM  Present.45  3102922    1.0451120
SS0038.S74.L008.R1.001.fastq.gz.subread.BAM  Present.50  1853192    1.1969441
SS0039.S75.L008.R1.001.fastq.gz.subread.BAM   Future.49   689415    1.0852899
SS0040.S20.L002.R1.001.fastq.gz.subread.BAM  Present.47  1007293    1.1487667
SS035.S19.L002.R1.001.fastq.gz.subread.BAM   Present.46   934291    0.868561

#now we are creating objects of both tx and paired sample ID
Tx <- factor(targets$Tx)
Pair <- factor(targets$Pair)

#create our design matrix 
design <- model.matrix(~0+Tx+Pair)
rownames(design) <- levels(group)

      TxFuture TxPresent Pair46 Pair47 Pair48 Pair49 Pair50
 [1,]        1         0      0      0      0      0      0
 [2,]        1         0      0      0      0      0      1
 [3,]        1         0      1      0      0      0      0
 [4,]        1         0      0      1      0      0      0
 [5,]        1         0      0      0      1      0      0
 [6,]        0         1      0      0      1      0      0
 [7,]        0         1      0      0      0      1      0
 [8,]        0         1      0      0      0      0      0
 [9,]        0         1      0      0      0      0      1
[10,]        1         0      0      0      0      1      0
[11,]        0         1      0      1      0      0      0
[12,]        0         1      1      0      0      0      0
[1] 1 1 2 2 2 2 2
[1] "contr.treatment"

[1] "contr.treatment"
edgeR pairedsamples designmatrix • 554 views
Entering edit mode
Yunshun Chen ▴ 790
Last seen 4 weeks ago

We are unable to make a comparison between the two samples that are found in Pair 46, or Pair 47.

You are using an additive model for your design matrix, which assumes the Present-Future differences are the same across all the pairs. You can't compare the two individual samples in each pair separately.

We also are unsure of how to incorporate the within clonal variability in the overall Present-Future comparison.

Do you mean comparing Present with Future in each Pair? To do that, you need to fit an interactive model, or something similar to the F1000Research paper on edgeR quasi-likelihood pipeline. But the problem is your data doesn't have enough replicate samples (it only has 1 Present and 1 Future sample in Pair 46, etc). You need more samples in order to do that.

Entering edit mode


The interactive model will most likely be the way we have to approach the issue. Our entire sample set includes 6 replicates grown at 400 ppm CO2 and 6 replicates grown at 800 ppm CO2. Each of those 6 replicates were started from a single clonal culture isolated from a parent culture. We were under the impression that you could account for within clone variability by comparing the 400 and 800 sample of each clone. A "pairwise" comparison so to speak.

Thank you for the feedback! We appreciate the help.


Login before adding your answer.

Traffic: 140 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6