DESeq extract result of lrt for a subset of the samples
1
0
Entering edit mode
md27 • 0
@md27-12921
Last seen 4.4 years ago

Hi,

I'm quite new to R and RNA-seq data analysis. I have been looking through some posts here but did not find an answer to what I was looking for, hope someone could help me here:)

I'm using DESeq2 to analyse RNA-seq data set with 96 samples: 16 different species, 2 types of tissue each (M, L), and 3 biological replicates of each tissue of each species. The gene expression level in one tissue type can be completely different from the other, and they don't have to correlate with each other.  I'm looking to find differentially expressed genes between species, in either of the tissue types.

I have the counts file from Kallisto and imported the data with tximport, then did an lrt test with all the samples together to generate PCA plot. An outline of the code is as follows:

txi.tsv <- tximport(files, type = "kallisto", tx2gene = tx2gene)

dds.int <- DESeqDataSetFromTximport(txi.tsv,
SampleTable,
design = ~ tissue + species + tissue:species)

dds.int <- DESeq(dds.int, test="LRT", reduced = ~tissue + species)

vsd.int <- varianceStabilizingTransformation(dds.int, blind=FALSE)

plotPCA(vsd.int, intgroup=c("species", "tissue"), returnData=TRUE)

One questions is: is this the right way to do it? As I don't expect any correlation of gene expression in one tissue type to the other, but want to see the difference among species in either of the tissue type. In the PCA plot I do see that the tissue cluster together before species, as expected. And I can generate PCA plot for only one type of tissue like:

vsd.m <- vsd.int[ , vsd.int\$tissue == "M"]
plotPCA(vsd.m)

If the analyse above is right, now the main question is: how can extract lrt result for all of the species of one type of tissue only? Before looking into the pairwise comparisons, I want to have a list of all the gene that is differentially expressed in any of the species, with an lrt test p-value. Can I simply extract result from the dds.int above, or do I need to separate the dataset and run each of the tissue type separately?

I hope I described my questions clearly enough and thank you very much for your help!

deseq2 rnaseq lrt • 1.6k views
0
Entering edit mode
@mikelove
Last seen 38 minutes ago
United States

The easiest design would be to combine tissue and species into a new factor variable called group. Then you could compare two species across the same tissue with:

results(dds, contrast=c("group", "speciesXtissueM", "speciesYtissueM"))

See the first code chunk here:

https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#interactions

0
Entering edit mode

Hi, Thank you!

I did see this group option and I thought it should do what I wanted. But my problem again is not to compare two species, but all 16 species at once, or at least a subset of them, say 11 species. What would be the contrast or name to call for this?

And Should I use this group option instead of

design = ~ tissue + species + tissue:species

for lrt from the beginning? Will the code something like

design = ~ tissue + species + group

reduced = ~ tissue + species?

Sorry I'm still quite confused, please bear with me...

0
Entering edit mode

Would you say your goal is to find genes in which there is differential expression across species (so at least one species different than the others)? And to have one test result table like this for one tissue, and another result table for the other tissue?

0
Entering edit mode

Yes I thought so as a first step, it's not the final goal but I thought this should be easily done (maybe I'm wrong about it).

Maybe I could do it the other way round: could I just do the pairwise comparison as you suggested, and get a list of genes differentially expressed between each combination of two species, and the collection of all the genes at least in one of the pairwise comparison would be similar to using lrt for all species together? I imagine these will not be exactly the same due to the manually set cut-off pval, but should be similar?

0
Entering edit mode

You can construct model matrices to make these into likelihood ratio tests. If you do:

mm <- model.matrix(~ tissue + tissue:species, colData(dds))

You will have a model matrix with differences across tissue, and then also, for each tissue, potential species-specific differences. This matrix will be passed to the 'full' argument of DESeq(), as it specifies the full model.

Say for tissue M, if you remove all the columns in mm which are named "tissueM.speciesX" for all the different species X, and then you can use that model matrix for the 'reduced' argument to DESeq(), and specify test="LRT". This will be a test for any species differences in tissue M. The null hypothesis for this test is that any species differences, if they exist, are in tissue L.

0
Entering edit mode

Thank you so much! I didn't know anything like this before. Will try it out!

Thank you for all your contributions to the community Michael!