Comparing gene expression across three tissue types to identify any difference.
1
0
Entering edit mode
@7aa55f9e
Last seen 7 weeks ago
United Kingdom

Good afternoon,

I have been struggling with this for days now. I'm afraid I don't have any failing/working code to post. I apologise if the formatting or style of this post is a little off - it is my first time posting here.

Background: I need to compare genes across 3 tissue types to identify whether there is any difference in expression.

I have created my DSEQ object in R from :

  1. A gene:subject matrix - with each subject providing expression data for each gene across each of three different tissues - lets say, blood, gut, kidney. Prescreened to remove genes with little or no expression

                           Subj1.B Blood ¦ subj1.G Gut ¦ Subj1.K Kidney ¦ Subj2.B Blood ¦ Subj2.G Gut ¦ Subj2.K Kidney 
    
              Gene1          16.3              23.5           etc.                  etc.               etc.                etc.  
              Gene2          12.9              6.89           etc.                  etc.               etc.                etc.  
              Gene3          1.8               etc.           etc.                  etc.               etc.                etc.  
    

and:

  1. A biodata dataframe with each subject represented 3 times for each tissue type

                        Age              Sex               Tissue_Type              
            Subj1.B    x                  z                      Blood
            Subj1.G    x                  z                      Gut 
            Subj1.K    x                  z                      Kidney 
            Subj2.B    x                  z                      Blood
            Subj2.G    x                  z                      Gut 
            Subj2.K    x                  z                      Kidney 
            Subj3.B    x                  z                      Blood
            Subj3.G    x                  z                      Gut 
            Subj3.K    x                  z                      Kidney 
    

I created the DSEQ object and carried out the DSEQ analysis.

All works as it should, so far. I have carried out the VST transformation and plotPCA.

So, to my current issue:

I need to identify which genes have the same or statistically similar expression levels across all 3 tissue types, and then exclude them from further study. I have used

    contrast <- c("Tissue-Type", "Blood", "Gut")

but that only compares two out of the three, and even with two I'm struggling to see the metric I should use to establish no significant difference in expression.

I wonder if I should even be moving back to a single tissue dataframes and doing some plain comparative inferential statistics here - but my instincts tell me I'd missing out on something if I went down that route.

I'd really appreciate it someone could shine a light on away forward here for me.

Thanks,

JM

DESeq2 • 376 views
ADD COMMENT
1
Entering edit mode
@james-w-macdonald-5106
Last seen 16 hours ago
United States

You can do the inverse of what you are saying. Removing the genes that don't change is the inverse of keeping the genes that appear to change in at least one tissue. The caveat being that you might not have the power to detect small changes, but let's set that aside.

If you do an LRT test comparing a model that includes tissue and a model that does not include tissue, that will give you the genes for which you have evidence that the expression is different in any tissue (because the full model fits the data better than the reduced model). You can then carry the significant genes forward for 'further study'.

0
Entering edit mode

Thanks, JW.

I had read about LRT.

My original DESeqDataSet had Age, Sex and Tissue Type, so (bear with me) would I need to re-do the DSeq along these lines - just for Tissue-Type? :

 NEW_GENES_DESeqDataSet <- DESeqDataSetFromMatrix(countData = round(MY_MATRIX), colData = BioData_DF,design =~Tissue_Type   #only)   

 FOR_LRT_TEST <- DESeq(GENES_DATASET , test="LRT", reduced = ~ Tissue_Type)

then I can interrogate that object using degPatterns ().

Or would I just use my existing DSeq object with :

 FOR_LRT_TEST <- DESeq(GENES_DATASET , test="LRT", reduced = ~ Tissue_Type)

Then move onto the next stage?

I appreciate the support, JA

ADD REPLY
0
Entering edit mode

Please re-read my original post. The reduced model should not contain tissue type.

ADD REPLY
0
Entering edit mode

Okay - think I've got it. I'll report back.

Here is the report back:

After some trial and error, here is the code I used. It produced an answer but I'm not sure if a) it is the answer I need, or b) if it is, how to remove those genes where expression is equal.

#original DESeq object
==================

ALL_GENES <- DESeq(ALL_GENES_SEQ_OBJECT, test="LRT", reduced = ~SEX+AGE+Death+Tissue_Type)

# Likelihood ratio test on dataset with tissue missing
=========================================

LRT_ALL_GENES_REDUCED <- DESeq(ALL_GENES_SEQ_OBJECT, test="LRT", reduced = ~SEX+AGE+Death)

Results

log2 fold change (MLE): Tissue Type Muscle vs Blood

LRT p-value: '~ SEX + AGE + Death+ Tissue_Type' vs '~ SEX + AGE + Death' 

DataFrame with 14955 rows and 6 columns
                  baseMean log2FoldChange     lfcSE       stat      pvalue       padj
                 <numeric>      <numeric> <numeric>  <numeric>   <numeric>  <numeric>
ENSG00000000003.14    33.81022       1.865390 0.0537597    9890.36           0          0
ENSG00000000419.12   115.26431      -0.222192 0.0332830    4855.64           0          0
ENSG00000000457.13    18.71515      -4.196328 0.0790396    8423.74           0          0
ENSG00000000460.16     4.40886      -2.796579 0.0845453    1822.62           0          0
ENSG00000000938.12  1684.10848      -3.209652 0.0855534   32021.07           0          0
...                        ...            ...       ...        ...         ...        ...
ENSG00000284237.1  3.55911e-01      -0.189193 0.1409802    439.181 4.29682e-96 4.4781e-96
ENSG00000284308.1  1.74974e+01       2.552311 0.0519396   6011.751 0.00000e+00 0.0000e+00
ENSG00000284413.1  1.46580e+01       3.457797 0.0660032  16560.910 0.00000e+00 0.0000e+00
ENSG00000284484.1  5.86535e+04      18.337205 0.0865800 137508.151 0.00000e+00 0.0000e+00
ENSG00000284526.1  9.62264e+00       4.937876 0.0994412   2670.368 0.00000e+00 0.0000e+00

#I then tried to plot the results following the example here [enter link description here][1]- but the results made little sense to me :-(

# Subset the LRT results to return genes with padj < 0.05
LRT_05_CUTOFF <- LRT_RESULTS %>%
data.frame() %>%
rownames_to_column(var="Tissue_Type") %>% 
as_tibble() %>% 
filter(padj < 0.05)


# Get sig gene lists
LIST_LRT_O5_SIG_genes <- data.frame(LRT_05_CUTOFF %>% 
pull (Tissue_Type))
length(LIST_LRT_O5_SIG_genes)


# Subset results for faster cluster finding
CLUSTER_LRT_05_CUTOFF <- LRT_05_CUTOFF %>%
arrange(padj) %>%
head (n=1000)

# Obtain rlog values for those significant genes - my data too large for rlog, so used VST instead. 

#Normalise

ALLVstTrans <- varianceStabilizingTransformation (LRT_ALL_GENES_REDUCED, blind = TRUE, fitType = "parametric")

rld_mat <- assay(ALLVstTrans)
cluster_rlog_LRT_05_CUTOFF <- rld_mat[CLUSTER_LRT_05_CUTOFF$Tissue_Type, ]


# Use the `degPatterns` function from the 'DEGreport' package to show gene clusters across sample groups

clusters <- degPatterns (cluster_rlog_LRT_05_CUTOFF, metadata = ALL_SUBJECTS, time = "Tissue_Type", col=NULL)

Results of 'clusters'

The above was not what I was expecting, and I can't work out what it's telling me.

At this point it's 12.20 a.m. and my brain gave up. So I went to bed to get some sleep, and continued to think about it for another two hours.

I think I'm further forward? but not sure where I have arrived.

ADD REPLY

Login before adding your answer.

Traffic: 622 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6