Workflow to analyse how DNA sequence specifies gene expression in a fungus under stress
@b39b3713
Last seen 4 months ago
United Kingdom

My project's goal is to understand how DNA sequence specifies gene expression changes in a fungus under stress. There are two datasets: the count data contains the RNA counts (rna_count), and the metadata describes each sample. The metadata consists of temperature (30 and 37), media (YPD, egta, cfw, cfw-egta), and strain (KN99 and flc1). The objectives are to explore the data, find genes that are differentially expressed across the relevant conditions, and find sequence motifs associated with those differentially expressed genes. My questions:

  1. The rna_count data is un-normalised. When doing EDA, should I normalise the data or not? If we use the original counts, we can't compare between samples or between genes, right?
  2. If we have to normalise the data, there are many methods, such as log transformation, TPM, DESeq2, edgeR and others. What is the justification for choosing a method? Is there an evaluation metric to justify the best method in my case?
  3. If we want to account for variance between replicates, can we use DESeq2 or another package?
    1. If we use DESeq2, many experimental designs are possible. How do we decide on the best design, and what is the evaluation metric?
    2. How do we find sequence motifs?
    3. Please provide the workflow from beginning to end for my case, because I am not a biologist and I have never used a biology package before.

Thanks

rnaseqGene • 833 views
@mikelove
Last seen 4 days ago
United States

Check the rnaseqGene workflow, which addresses your Q1:

https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#exploratory-analysis-and-visualization

The justification for our variance stabilizing transformation (VST) is in the paper and in the workflow.
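
To make the point behind Q1 concrete, here is a minimal sketch of why raw counts are not comparable across samples and how a per-sample scaling factor addresses that. It uses a median-of-ratios size factor followed by a plain log2 transform with a pseudocount, which is a simplified stand-in and not the VST itself. The function names, the pseudocount, and the toy counts are all assumptions made for illustration; in practice you would use the functions shown in the workflow.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: dict mapping sample name -> list of raw counts, one per gene,
    with the same gene order in every sample (hypothetical layout).
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean of each gene across samples.
    # Genes with a zero in any sample are skipped (marked None).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor = median over genes of (count / geometric mean).
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - log_geo_means[g]
            for g in range(n_genes)
            if log_geo_means[g] is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors


def log2_normalised(counts, factors, pseudocount=1.0):
    """Divide by the size factor, then take log2(x + pseudocount)."""
    return {
        s: [math.log2(c / factors[s] + pseudocount) for c in counts[s]]
        for s in counts
    }


# Toy data (made up): the second sample was sequenced about twice as deeply.
toy_counts = {
    "KN99_YPD_30": [10, 200, 50, 0],
    "KN99_YPD_37": [20, 400, 100, 8],
}

factors = size_factors(toy_counts)
normalised = log2_normalised(toy_counts, factors)
for sample, values in normalised.items():
    print(sample, round(factors[sample], 3), [round(v, 2) for v in values])
```

In the toy data the first three genes differ only by sequencing depth, so after scaling they line up between the two samples, while the fourth gene still stands out. The sketch raises an error if every gene has a zero in some sample, which is one of several reasons to rely on the packaged methods for real data.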

> How to decide the best design to use

The choice of design is motivated by the biology and by technical factors: the factors that affect expression should be included. In some cases the experimental design means certain factors cannot be estimated, e.g. if condition is confounded with batch. But aside from confounded designs, you typically include factors that are known to affect the measurements.
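
As a rough illustration of what "confounded" means here, the sketch below builds a treatment-coded design matrix from a sample table and checks whether it has full column rank. If it does not, at least one factor cannot be separated from the others. The helper names and the `batch` column are hypothetical (the question's metadata only lists temperature, media and strain), and this is a hand-rolled check for explanation only. DESeq2 performs its own check and reports an error when the model matrix is not full rank.

```python
from fractions import Fraction


def design_matrix(rows, factors):
    """Intercept plus dummy columns; the first sorted level is the reference."""
    columns = ["Intercept"]
    levels = {}
    for f in factors:
        levels[f] = sorted({r[f] for r in rows})
        columns += [f"{f}_{lvl}" for lvl in levels[f][1:]]
    matrix = []
    for r in rows:
        line = [Fraction(1)]
        for f in factors:
            line += [Fraction(int(r[f] == lvl)) for lvl in levels[f][1:]]
        matrix.append(line)
    return columns, matrix


def rank(matrix):
    """Matrix rank by Gauss-Jordan elimination with exact fractions."""
    m = [row[:] for row in matrix]
    rk = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                scale = m[i][col] / m[rk][col]
                m[i] = [a - scale * b for a, b in zip(m[i], m[rk])]
        rk += 1
    return rk


def is_estimable(rows, factors):
    """True if every coefficient of the additive design can be estimated."""
    columns, matrix = design_matrix(rows, factors)
    return rank(matrix) == len(columns)


# Toy sample table (made up). "batch" is a hypothetical technical factor
# that happens to coincide exactly with strain.
samples = [
    {"strain": "KN99", "temperature": "30", "batch": "A"},
    {"strain": "KN99", "temperature": "37", "batch": "A"},
    {"strain": "flc1", "temperature": "30", "batch": "B"},
    {"strain": "flc1", "temperature": "37", "batch": "B"},
]

print(is_estimable(samples, ["strain", "temperature"]))
print(is_estimable(samples, ["strain", "temperature", "batch"]))
```

With this toy table the design with strain and temperature is estimable, while adding `batch` is not, because every KN99 sample is in one batch and every flc1 sample in the other, so the two effects cannot be told apart.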
