Generate Heatmap from Gene Names after DESEQ2
1
0
Entering edit mode
uhlkatie • 0
@uhlkatie-20770
Last seen 10 weeks ago
United States

Enter the body of text here

Code should be placed in three backticks as shown below


# include your problematic code here with any corresponding output
# please also include the results of running the following in an R session

sessionInfo( )


Hello,

I have a tentative grasp on DESEQ2 I am trying to generate a heatmap (and other DEG visualization tools) using my output from DESEQ2. I followed the tutorial (http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#why-un-normalized-counts) starting with salmon files and then using tximeta to create a summarized experiment. I performed DESEQ2 on my summarized experiment:

se <- tximeta(coldata)
library(DESeq2)
ddsTxi <- DESeqDataSet(se, design = ~ Condition)
ddsTxi


The object ddsTxi looks something like this:

Sample1     Sample2    Sample3    Sample4     Sample5
ENST00000456328.2        1.850       1.333      3.845      0.000       6.181
ENST00000450305.2        0.000       0.000      0.000      0.000       0.000
ENST00000488147.1       84.946     140.966    117.379    142.208      71.022


And then DESEQ2:

dds <- DESeq(dds)
res <- results(dds)
res
mcols(res, use.names = TRUE)


Which looks something like this:

                     baseMean log2FoldChange     lfcSE        stat    pvalue
<numeric>      <numeric> <numeric>   <numeric> <numeric>
ENST00000456328.2   13.750573     -0.2570229  1.268077 -0.20268715  0.839380
ENST00000488147.1  119.249079      0.2125775  0.305591  0.69562687  0.486663
ENST00000473358.1    0.287952     -0.0662034  7.678081 -0.00862239  0.993120


And I know that from there I can perform comparisons between any of my variables/conditions. My first issue is that I would really like the transcript IDs to be replaced by the Gene names. I've used Biomart before to convert the transcript ID to ensmbl gene names in the past. But when I do this, I end up with multiple transcripts for the same gene. I will look at the fold changes for Gene A, only to find that there are three different entries. Is there any way to generate results tables that take this into account and only have one entry per gene? I need to generate heatmaps for the different experimental comparisons, but for that I would also need a gene counts table with one entry per gene. I will take any advice on how to do the conversion, download a gene counts table, or any good packages for generating heatmaps with DESEQ2 data.

Thank you!

0
Entering edit mode
@james-w-macdonald-5106
Last seen 1 day ago
United States

Have you read the tximeta vignette, particularly the part about summarizing to gene level and changing the identifiers?

0
Entering edit mode

I just looked through the vignette and that helped! I was able to follow it through to create the "gse" object to use as input for DESEQ2. My question now is if there is a way to save that object, (which I'm presuming are the gene-level counts for each sample), as a data frame, matrix, or anything I can use outside of R? I tried the write.csv command and got an error message saying, "Error in as.vector(x) : no method for coercing this S4 class to a vector".

0
Entering edit mode

The object you wish to save acts like a data.frame in order to shield end users from the complications of trying to consistently manipulate several different things at once. The 'dds' (gse?) object is actually comprised of multiple objects; one or more matrices of the underlying count (and possible modifications of the counts) data, a DataFrame that describes the samples, and a GRanges object that describes the rows. The show method gives you a peek at all the data in an object.

> fake.o <- makeExampleDESeqDataSet()
> fake.o
class: DESeqDataSet
dim: 1000 12
assays(1): counts
rownames(1000): gene1 gene2 ... gene999 gene1000
rowData names(3): trueIntercept trueBeta trueDisp
colnames(12): sample1 sample2 ... sample11 sample12
colData names(1): condition


And you can inspect each underlying object using the relevant accessor methods.

> colData(fake.o)
DataFrame with 12 rows and 1 column
condition
<factor>
sample1          A
sample2          A
sample3          A
sample4          A
sample5          A
...            ...
sample8          B
sample9          B
sample10         B
sample11         B
sample12         B
> rowData(fake.o)
DataFrame with 1000 rows and 3 columns
trueIntercept  trueBeta  trueDisp
<numeric> <numeric> <numeric>
gene1        0.0455954         0  3.975559
gene2        2.0951366         0  1.036184
gene3        2.3462632         0  0.886619
gene4        3.8920042         0  0.369433
gene5        6.0617779         0  0.159880
...                ...       ...       ...
gene996        2.51762         0  0.798522
gene997        5.45743         0  0.191035
gene998        6.19437         0  0.154622
gene999        1.26165         0  1.768266
gene1000       2.49833         0  0.807924
> rowRanges(fake.o)
GRanges object with 1000 ranges and 3 metadata columns:
seqnames       ranges strand | trueIntercept  trueBeta  trueDisp
<Rle>    <IRanges>  <Rle> |     <numeric> <numeric> <numeric>
gene1        1        1-100      * |     0.0455954         0  3.975559
gene2        1      101-200      * |     2.0951366         0  1.036184
gene3        1      201-300      * |     2.3462632         0  0.886619
gene4        1      301-400      * |     3.8920042         0  0.369433
gene5        1      401-500      * |     6.0617779         0  0.159880
...      ...          ...    ... .           ...       ...       ...
gene996        1  99501-99600      * |       2.51762         0  0.798522
gene997        1  99601-99700      * |       5.45743         0  0.191035
gene998        1  99701-99800      * |       6.19437         0  0.154622
gene999        1  99801-99900      * |       1.26165         0  1.768266
gene1000        1 99901-100000      * |       2.49833         0  0.807924
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths
> dim(assay(fake.o))
[1] 1000   12

sample1 sample2 sample3 sample4 sample5 sample6 sample7 sample8 sample9
gene1       0       2       0       0       0       0       0       0       1
gene2       5       2      12       1      10       0       2       3       4
gene3       2      18       5       7       0       3       3       2       1
gene4       2       5      10      17       9       8       8      14      27
gene5      39      30      40      68      55     104      32      19      51
gene6       5       7       0       1       1       3       0       4       0
sample10 sample11 sample12
gene1        0        2        0
gene2        1        5        0
gene3       11        1        4
gene4       18       29        5
gene5       59       61      109
gene6        0        0       14
>


And after running DESeq on that object, there are even more things in the resulting DESeqDataSet object. So 'writing to csv' isn't such a simple thing, and I would argue is unnecessary. To what end? All these objects are intended to be very complex, but simple to use, and there are any number of accessors that you can use to make plots or whatever. I can't figure out what you could do outside of R that wouldn't be easier to do from within R.

But anyway, you can extract anything you like from the object and then use write.csv or write.table directly.