Question

DESeq2: differential expression analysis from matrix count with gene annotation

ecg1g15 ▴ 30

@ecg1g15-19970

Last seen 4.2 years ago

I am trying to analyse my transcript data to generate heat maps and PCA plots, and to test for differential expression between samples. I am not sure I need the coldata, as I already know the condition for each sample name, but if it is required, the condition would be location: West, Mid or East.

sample  location
X1473   Mid
X1475   Mid
X1528   Mid
X1584   East
X1585   East
X1586   East
X1678   West
X1679   West
X1680   West
BLANK   None

I have a matrix of read counts, prepared from another source, stored as "df":

# A tibble: 1,864 x 11
   func                            X1473 X1475 X1528 X1584 X1585 X1586 X1678 X1679 X1680 blank
   <chr>                           <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 1-5-phosphoribosyl-5-5-phosphor…   93    81    36    58    45    26    32    57    65   0
 2 1-5-phosphoribosyl-5-amino-4-im…    11    20     6     7    14     5     4     7    13    0
 3 1-acyl-sn-glycerol-3-phosphate …   96    76    43    50    88    39    42    61    62   1
 4 1-deoxy-D-xylulose-5-phosphate …   192   169    79    95   134    77    71   148   133  1
 5 1-deoxy-D-xylulose-5-phosphate …   557   722   303   700   935   507   275   594   694  2

Following the DESeq2 tutorial https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#indfilt I have tried to use:

library("pasilla")
pasCts <- system.file("extdata",
                      "pasilla_gene_counts.tsv",
                      package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata",
                       "pasilla_sample_annotation.csv",
                       package="pasilla", mustWork=TRUE)
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","type")]

I am unsure what each file has to look like. Is "extdata" perhaps meant to contain the gene IDs but not the annotation? What if I already have a count matrix with the read counts and annotations, as shown above? What is the best way to start the differential expression analysis from my count matrix with annotated genes? Cheers,

deseq2 dataframe pasilla • 1.8k views

ADD COMMENT • link updated 5.8 years ago by Michael Love 43k • written 5.8 years ago by ecg1g15 ▴ 30

score 1 · Answer 1 · 2019-04-21

Michael Love 43k

@mikelove

Last seen 5 hours ago

United States

Try this tutorial instead:

https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html#starting-from-count-matrices

ADD COMMENT • link 5.8 years ago Michael Love 43k

Thanks Michael,

ddsMat <- DESeqDataSetFromMatrix(countData = table_samples,
                                 colData = colData,
                                 design = ~ location)

brings me this error:

  Error in DESeqDataSetFromMatrix(countData = table_samples, colData = colData,  : 
      ncol(countData) == nrow(colData) is not TRUE

Which is true, because the matrix has an extra column called "func" with the gene names. I decided to convert that column into a header, so the new data matrix looks like this:

head(table_samples, 3)

# A tibble: 3 x 18
   X1473 X1475 X1528 X1584 X1585 X1586 X1678 X1679 X1680
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1    93    81    36    58    45    26    32    57    65
2    11    20     6     7    14     5     4     7    13
3    96    76    43    50    88    39    42    61    62
# … with 1 more variable: blank <dbl>

and head(colData, 3)

  sample location
1  X1473     West
2  X1475     West
3  X1528     West

Running it again now prints "converting counts to integer mode" and generates a ddsMat with 0 objects and 0 pointer.

What am I missing? I checked names of the samples and they are fine.
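One possible way past this, as a minimal sketch rather than a definitive fix: keep the counts as a plain integer matrix with the "func" labels as row names, and give colData row names that match the matrix column names in the same order. The counts below are the first three rows of the table in the question; the names `toy` and `cts` and the placeholder labels `func_A` to `func_C` are illustrative assumptions (the real names are truncated above), and the locations follow the sample table in the question. Note that the sample sheet spells the last sample "BLANK" while the matrix column is "blank"; the sketch uses the matrix spelling.

```r
# Toy stand-in for the tibble in the question (first three rows only).
toy <- data.frame(
  func  = c("func_A", "func_B", "func_C"),  # hypothetical placeholder names
  X1473 = c(93, 11, 96), X1475 = c(81, 20, 76), X1528 = c(36, 6, 43),
  X1584 = c(58, 7, 50),  X1585 = c(45, 14, 88), X1586 = c(26, 5, 39),
  X1678 = c(32, 4, 42),  X1679 = c(57, 7, 61),  X1680 = c(65, 13, 62),
  blank = c(0, 0, 1),
  stringsAsFactors = FALSE
)

# Count matrix: drop the annotation column and keep it as row names instead.
cts <- as.matrix(toy[, setdiff(names(toy), "func")])
rownames(cts) <- toy$func
storage.mode(cts) <- "integer"

# Sample table: one row per sample, row names equal to the matrix column names.
colData <- data.frame(
  location = factor(c("Mid", "Mid", "Mid", "East", "East", "East",
                      "West", "West", "West", "None")),
  row.names = c("X1473", "X1475", "X1528", "X1584", "X1585", "X1586",
                "X1678", "X1679", "X1680", "blank")
)

# Put the rows of colData in the same order as the columns of cts, then check.
colData <- colData[colnames(cts), , drop = FALSE]
stopifnot(ncol(cts) == nrow(colData),
          all(colnames(cts) == rownames(colData)))

# Only attempted if DESeq2 is installed; this builds the object, nothing more.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  ddsMat <- DESeq2::DESeqDataSetFromMatrix(countData = cts,
                                           colData = colData,
                                           design = ~ location)
}
```

Whether the blank sample should stay in the analysis is a separate decision that this sketch does not address.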

ADD REPLY • link 5.8 years ago ecg1g15 ▴ 30

Can you show the exact code and the error? It’s hard to guess what steps are occurring.

ADD REPLY • link 5.8 years ago Michael Love 43k

ddsMat <- DESeqDataSetFromMatrix(countData = table_samples,
                                 colData = colData,
                                 design = ~ location)

Error in DESeqDataSetFromMatrix(countData = table_samples, colData = colData,  :
  ncol(countData) == nrow(colData) is not TRUE
In addition: Warning messages:
1: In class(object) <- "environment" :
  Setting class(x) to "environment" sets attribute to NULL; result will no longer be an S4 object
2: In class(object) <- "environment" :
  Setting class(x) to "environment" sets attribute to NULL; result will no longer be an S4 object

ADD REPLY • link 5.8 years ago ecg1g15 ▴ 30

I thought you already worked past that error.

Maybe take a step back, read the docs again and the function help, specifically the input arguments.
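As a rough illustration of what to look for in the input arguments: the error message reports that `ncol(countData) == nrow(colData)` is not TRUE, so a quick comparison of the two objects in plain R can show where they disagree. The helper `check_inputs` below is hypothetical (it is not part of DESeq2), and the toy tables are made-up stand-ins with placeholder names.

```r
# Hypothetical helper, not part of DESeq2: compare a count table with a
# sample table and report what does not line up.
check_inputs <- function(countData, colData) {
  list(
    n_count_columns   = ncol(countData),
    n_sample_rows     = nrow(colData),
    only_in_countData = setdiff(colnames(countData), rownames(colData)),
    only_in_colData   = setdiff(rownames(colData), colnames(countData))
  )
}

# Toy count table that still carries the annotation column "func".
counts_with_func <- data.frame(
  func  = c("func_A", "func_B"),  # placeholder names
  X1473 = c(93, 11),
  X1475 = c(81, 20),
  stringsAsFactors = FALSE
)

# Toy sample table with the sample names as row names.
samples <- data.frame(location = c("Mid", "Mid"),
                      row.names = c("X1473", "X1475"))

res <- check_inputs(counts_with_func, samples)
print(res)
```

Here the count table has three columns against two sample rows, and "func" is reported as the column with no matching sample. If the sample names are stored in a `sample` column rather than as row names, both lists come back non-empty, because the row names are then just 1, 2, 3 and so on.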

ADD REPLY • link 5.8 years ago Michael Love 43k