I am trying to analyse my transcript data to generate heat maps and PCAs and to see the differential expression between samples. I don't necessarily need the coldata, as I already know the condition for each sample name, but if it is needed, it would be location: West, Mid and East.
sample location
X1473 Mid
X1475 Mid
X1528 Mid
X1584 East
X1585 East
X1586 East
X1678 West
X1679 West
X1680 West
BLANK None
I have a matrix of read counts, "df", prepared from another source, such as:
# A tibble: 1,864 x 11
func X1473 X1475 X1528 X1584 X1585 X1586 X1678 X1679 X1680 blank
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1-5-phosphoribosyl-5-5-phosphor… 93 81 36 58 45 26 32 57 65 0
2 1-5-phosphoribosyl-5-amino-4-im… 11 20 6 7 14 5 4 7 13 0
3 1-acyl-sn-glycerol-3-phosphate … 96 76 43 50 88 39 42 61 62 1
4 1-deoxy-D-xylulose-5-phosphate … 192 169 79 95 134 77 71 148 133 1
5 1-deoxy-D-xylulose-5-phosphate … 557 722 303 700 935 507 275 594 694 2
Following the DESeq2 tutorial https://www.bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#indfilt I have tried to use
library("pasilla")
pasCts <- system.file("extdata",
"pasilla_gene_counts.tsv",
package="pasilla", mustWork=TRUE)
pasAnno <- system.file("extdata",
"pasilla_sample_annotation.csv",
package="pasilla", mustWork=TRUE)
cts <- as.matrix(read.csv(pasCts,sep="\t",row.names="gene_id"))
coldata <- read.csv(pasAnno, row.names=1)
coldata <- coldata[,c("condition","type")]
I am unsure what each file has to look like; maybe "extdata" is meant to contain the gene ID but not the annotation already? What if I already have the count matrix with the read counts and annotations as shown above? What is the best way to start the differential expression analysis with my count matrix of annotated genes? Cheers,
Thanks Michael,
brings me this error:
Which is true, because the matrix has an extra column called “func” with the gene names. I decided to convert that column into a header, so the new data matrix would look like this:
head(table_samples, 3)
and
head(colData, 3)
Which brings up the message "converting counts to integer mode" and generates a ddsMat with 0 objects and 0 pointer.
What am I missing? I checked names of the samples and they are fine.
Can you show the exact code and the error? It’s hard to guess what steps are occurring.
Error in DESeqDataSetFromMatrix(countData = table_samples, colData = colData, :
  ncol(countData) == nrow(colData) is not TRUE
In addition: Warning messages:
1: In class(object) <- "environment" :
  Setting class(x) to "environment" sets attribute to NULL; result will no longer be an S4 object
2: In class(object) <- "environment" :
  Setting class(x) to "environment" sets attribute to NULL; result will no longer be an S4 object
I thought you already worked past that error.
Maybe take a step back, read the docs again and the function help, specifically the input arguments.
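For anyone landing here with the same error, here is a rough sketch of what the two inputs could look like for data shaped like the question's. It is not the poster's actual code. The small `df` below is a hypothetical stand-in built from the first three rows shown above, with placeholder function names (`func_1` etc.) because the real ones are truncated in the printout. Dropping the `blank` column is an assumption: it has no matching row name in the sample table as written, and including it would need its own coldata row. The condition being checked is the one from the error message: `ncol(countData) == nrow(colData)`, with columns and rows in the same order.

```r
# Hypothetical stand-in for the tibble `df` from the question (first three rows;
# func names are placeholders because the real ones are truncated in the printout).
df <- data.frame(
  func  = c("func_1", "func_2", "func_3"),
  X1473 = c(93, 11, 96), X1475 = c(81, 20, 76), X1528 = c(36, 6, 43),
  X1584 = c(58, 7, 50),  X1585 = c(45, 14, 88), X1586 = c(26, 5, 39),
  X1678 = c(32, 4, 42),  X1679 = c(57, 7, 61),  X1680 = c(65, 13, 62),
  blank = c(0, 0, 1)
)

# Sample table from the question; the row names are the sample IDs.
coldata <- data.frame(
  location = factor(rep(c("Mid", "East", "West"), each = 3)),
  row.names = c("X1473", "X1475", "X1528", "X1584", "X1585", "X1586",
                "X1678", "X1679", "X1680")
)

# Keep only the count columns that have a row in coldata (assumption: this
# drops "blank"), and move the annotation column into the row names so the
# matrix itself holds nothing but counts.
cts <- as.matrix(df[, rownames(coldata)])
rownames(cts) <- df$func
storage.mode(cts) <- "integer"

# The condition that failed in the thread: one count column per coldata row,
# in the same order.
stopifnot(ncol(cts) == nrow(coldata),
          identical(colnames(cts), rownames(coldata)))

# Build the dataset only if DESeq2 is installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(countData = cts,
                                        colData = coldata,
                                        design = ~ location)
}
```

If the `stopifnot()` line fails on the real data, comparing `colnames(cts)` with `rownames(coldata)` side by side usually shows which sample is missing, extra, or spelled differently.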