estimateSizeFactors before or after removing non-coding genes and filtering for low counts
Tash. (@tash-17343)
United Kingdom

Hi there,

My initial strategy was to keep only the protein-coding genes and filter out low-count genes before creating the dds object:

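    library(dplyr)
    library(DESeq2)
    GD = read.delim("mart_export (1).txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
    counts = counts %>% inner_join(unique(GD[2]), by = c("Gene" = "symbol"))  # keep protein-coding genes only
    # DESeq2 needs a numeric matrix with gene symbols as row names
    count_mat = as.matrix(counts[, setdiff(names(counts), "Gene")])
    rownames(count_mat) = counts$Gene
    counts_protein = count_mat[rowSums(count_mat) > 10, ]  # drop low-count genes
    dds <- DESeqDataSetFromMatrix(counts_protein, colData, design = ~ sample_type)
    dds <- estimateSizeFactors(dds)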


I'm not entirely sure whether I should estimate the size factors before or after removing non-coding genes and filtering out low counts. Many genes have a row sum of 0 (most of them non-coding). Should these be included in the estimation, i.e. can they affect it?

Many thanks.

Tags: DESeq2, RNASeq
@mikelove
United States

You can filter before estimating size factors, but note that genes with a row sum of 0 are not included in size factor estimation anyway.
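To see why the all-zero rows make no difference, here is a minimal base-R sketch of the median-of-ratios calculation, modelled on the method DESeq2 uses by default. `median_ratio_sf` is a hypothetical helper written for this illustration, not a DESeq2 function, and the counts are simulated. A gene with a zero in any sample has a geometric mean of zero, so its log geometric mean is `-Inf` and it is dropped before the median is taken; removing the all-zero genes first should therefore give the same size factors.

```r
# Hypothetical helper: median-of-ratios size factors in base R
# (a sketch of the idea, not DESeq2's own implementation)
median_ratio_sf <- function(counts) {
  # log geometric mean per gene; -Inf if the gene has a zero in any sample
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(cnts) {
    # use only genes with a finite geometric mean and a positive count
    use <- is.finite(log_geo_means) & cnts > 0
    exp(median((log(cnts) - log_geo_means)[use]))
  })
}

set.seed(1)
# simulated data: 200 expressed genes in 4 samples with different depths
depth <- c(1, 2, 0.5, 1.5)
expressed <- matrix(rpois(200 * 4, lambda = outer(rep(50, 200), depth)),
                    nrow = 200, dimnames = list(NULL, paste0("s", 1:4)))
zeros <- matrix(0L, nrow = 500, ncol = 4)  # 500 all-zero genes
with_zeros <- rbind(expressed, zeros)

sf_all <- median_ratio_sf(with_zeros)
sf_filtered <- median_ratio_sf(with_zeros[rowSums(with_zeros) > 0, ])
print(all.equal(sf_all, sf_filtered))
```

Filtering with `rowSums(...) > 10` is a different matter: genes with small but non-zero counts in every sample do enter the median, so removing them can shift the size factors slightly. With DESeq2 installed, `estimateSizeFactorsForMatrix()` could be run on the raw count matrix before and after filtering to compare on real data.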