Hi there,
My initial strategy was to retain the protein-coding genes and filter for low counts before creating the dds object.
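library(dplyr)
library(DESeq2)
# Biomart export used to select protein-coding genes
GD <- read.delim("mart_export (1).txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
counts <- counts %>% inner_join(unique(GD[2]), by = c("Gene" = "symbol"))
# Sum only the numeric count columns; rowSums() fails on the character Gene column
counts_protein <- counts[rowSums(counts[sapply(counts, is.numeric)]) > 10, ]
# tidy = TRUE assumes Gene is the first column of counts_protein
dds <- DESeqDataSetFromMatrix(counts_protein, colData, design = ~ sample_type, tidy = TRUE)
dds <- estimateSizeFactors(dds)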
I'm not sure whether I should estimate the size factors before or after removing the non-coding genes and filtering for low counts. Many genes have a rowSums of 0 (most of them non-coding). Should these be included in the estimation, i.e. can they affect the estimated size factors?
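To make the all-zero part of the question concrete, here is a minimal base R sketch on toy data. It assumes the default median-of-ratios method, and `median_of_ratios` is a hypothetical helper written for illustration, not a DESeq2 function. The simulated counts and sample names are made up. It only covers rows that are zero in every sample; it says nothing about the rowSums > 10 filter or about restricting to protein-coding genes, both of which change the set of genes that enter the medians.

```r
# Hypothetical helper: median-of-ratios size factors in base R (illustration only)
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  # Log geometric mean per gene; any zero count makes this -Inf
  log_geo_means <- rowMeans(log_counts)
  # Genes with a zero in any sample are skipped
  usable <- is.finite(log_geo_means)
  apply(log_counts[usable, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_means[usable]))
  })
}

set.seed(1)
# Toy data: 200 expressed genes in 4 samples, second sample sequenced twice as deep
expressed <- matrix(rpois(200 * 4, lambda = 50) + 1, ncol = 4,
                    dimnames = list(NULL, paste0("s", 1:4)))
expressed[, 2] <- expressed[, 2] * 2
# 1000 genes with zero counts in every sample
zeros <- matrix(0, nrow = 1000, ncol = 4,
                dimnames = list(NULL, colnames(expressed)))

sf_filtered <- median_of_ratios(expressed)
sf_unfiltered <- median_of_ratios(rbind(expressed, zeros))

# Compare the size factors with and without the all-zero rows
print(all.equal(sf_filtered, sf_unfiltered))
```

On real data the same comparison could be run by calling estimateSizeFactors on two dds objects, one built before and one after filtering, and comparing sizeFactors(dds) between them.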
Many thanks.