Hello sir/madam,
I am working on LUAD data obtained from FireBrowse. Its a RNSseqV2 RSEM normalized data of gene expression
(file name: LUAD.rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.data, renamed this file to
LUAD_RNA_seq)
I want to find Differential gene expression between Stages of tumor samples, i.e. in between stage I and Stage IV samples. but i'm not getting any differentially expressed genes.
My questions are,
1) How to select samples specific to stage of tumor? referring to the code below i could extract samples specific to stage I and stage IV but i'm not sure if that is the right method to do so.
2) referring to my R code what is the wrong step i'm following?
3) If my complete analysis to find the DEG is wrong, please tell me what steps I'm suppose to follow or the R packages for the FireBrowse data analysis? or any tutorial which i can follow to get the differentially expression genes.
(if there is no wrong in the analysis please let me know that too.)
Below is the R code i have used to find out the DEG,
rna <- read.table("RNA/LUAD_RNA_seq.txt",nrows = 20533, header = T, row.names = 1, sep = "\t") rem <- function(x){ x <- as.matrix(x) x<- t(apply(x,1,as.numeric)) r <- as.numeric(apply(x,1,function(i)sum(i == 0))) remove <- which(r > dim(x)[2]*0.5) return(remove) } remove <- rem(rna) rna <- rna[-remove,] dim(rna) x <- t(apply(rna,1,as.numeric)) dim(x) #removing probes without symbol row.has.na <- apply(x, 1, function(x){any(is.na(x))}) sum(row.has.na) x <- x[!row.has.na,] dim(x) # Filter lowly expressed genes expr <- x[rowSums(x)>0,] # Remove genes with 0 RPKM in all samples lowexp <- expr <= 0.1 nsamples <- ncol(expr) nsamples lowexpr <- rowSums(lowexp) > (0.95 * nsamples) log2expr <- log2(expr[!lowexpr,]+1) head(log2expr) colnames(rna) colnames(log2expr)<- colnames(rna) head(log2expr) colnames(log2expr) <- gsub('\\.','-',substr(colnames(log2expr),1,28)) #read files of tumor stage library(xlsx) Tumor_stage <- read.xlsx(file = "Clinical_merged_picked/Tumor_stage_data.xlsx", sheetIndex = 1) colnames(Tumor_stage) Tumor_stage$Stage_I_Barcode <- toupper(Tumor_stage$Stage_I_Barcode) Tumor_stage$Stage_IV_Barcode <- toupper(Tumor_stage$Stage_IV_Barcode) #select only tumor samples Tumor_Samples <- subset(log2expr, select = substr(colnames(log2expr),14,14) == '0') dim(Tumor_Samples) #subset the stages (I and IV) of tumor samples from Tumor_samples Stage_I <- subset(Tumor_Samples, select = substr(colnames(Tumor_Samples),1,12) %in% Tumor_stage$Stage_I_Barcode) dim(Stage_I) Stage_IV <- subset(Tumor_Samples, select = substr(colnames(Tumor_Samples),1,12) %in% Tumor_stage$Stage_IV_Barcode) dim(Stage_IV) #remove duplicate colnames Stage_I <- Stage_I[, !duplicated(colnames(Stage_I))] Stage_IV <- Stage_IV[, !duplicated(colnames(Stage_IV))] stage_I_IV <- cbind(Stage_I, Stage_IV) dim(stage_I_IV) head(stage_I_IV) #design matrix groups <- c(rep("Stage_I",277), rep("STage_IV",26)) group <- factor(groups) group #design matrix design <- model.matrix(~ 0 + group) design colnames(design)<- c("Stage_I","Stage_IV" ) tail(design) dim(design) library(limma) library(edgeR) y <- DGEList(stage_I_IV, remove.zeros=T) v <- voom(y, design) fit <- lmFit(v, design) cont.matrix <- makeContrasts(I_IV = Stage_I - Stage_IV, levels = design) cont.matrix contrast.fit <- contrasts.fit(fit, cont.matrix) fit2 <- eBayes(contrast.fit) fit2$nsamples <- ncol(stage_I_IV) fit2 fit2$nsamples summa.fit <- decideTests(fit2) summary(summa.fit) #this is how i get when i check for the DEG I_IV Down 0 NotSig 17762 Up 0 deg <- topTable(fit2, coef = "I_IV", p.value = 0.05, lfc = 2, number = nrow(v$E))
The topTable result is zero rows and zero columns
Your answers would really help me to clear my doubts,
Thank you all,
Ankita.