Bioconductor Forum

in files with read_tsv 1 2 3 4 5 6 7 8 9 10 transcripts missing from tx2gene: 135227 summarizing abundance summarizing counts summarizing length ``` Is this simply because most of the lncRNAs included in the FASTA files...are not annotated with gene names? Should I summarize to the transcript level with `txOut = TRUE

kallisto tximport ensembl

updated 7 months ago • Nicholas

Hi! I am trying to forge a BSgenome package for the most current Dmel6 genome using the FASTA file from Ensembl (BDGP6.32 v104). However, the package will not install and stops...details: call: .make_BSgenome_seqinfo(single_sequences, circ_seqs, genome, seqnames) error: sequence names found in file '/Users/Geo/BSgenome.Dmelanogaster.6.32.Rcheck/00LOCK-BSgenome.Dmelanogaster.6.32/00new...…

BSGenome Forge

updated 4.2 years ago • geo.vogler

Hello all, I've attempted to use GOseq, and while I know that, due to the gene length correction, it is more appropriate for RNAseq data than topGO, I like that topGO takes the topology of the GO graph...his great tutorial (http://www-huber.embl.de/users/klaus/Teaching/DESeq2Predoc2014.html#gene-ontology-enrichment-analysis), Bernd Klaus uses the genefinder function to …

topgo goseq deseq2

updated 9.6 years ago • Ben Mansfeld

Hi, I am using the drawProteins package to draw protein domains as described nicely in several other places. My problem is that in some instances, Uniprot entries are missing CHAIN information, which is required for drawing the background chain in the plot. The CHAIN information…

drawProteins uniprot

updated 7.0 years ago • mblango

so I am getting stuck at the beginning... I have a different data format, a matrix of 63 genes (columns) and 12 samples (rows) with Ct values, already averaged over the technical repeats. I also have a header indicating...the gene names (so I set _header=true_). I load my tab-separated text matrix with _read.delim_ and then I use _readCtData_ like this...s12") > sample=sa<…

HTqPCR readCtData Heidi Dvinge

updated 9.7 years ago • virginia.claudio

tximport to import the RSEM outputs. However, the RSEM outputs from this repository only include the lengths and estimated counts, and are missing the abundance information. Would it still be possible to use these data with

tximport deseq2 rsem

updated 5.4 years ago • le2336

reducedGFF <- unlist(grl, use.names=T) elementMetadata(reducedGFF)$gene_name <- rep(names(grl), elementNROWS(grl)) #Open the fasta file FASTA <- FaFile(FASTAfile) open(FASTA) #Add the GC numbers elementMetadata...reducedGFF)$widths <- width(reducedGFF) #Create a list of the ensembl_id/GC/length calc_GC_length &a…

goseq

updated 8.4 years ago • mictadlo

Hi, I am using tximport to assemble transcript level expression data from Salmon into gene-level expression data. I have read through the documentation but I am still unsure on how to interpret the "counts" and...abundance" matrix. As far as I understood: - Counts = best estimate of the original counts - Abundance = TPMs (at least when using Salmon...the counts matrix sum(!txi.scaled_tpm$c…

RNASeq tximport

updated 3.8 years ago • Al90

I have a reference database of barcode sequences with a species name/taxonomy associated with each sequence. I would like to remove any duplicated sequence but...reproducible example below shows how to do that with a very naïve approach (simply pasting the sequence and species names then looking for duplicates...). My question is: are there better ways to do that with any Bioconductor...gt…

Biostrings

updated 2.9 years ago • Gilles

Dear all, I would like to compare the expression of different genes with each other in a single biological condition (for which I have technical replicates). Transcript abundances were...normalized=T" option of the counts function: dds_norm=counts(dds, normalized=T) and then to compare genes with each other based on the dds_norm counts. The objective is to select genes that are the most expres…

deseq2 normalization

updated 8.3 years ago • ctruntzer

I'm working with the model species Arabidopsis thaliana and want to figure out the length of introns from the annotation file (or something similar?). Since only the exons are annotated I wanted to ask what a…

annotation arabidopsis thaliana Tutorial

updated 9.9 years ago • smurfblack

Hi, how is one supposed to go from UCSC known gene IDs to gene symbols? > cols(TxDb.Mmusculus.UCSC.mm9.knownGene) [1] "CDSID" "CDSNAME" "CDSCHROM" "CDSSTRAND" "CDSSTART" [6] "CDSEND...IPI" "PROSITE" "ACCNUM" "ALIAS" [11] "CHR" "CHRLOC" "CHRLOCEND" "ENZYME" "PATH" [16] "PMID" "REFSEQ" "SYMBOL" "UNIG…

GO

updated 12.8 years ago • Ido M. Tamir

new_XString_from_CHARACTER", classname, x, start(solved_SEW), : zero or more than one input sequence Oddly, when I ran it a second time, the error changed a bit, but with the same result: Error in .Call2("new_XString_from_CHARACTER...classname, x, start(solved_SEW), : zero or more than one input sequence In addition: Warning message: In nchar(str, "bytes") * 4L :…

edaseq normalization hg38 biomart org.db

updated 8.1 years ago • mark.ebbert

Dear List, I used the following R code to extract sequence information of a particular probeset for PM probes of the Affymetrix SNP6 array. However, for 100 probesets I tested...there were only 2 unique PM sequences for each probeset. It appears that the PM sequences were not correctly catalogued. =============== library(pd.genomewidesnp.6...kao,"SELECT * from featureS…

updated 17.0 years ago • li lilingdu

you are looking at the pathway for glycolysis / gluconeogenesis pathstr = "path:hsa00010" # get all genes glist = get.genes.by.pathway(pathstr) # First fix what color you want for which node # Suppose, you want to color first enzyme...fgcols = color of text and border # bgcols = color of the rectangular area fgcols = rep("black", length(glist)) fgcols[1] = "black" bgcols = rep("#e6e6fa", lengt…

Pathways graph

updated 16.7 years ago • Tim Smith

p/354073/#354353) and then I got the answer that I should use DESeq2 normalization with gene length adjustment. Is it possible to use DESeq2 normalization with gene length adjustment like FPKM gene length adjustment...I have 250 samples from healthy and disease states and I want to integrate gene expression with a metabolic model. Indeed, I need within-sample normalization (adjusting gene length…

deseq2 normalization

updated 7.1 years ago • Maryam

as opposed to microarrays and ESTs. Here it is: Ramskold D, Wang ET, Burge CB, Sandberg R (2009) An Abundance of Ubiquitously Expressed Genes Revealed by Tissue Transcriptome Sequence Data. PLoS Comput Biol 5(12) Hope that

updated 12.8 years ago • Alvaro J. González

of data described in the following. > Each block contains a human VALIDATED miRNA identifier and sequence > (Example: "hsa-miR-20a " "UAAAGUGCUUAUAGUGCAGGUAG") > followed by the identifier and 3'UTR sequence of ALL genes that...BLOCK_1 start > target-gene[1,1] 3'UTR sequence > target-gene[1,2] 3'UTR sequence > ......................................…

miRNA Biophysics GO Homo sapiens Biostrings biomaRt

updated 16.6 years ago • michael watson IAH-C

The number of reads for each sample is very variable, ranging between 100K and 8M reads. Only 5K genes present at least 3 reads in at least one condition (all replicates for one condition considered). This is due to the fact...that, especially at early stages, the proportion plant/microorganism is very low in the sequenced library. Overall, I think that the data are hardly suitable for different…

rnaseq normalization differential gene expression edger deseq2

updated 8.9 years ago • David Rengel

constructing the se object: se <- SummarizedExperiment(assays = list(counts = txi$counts, abundance = txi$abundance, length = txi$length), rowData = rownames(txi$counts), colData = metadata) So should I use the counts file for...the downstream DESeq2 analysis? > assays(se) List of length 3 names(3): counts abundance length

tximport DESeq2

updated 2.7 years ago • Yijing

data look like this: The columns are the samples: HG1,HG2,HG3,HG5,HG4,LG1,LG2,LG5 The rows are the genes: NM_000014, NM_000015... up to 18000 genes. The code I wrote is the following: ``` count_tab <- read.table("Human_islets_counts_Refseq_HG_vs_LG.csv...header = TRUE,row.names = 1,sep = ',') filter <- apply(count_tab, 1, function(x) length(x[x>5])>=2) filt…

RUVSeq software error edger

updated 6.6 years ago • ce.jim.san

I am working on RNA-seq data analysis to get differential gene expression between 2 conditions. I am using the ballgown package in R, and successfully loaded the data into R. However, I do...after my progress: 1. Is it necessary to remove low variance transcripts while doing differential gene expression? And why? 2. Why do we need to remove low gene abundance & low variance transcripts?…

deseq2 edger normalization limma

updated 5.4 years ago • lakshmi9c

is to convert the probe sets > to gene names. Second is to convert the probe sets values into just one > summary gene expression value of the associated gene...for when doing downstream analyses). Alternatively, you could choose just one probe set for each gene, based on something like the most variability between your sample types, or the largest difference. There is a function...…

Annotation GO cdf probe affy convert ASSIGN

updated 15.0 years ago • James W. MacDonald
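
For the "choose just one probe set for each gene" option mentioned in the reply above, here is a rough base-R sketch, not the function the reply goes on to name. The matrix `expr` and the probe-set-to-gene mapping `gene` are made-up stand-ins; real data would take the mapping from the array's annotation package.

```r
# Hypothetical expression matrix: rows are probe sets, columns are samples.
set.seed(1)
expr <- matrix(rnorm(6 * 4), nrow = 6,
               dimnames = list(paste0("ps", 1:6), paste0("s", 1:4)))

# Hypothetical probe-set-to-gene mapping (an assumption for illustration).
gene <- c(ps1 = "A", ps2 = "A", ps3 = "B", ps4 = "C", ps5 = "C", ps6 = "C")

# Variance of each probe set across samples.
ps_var <- apply(expr, 1, var)

# For each gene, keep the name of its most variable probe set.
keep <- vapply(split(names(ps_var), gene[names(ps_var)]),
               function(ps) ps[which.max(ps_var[ps])],
               character(1))

# One row per gene, labelled by gene name.
expr_gene <- expr[keep, , drop = FALSE]
rownames(expr_gene) <- names(keep)
```

Selecting by variance keeps a single measured probe set per gene rather than averaging probe sets that may target different transcripts; whether that is preferable depends on the downstream analysis.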

10000L, 10000L, 10000L, 10000L, 10000L, 10000L, 10000L, 10000L) , NAMES = NULL , elementType = "integer" , elementMetadata = NULL , metadata = list() ) , strand = new("Rle" , values = structure(3L, .Label = c("+", "-", "*"), class = "factor...lengths = 10L …

rtracklayer

updated 10.5 years ago • liz.ingsimmons

normalize the data and then fit the linear method (creating an object 'fit') that can identify the most expressed genes. The data come from the example.lumi dataset contained in the package lumiBarnes. At the end of the analysis...the over-expressed genes have been identified with the hypergeometric test and the Illumina codes have been associated with the EntrezID...the term BP are used. The …

annotation lumi GOstats

updated 8.3 years ago • marongiu.luigi

Hi all, I need to find overlap between a text file (BED format) and a gene reference. The BED file contains sequences of different lengths, and I need to find all the sequences that lie inside the...gene (meaning overlapping percentage is 100%). I found the findOverlaps function in GenomicRanges, but the parameter to control

updated 15.0 years ago • Duke

Hi, I'm using featureCounts from the Rsubread package. But I have a question about the gene length returned by featureCounts. I've read the case study here: http://bioinf.wehi.edu.au/RNAseqCaseStudy/ ...from an RNAseq experiment, and I just used the rpkm() function in edgeR. This function takes the gene length as input, which I got from featureCounts. So I wonder __how feature…

featurecounts rnaseq rsubread

updated 9.3 years ago • niuyw

Hello guys, I have a list of KEGG IDs like K01344 and I want to convert them to UniProt format. How can I do this from here? I need to get all KEGG IDs and UniProt IDs. ![KEGG REST API][1] [1]: /media/images/c271480c-eaba

KEGG UniProt.ws

updated 3.5 years ago • Recep

chr.loc = transcriptsBy (TxDb.Mmusculus.UCSC.mm10.knownGene, by = "gene") system.time ( prom.all <- getPromoterSeq(chr.loc, Mmusculus, upstream=2000, downstream=1000)) Error in loadFUN(x, seqname...seqname, ranges(gr), strand(gr), is_circular) 10: FUN(1:35[[27L]], ...) 9: lapply(seq_len(length(grl)), function(i) { gr <- grl[[i]] if (length(gr) == 0L) …

GenomicFeatures getPromoterSeq

updated 11.2 years ago • branislav misovic

there a maximum number of files that I can import using tximport? I'm receiving the error where the names must be the same length as the vector. I have looked at the other responses, and none have helped so far. Thank you. Laurie

r

updated 5.8 years ago • laurie.r.gray

I am trying to run CD-HIT on a DNAStringSet object of DNA sequences to get a non-redundant set of sequences: library(Biostrings) library("BioSeqClass") ## flank is a DNAStringSet object...of equal length sequences seq = as.character(flank) ## Homolog reduction of whole-length sequence by cd-hit need cd-hit program... I have downloaded latest…

CD-HIT BioStrings cdhit biostrings BioSeqClass

updated 8.5 years ago • vinod.acear

Dear all, I am performing differential gene expression analysis of single-nuclei RNA-seq data with DESeq2 and pseudobulk counts per cell type. 3 groups, 4 samples...per group. The nuclei vary to some extent concerning their mitochondrial RNA counts. So, DESeq2 naturally reports mitochondrial genes as the most significant. Is there a way to include this as a covariate in the design

DESeq2 SingleCell

updated 5.1 years ago • Christian

expression analysis described in: *Stem cell transcriptome profiling via massive-scale mRNA sequencing* *Nicole Cloonan et al* *NATURE METHODS | VOL.5 NO.7 | JULY 2008 | 613* http://www.nature.com/nmeth/journal/v5/n7/abs/nmeth.1223.html...Analysis * To calculate differential expression of SQRL tag data we analyzed the normalized gene signals (tags per Refseq transcript, length- normalized) f…

limma

updated 8.6 years ago • Avinash S

Hello, I'm new to bioinformatics and trying to convert **UniProt** IDs to **Ensembl**. I could install biomaRt and read the data using read.csv. Now I need to convert the IDs. But how can

biomaRt ensembldb UniProt.ws

updated 3.4 years ago • Recep

Hi Yiwen, In cases where you know the subset of genes that are not changing you can use limma as outlined in: http://genomebiology.com/2007/8/1/R2 Cheers, Alicia Message-ID: <c2a9eb528d6c3d44ad457c54ad0c7cd545a85c49...to try. However, I am just wondering if there are people fitting LOESS on a subset of unchanged genes and any comment on that. Thanks in …

Normalization vsn limma

updated 16.7 years ago • Alicia Oshlack

BLOCK_1 start > target-gene[1,1] 3'UTR sequence > target-gene[1,2] 3'UTR sequence > ............................................... > target-gene[1,n] 3'UTR sequence #BLOCK_1 > end > > VALIDATED miRNA...2] identifier miRNA[2] sequence #BLOCK_2 start > target-gene[1,1] 3'UTR sequence > target-gene[1,2] 3'UTR sequence &…

miRNA Biophysics GO Homo sapiens Biostrings biomaRt

updated 16.6 years ago • mauede@alice.it

Hello, I have an experiment and a control group to test differentially abundant taxa from amplicon sequencing data. The problem is that after filtering by adjusted p-value < 0.01, some of the differentially...abundant OTUs are not present in both datasets, which is a problem for downstream analysis. I need to have the same taxa for both...groups. Why do I have significant differenti…

DESeq2

updated 3.3 years ago • Valentín

Hello, I am using two-color in the SurePrint G3 Human Gene Expression 8x60K Microarray and wondering how many probes there are in one spot for one type of sequence 60 nt in length...log-ratio tends to be approximately equal to zero (yellow spot after scanning the slide) when the same sequence from two different samples is labeled with two different types of fluorescent dyes and binds to its complement…

microarray

updated 7.0 years ago • w.abram

Hi Eric, Is CAGE the cap analysis of gene expression? Thanks! I think the error has to do with the chromosome naming since there are only chromosomes X, 2L, 2R, 3L, 3R and...for any suggestions. > peaks = RangedData(IRanges(start = c(100, 500), end = c(300, + 600), names = c("peak1", "peak2")), space = c("NC_008253", + "NC_010468")) > peaks RangedData w…

ChIPpeakAnno

updated 15.8 years ago • Julie Zhu

positions_file$V2),end=as.numeric(positions_file$V2)))) head(refbase) A DNAStringSet instance of length 185 width seq names [1] 1 C 10 [2] 1 C 10 [3] 1 T 10 [4] 1 A …

dnastringset

updated 10.2 years ago • komal.rathi

Hi, I am using the following command to run tximport: txi <- tximport(files, type="salmon", tx2gene=tx2gene,ignoreTxVersion=TRUE,dropInfReps=TRUE) My tx2gene data frame looks like this: tx_id …

tximport

updated 8.2 years ago • tanyabioinfo

chromosome location (org.Hs.egCHRLOC) and end position (using org.Hs.egCHRLOCEND) of a list of gene symbols. But I did not find which one mapped the gene length to its symbol. Should I subtract what I get in org.Hs.egCHRLOCEND...from org.Hs.egCHRLOC for each gene symbol to find the gene length or is there an easier way to find it for a long list of gene symbols? Thank you

updated 13.3 years ago • Fatemehsadat Seyednasrollah
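
The subtraction asked about in the post above can be sketched in base R. The `chrloc_start` and `chrloc_end` vectors are made-up values standing in for what org.Hs.egCHRLOC and org.Hs.egCHRLOCEND would return; the sketch assumes one location per gene and relies on the org.db convention that antisense-strand positions carry a leading minus sign. The result is the genomic span including introns, not a transcript length.

```r
# Hypothetical start/end positions for two genes (illustrative values only).
# Negative numbers mark genes on the antisense strand.
chrloc_start <- c(GENE1 = 1000, GENE2 = -5000)
chrloc_end   <- c(GENE1 = 3500, GENE2 = -7200)

# Match the two vectors by gene name before subtracting.
common <- intersect(names(chrloc_start), names(chrloc_end))

# Genomic span in bases; abs() removes the strand sign.
span <- abs(abs(chrloc_end[common]) - abs(chrloc_start[common])) + 1
span
```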

T 7366 T>A 1140 T>C 8198 T>G 1830 My concern is that, along with the position, I want the gene names, so is it possible to get the gene names from SigProfilerMatrixGenerator output? And if not, what's the other...way to get the gene names from VCF files

BioCor StarBioTrek MatrixGenerics

updated 2.3 years ago • karmasstark

pattern of gene expression (it wouldn't make biological sense for many genes to go up, then down, then up again). With this in mind, I tried a fairly...edgeR to fit a model with only the batch effect ( i.e. model.matrix(~Batch) ). Then, I split the genes into 100 quantile bins by abundance and took the 20 lowest-dispersion genes from each bin, thus giving me a list of 2000...genes across the abun…

Normalization GO limma edgeR

updated 12.5 years ago • Ryan C. Thompson
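
The binning step described in the post above (100 abundance bins, the 20 lowest-dispersion genes from each) can be sketched in base R. The `genes` table is simulated; in the post the abundance and dispersion values would come from the edgeR fit with `model.matrix(~Batch)`, and the column names here are assumptions.

```r
# Simulated per-gene table standing in for edgeR output (assumption).
set.seed(1)
n_genes <- 5000
genes <- data.frame(
  id         = paste0("gene", seq_len(n_genes)),
  abundance  = rnorm(n_genes, mean = 5, sd = 2),  # e.g. average log CPM
  dispersion = rexp(n_genes, rate = 5),           # e.g. tagwise dispersion
  stringsAsFactors = FALSE
)

# Assign each gene to one of 100 abundance bins of near-equal size
# by cutting the abundance ranks into equal-width intervals.
n_bins <- 100
genes$bin <- cut(rank(genes$abundance, ties.method = "first"),
                 breaks = n_bins, labels = FALSE)

# Within each bin keep the 20 genes with the lowest dispersion.
per_bin <- 20
picked <- unlist(lapply(split(genes, genes$bin), function(d) {
  d$id[order(d$dispersion)][seq_len(min(per_bin, nrow(d)))]
}), use.names = FALSE)

length(picked)
```

Binning on ranks rather than raw abundance keeps the bins equally populated, so the selected genes are spread evenly across the abundance range.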

header=TRUE, comment.char = "#", sep="\t") featCountsLengthData <- as.numeric(featCountsData[,"Length"]) names(featCountsLengthData) <- featCountsData$Geneid LenData <- names(geneLengthData) %in% names(genes) LenData <...geneLengthData[LenData] pwf <- nullp(DEgenes = genes, bias.data = LenData, plot.fit = TRUE) 2) Using Biomart: txsData <-…

goseq

updated 9.8 years ago • mrodrigues.fernanda

is to generate a text file containing a list of Homo-Sapiens validated miRNAs (microRNA- identifier, sequence) and relative 3'UTR regions (gene-identifier, 3'UTR-sequence). I realize this is just a matter of retrieving all known...some data from the paired gene. ex. name target chrom start end strand [1,] "hsa-miR-647" "ENST00000295228" "2" "120824263" "…

miRNA Homo sapiens

updated 16.6 years ago • mauede@alice.it

Hi, I'm trying to use DESeq2 to analyse a dataset where I've generated abundances using Kallisto following the vignette "Analyzing RNA-seq data with DESeq2". I'm getting a consistent error with DESeq…

deseq2 tximport

updated 8.3 years ago • ben

or median tss.avg <- viewMeans(tss.cov) tss.med <- viewApply(tss.cov, median) tss.cov has names for each gene feature: > class(tss.cov) [1] "SimpleRleViewsList" attr(,"package") [1] "IRanges" > tss.cov[[1]][1:3] Views on a 230218...length Rle subject views: start end width [1] 99205 99404 200 [11.30 11.30 11.20 11.20 11.20 ...etc. [2] 142267 142466 …

Coverage convert

updated 11.7 years ago • Chris Seidel

Dear all, In our study, we examined fungal communities using high-throughput amplicon sequencing (HTAS) of the internal transcribed spacer 2 (ITS2) region in total RNA extracted from environmental samples (plants...and performed an analysis of variance (ANOVA) to investigate which OTUs differed significantly in abundance among experimental factors after Bonferroni correction. We receive…

deseq2 microbiome fungi fungal microbiome

updated 7.7 years ago • david.gramaje

very new to Bioconductor and also to the field of Bioinformatics. However, I have a bunch of siRNA-sequences and I need to find the target genes (human genome). I got the advice to download the genome via AnnotationHub. I managed...Users/xyz/.AnnotationHub/12356") > genome > genome A DNAStringSet instance of length 346 …

sequence annotationhub biostrings

updated 10.4 years ago • antje.janosch

I'm running goseq on a sequence of bootstraps where a DE test is performed on a random selection of replicates. This way, I have a constant list of...genes and a selection of significant genes which is slightly changing from bootstrap to bootstrap. goseq works nicely on...most of these bootstraps, but every now and then it crashes. Here is an example: > library(goseq) …

goseq

updated 10.9 years ago • M.Gierlinski

I have a strange error when I call ChIPQC. > 'names' attribute [9] must be the same length as the vector [2] The numbers in brackets vary depending on the number of samples. RNFC81...Computing metrics for 9 samples... list Bam file has 194 contigs Error in names(res) <- nms : 'names' attribute [9] must be the same length as the vector [2] Calls: …

chipqc

updated 5.0 years ago • ZheFrench

GenVisR, is it possible to output the mutation frequency data from waterfall function to data by gene instead of sample by default? I had read the GenVisR package manual and tried to resolve using writeData function within...fdef, mtable) { methods <- .findInheritedMethods(classes, fdef, mtable) if (length(methods) == 1L) return(methods[[1L]]) else…

GenVisR mutation frequency R cancer bioconductor

updated 6.3 years ago • sheneicechang

to realize that a lot of the 10x Genomics tech is 3'-tagged RNA-seq, and thus does not have the length biases that would be present in the Smart-Seq protocol (and thus passing the raw `txi$counts` as raw counts in data import...is done with bulk RNA-seq. - Thus, my understanding is that the correct steps for the Smart-Seq/full-length protocol would be to 1) import the data with the `tximport` setting …

tximport Seurat Smart-Seq Smart-Seq2

updated 5.5 years ago • lshepard

all, I know the general question of "should I summarize/average/etc probes that map to the same gene?" has been discussed many times before. But, I feel that it might be slightly different on the Illumina platform (at least...data relative to target summarized data, since you basically have the same number of distinct sequences. So, even though the probe names have changed, and there appear to b…

probe

updated 17.6 years ago • Cei Abreu-Goodger

data structure. Can anyone please help me in building a table that contains ontology wise mapping to Uniprot identifiers? I want the final output table to look something like this -- Uniprot GO_BP GO_CC GO_MF ABC123 GO:121 GO:122

updated 14.4 years ago • Sandeep Amberkar

Hi Hrishi, load the probe package for your array type, which contains the sequence of the 25mers. http://www.bioconductor.org/data/probes/Packages/hgu95av2probe_1.0.zip or http://www.bioconductor.org...library(hgu95av2probe) data(hgu95av2probe) summary(hgu95av2probe) my.out.seq <- hgu95av2probe$sequence[out.of.bounds] my.in.seq <- hgu95av2probe$sequence[…

probe

updated 21.1 years ago • Justin Borevitz

tl;dr: The most differentially expressed genes from DESeq2 are genes with very low read counts for many of my samples, and extremely...into issues (I think?!) when I was running the code without pre-filtering, and plotting counts of my most significantly differentially expressed genes (ordered by adjusted p-value): ```R plt = plotCounts(deseq_ds, gene=idx, intgroup...most significant gene (padj 6…

RNASeq depmap DESeq2

updated 2.6 years ago • Ben

I have a `GRangesList` with multiple transcripts per gene (see example below). GRangesList object of length 9216: $ENSMUSG00000000001.4 GRanges object with 2 ranges and 4 metadata...245212 ENSMUSE00000628024.1 18 ENSMUSG00000000028.14 I'd like to write the nucleotide sequences for all transcripts of a given gene to a fasta file grouped by gene id. Ex) <…

genomicranges genomicfeatures

updated 10.0 years ago • Jake