SCOPE Frequently Asked Questions
1. How does SCOPE work?
2. What scoring function does SCOPE use?
3. Where do I enter the width of motif I'm looking for or the number of occurrences or (insert your favorite parameter here)?
4. SCOPE can't find one of the genes in my gene group. What do I do?
5. I can't find my species. Can you help?
6. How do you know SCOPE works?
7. How does SCOPE's performance compare to that of MEME, Gibbs, or (insert your favorite motif finder here)?
8. What if I have some sequences in my group that crept in by mistake?
9. Why should I incorporate SCOPE into my regulon mining (or whatever) application?
10. How do I incorporate SCOPE into my regulon mining (or whatever) application?
11. Your output page is impossible to parse. How can I use it in further analyses?
12. I’m confused - which version of the motif do you consider to be SCOPE’s final output: sequence logo, consensus sequence or list of binding sites?
1. How does SCOPE work?
(This is a summary description; for more details, see the ‘About SCOPE’ and ‘Publications’ pages.) SCOPE is based on three algorithms: BEAM finds non-degenerate motifs, PRISM finds degenerate motifs, and SPACER finds bipartite motifs. Results from all three algorithms are merged and the best-scoring motifs are presented. For each putative cis-regulatory set, the actual sites present in the upstream sequences are then used to define a Position Weight Matrix.
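To make the last step concrete, here is a minimal sketch of how a Position Weight Matrix can be built from a list of aligned sites. It is a generic textbook construction, not SCOPE's own code: the function names, the pseudocount of 1, the uniform background, and the example sites are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=None):
    """Return one {base: log2-odds} dict per motif column.

    Assumes all sites are the same length and contain only A, C, G, T.
    """
    if background is None:
        # Assumption: uniform background; a real genome would not be uniform.
        background = {b: 0.25 for b in BASES}
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    pwm = []
    for i in range(width):
        # Start every count at the pseudocount so no base has probability 0.
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            counts[s[i].upper()] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background[b])
                    for b in BASES})
    return pwm

def score_site(pwm, seq):
    """Sum the per-column log-odds for a candidate sequence."""
    return sum(col[base] for col, base in zip(pwm, seq.upper()))

# Hypothetical sites, for illustration only.
example_sites = ["TGACTC", "TGACTA", "TGAGTC"]
example_pwm = build_pwm(example_sites)
```

A sequence that resembles the sites scores higher than an unrelated one, for example `score_site(example_pwm, "TGACTC")` versus `score_site(example_pwm, "AAAAAA")`.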
2. What scoring function does SCOPE use?
In general, the component algorithms of SCOPE all look for sequences that are over-represented in your set of upstream regions relative to the genome as a whole. The genomic occurrences are looked up directly (not estimated). When you run an analysis with the ‘fixed upstream regions’ option selected, SCOPE also scores motifs based on differences in the positions of their occurrences relative to the start site of the gene.
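As a rough illustration of what "over-represented relative to the genome" can mean, the sketch below computes a binomial tail probability: the chance of seeing at least the observed number of motif-containing upstream regions in your gene set, given how often the motif occurs genome-wide. This is a generic statistic chosen for illustration, not necessarily the scoring function SCOPE uses; the function names and the numbers in the example are assumptions.

```python
from math import comb, log10

def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def overrepresentation_score(hits_in_set, set_size, hits_in_genome, genome_size):
    """-log10 of the binomial tail; larger means more over-represented.

    hits_in_genome is assumed to be an exact count (looked up, not estimated).
    """
    p = hits_in_genome / genome_size
    pval = binomial_tail(hits_in_set, set_size, p)
    if pval == 0:
        return float("inf")
    return -log10(pval)

# Hypothetical numbers: the motif occurs upstream of 300 of 6000 genes
# genome-wide, and upstream of 8 of the 10 genes in the submitted group.
score = overrepresentation_score(8, 10, 300, 6000)
```

A motif found upstream of 8 of 10 genes in the group scores far higher here than one found upstream of only 1 of 10, because the genome-wide rate in this example is just 5%.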
3. Where do I enter the width of motif I'm looking for or the number of occurrences or (insert your favorite parameter here)?
SCOPE determines all of these automatically. It finds the most over-represented motifs regardless of length, degree of degeneracy, and number of occurrences in the group.
4. SCOPE can't find one of the genes in my gene group. What do I do?
Ouch. Your gene is almost certainly present in our database, but it’s probably listed under a different name. We try to enable as many names as possible for each gene, but it’s not always possible to find every synonym. You might want to Google your organism name in quotes, together with “gene names” and/or “synonyms”, to see if you can come up with a few alternative names for your gene. Also try species-specific web sites. Alternatively, you can submit your sequences as FASTA files. If you choose this option, SCOPE will automatically choose an appropriate length of background sequences to ensure an accurate estimate of background frequencies.
5. I can't find my species. Can you help?
Please let us know if you would like to see any particular species added to SCOPE. We will be glad to add a new species if there is a demand for it. Also, the command-line version of SCOPE lets you upload your own genomes. The command line version is available by writing to Robert.H.Gross@dartmouth.edu. This version is free to academic sites and can be licensed for non-academic use.
6. How do you know SCOPE works?
We have validated SCOPE’s performance extensively on both synthetic datasets (where we planted a motif and looked to see if we could find it) and real biological datasets from organisms as diverse as yeast, E. coli and Arabidopsis (see our publications).
7. How does SCOPE's performance compare to that of MEME, Gibbs, or (insert your favorite motif finder here)?
We tested the performance of SCOPE head-to-head on 78 experimentally determined regulons (from yeast, B. subtilis, Drosophila and E. coli) against ten different programs (MEME, Gibbs, AlignACE, Weeder, RSAT, YMF, Bioprospector, Gibbs MotifSampler, Improbizer and wConsensus, all run using their web page default settings). We evaluated the programs using a number of different criteria established by other groups in their own performance comparison papers. Uniquely for a motif-finder performance comparison, SCOPE’s performance was better by a substantial and statistically significant margin. For more details, see the SCOPE paper.
8. What if I have some sequences in my group that crept in by mistake?
SCOPE is very robust to noise. In one experiment, we added up to 4 times as many random genes as real ones to each of the 33 regulons in the SCPD database. At 4x random (noisy) genes, the performance of SCOPE degraded by only 21% on average.
9. Why should I incorporate SCOPE into my regulon mining (or whatever) application?
Three big reasons. First, SCOPE is parameter-free and deterministic, meaning you get the same answer every time (helps when you’re debugging...). Second, SCOPE is more accurate than all other programs tested, when those programs are run using their defaults (see question 7). Third, SCOPE is extremely robust to the presence of noise (see question 8).
10. How do I incorporate SCOPE into my regulon mining (or whatever) application?
SCOPE is also available as a command-line program, so that you can batch up large sets of runs or incorporate it into programs that perform regulon mining or other computations. The command-line version is available by writing to Robert.H.Gross@dartmouth.edu. This version is free to academic sites and can be licensed for non-academic use. In addition, the source code is freely available to academic labs, so you can incorporate SCOPE directly into your application.
11. Your output page is impossible to parse. How can I use it in further analyses?
SCOPE’s web-based output is optimized for use in a live setting and is, by design, not parser-friendly. You have two options: have the results sent to you by email (in a parser-friendly format), or use the command-line interface, which generates an XML file as output.
12. I’m confused - which version of the motif do you consider to be SCOPE’s final output: sequence logo, consensus sequence or list of binding sites?
All three. Sequence logos, consensus sequences, and the list of known binding sites are all ways of representing the specificity of the transcription factor for DNA. Like the story of the blind men and the elephant, each of these abstract representations of the specificity captures some aspects of the true specificity but misses others. If the number of binding sites reported is very small, use the list itself. If there are a large number of binding sites, the Position Weight Matrix on which the sequence logo is based might be the best way to go (a sequence logo and a PWM contain the same information). For a principled evaluation of the best motif model, you might be interested in this paper: Osada, Zaslavsky and Singh, Bioinformatics, 2004.
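To illustrate how these representations relate, the sketch below derives a simple majority-base consensus and the per-column information content (the quantity a sequence logo draws as letter-stack height) from one and the same list of sites. It is an illustrative sketch, not SCOPE's code: the function names and example sites are assumptions, the sites are assumed to be equal-length strings over A, C, G, T, and no small-sample correction is applied.

```python
import math
from collections import Counter

def column_counts(sites):
    """One Counter of bases per aligned column."""
    return [Counter(col) for col in zip(*sites)]

def consensus(sites):
    """Most frequent base in each column (non-degenerate consensus)."""
    return "".join(c.most_common(1)[0][0] for c in column_counts(sites))

def information_content(sites):
    """Bits per column: 2 minus the Shannon entropy of the column."""
    n = len(sites)
    bits = []
    for counts in column_counts(sites):
        entropy = -sum((k / n) * math.log2(k / n) for k in counts.values())
        bits.append(2.0 - entropy)
    return bits

# Hypothetical sites, for illustration only.
example_sites = ["TGACTC", "TGACTA", "TGAGTC"]
example_consensus = consensus(example_sites)
example_bits = information_content(example_sites)
```

Columns where every site agrees reach the maximum of 2 bits, while columns with disagreement score lower. The consensus string discards that distinction, which is one reason the three representations are not interchangeable.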