Quick start

Enter the motif to be searched and click the Submit button.

Consensus search


Overview
Scans a motif consensus against a proteome to discover putative novel motif instances.

Steps
  • Enter a motif consensus.
  • Select the species to be searched from the list. To see all species, click the more button in the species field. Default: H.sapiens.
  • Specify your options in the options form field.
  • Click the Submit button to run the job.

Fig. 1. Search. The figure shows how to run a job. First, enter the consensus (1). Next, select the species (2) and set the options (3). Finally, click the Submit button to run the job (4).
Input format
A motif regular expression is a character sequence that defines a set of peptides to be searched against the proteome. It contains ordinary and special characters. Ordinary characters are single-letter amino acid codes such as "A" for alanine, "K" for lysine or "L" for leucine; these characters match themselves. Special characters affect how ordinary characters are interpreted, allowing a complex motif consensus to be queried. The special characters are based on regular expression syntax and are described below.

Special characters
Character Name Meaning
. or X dot Any amino acid allowed
[...] character class Amino acids listed are allowed
[^...] negated character class Amino acids listed are not allowed
{ min, max } specified range Matches min to max repetitions of the previous amino acid. Min required, max allowed
^ caret Matches the amino terminal
$ dollar Matches the carboxy terminal
| pipe Denotes alternation. For example, (KL)|(LK) will match either KL or LK.
() brackets Group items into a single logical item. The bracket indicates the start and end of the group.

Examples
Motif Consensus
KEN box motif KEN
Cyclin-binding RxL motif [KR].L.{0,1}[LF]
C-terminal KDEL Golgi-to-ER retrieving signal [KRHQSAP][DENQT]EL$
N-terminal myristoylation site ^M{0,1}G[^EDRKHPFYW]..[STAGCN][^P]
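
Because the syntax above follows standard regular expressions, a consensus can be tried on a single sequence before submitting a job. The following is a minimal sketch using Python's re module. The function name find_motif and the test sequences are illustrative only, X is assumed to be used solely as the wildcard, and only non-overlapping matches are reported, which may differ from how the server scans a proteome.

```python
import re

def find_motif(consensus, sequence):
    """Return (start, stop, peptide) for each match, with 1-based inclusive positions."""
    # "X" is documented as a synonym for ".", so translate it before compiling.
    pattern = re.compile(consensus.replace("X", "."))
    return [(m.start() + 1, m.end(), m.group()) for m in pattern.finditer(sequence)]

# Hypothetical sequences, chosen only to exercise the example motifs.
print(find_motif("[KR].L.{0,1}[LF]", "AAKTLSLAA"))
print(find_motif("[KRHQSAP][DENQT]EL$", "MAAKDEL"))
print(find_motif("KXN", "AKENA"))
```

The "$" in the second pattern anchors the match to the end of the sequence, which corresponds to the carboxy terminal in the table above.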

Limits to motif degeneracy
The number of instances expected to be returned by a query is calculated as (the number of amino acids in the search space) x (the probability of the motif occurring at a given position). The search space is the number of amino acids in the proteome above the disorder cutoff. The probability of the motif occurring at a given position is calculated using the background amino acid probabilities of the human proteome for residues with an IUPred score > disorder cut-off. Queries expected to return more than 10,000 instances are not submitted to the server. If you wish to perform such a query, please contact us directly.
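
The estimate described above can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: the motif is given as a list of allowed residues per position rather than parsed from a regular expression, the background is a hypothetical uniform distribution (1/20 per residue) instead of the real proteome frequencies, and the search space of 3345087 residues is the H.sapiens figure quoted in the enrichment example later on this page.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical uniform background; the server uses observed proteome frequencies.
UNIFORM_BACKGROUND = {aa: 1 / 20 for aa in AMINO_ACIDS}
MAX_EXPECTED = 10_000  # queries above this limit are not submitted

def expected_instances(positions, search_space, background=UNIFORM_BACKGROUND):
    """Search space size multiplied by the probability of the motif at one position."""
    probability = 1.0
    for allowed in positions:
        # Probability that a residue at this position is one of the allowed ones.
        probability *= sum(background[aa] for aa in allowed)
    return search_space * probability

search_space = 3345087
ken = expected_instances(["K", "E", "N"], search_space)
degenerate = expected_instances(["KR", AMINO_ACIDS, "L"], search_space)
print(ken, ken > MAX_EXPECTED)
print(degenerate, degenerate > MAX_EXPECTED)
```

Under these assumptions the fully defined KEN motif stays well below the limit, while the shorter and more degenerate [KR].L pattern exceeds it.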

Input options
Disorder cutoff
The IUPred program is used to predict intrinsically disordered regions. The disorder score is the mean of the IUPred scores of the residues in a motif. The disorder cutoff limits the number of returned hits: only motifs with a disorder score greater than the defined cutoff are returned.
Scores range from 0 to 1, where 1 means a completely disordered region. A cutoff of 0.4 or greater generally filters out the majority of structured regions.
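
As an illustration of the rule above, the sketch below averages per-residue scores over a motif and compares the result with a cutoff. The per-residue values are made up for the example, and the function names are hypothetical; real scores come from IUPred.

```python
from statistics import mean

def disorder_score(residue_scores, start, stop):
    """Mean per-residue score over the motif (1-based, inclusive positions)."""
    return mean(residue_scores[start - 1:stop])

def passes_cutoff(score, cutoff):
    """A hit is returned only if its disorder score is greater than the cutoff."""
    return score > cutoff

# Made-up per-residue scores for a six-residue sequence.
scores = [0.10, 0.20, 0.50, 0.60, 0.70, 0.30]
motif_score = disorder_score(scores, 3, 5)
print(motif_score, passes_cutoff(motif_score, 0.4))
```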

Flank length
Number of residues flanking the motif that are added to each side of the returned peptides. Range: 0 to 20. The default flank length is 5 residues.
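
The padding can be pictured with the short sketch below, which also lowercases the flanks in the way the result tables display them. The function name and the sequence are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
def hit_with_flanks(sequence, start, stop, flank=5):
    """Return the motif in uppercase padded with up to `flank` lowercase residues per side."""
    left = sequence[max(0, start - 1 - flank):start - 1].lower()
    motif = sequence[start - 1:stop]
    right = sequence[stop:stop + flank].lower()
    return left + motif + right

# Hypothetical sequence with a KEN match at positions 4-6.
print(hit_with_flanks("MAAKENLLSPT", 4, 6, flank=2))
print(hit_with_flanks("MAAKENLLSPT", 4, 6))
```

Flanks are simply truncated where the motif lies close to either end of the protein.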


Fig. 2. Input page - Options. The disorder cut-off and flank length can be changed by selecting a value from the range. The disorder cut-off ranges from 0 to 1 and the flank length from 0 to 20.
Output
After a job has finished, the main result table is displayed. The consensus matches, with overlapping feature and motif attribute annotations, are presented in the table on the Instances page. The results can then be used to run evolutionary and functional enrichment analyses. The evolutionary analysis looks in depth at the conservation of consensus matches across different species and at the conservation of the motif sequence context (Conservation page). The functional analysis performs enrichment analysis of functional annotations to indicate possible motif function, localisation or binding partners (Function page). Finally, the results can be filtered on information such as accessibility, taxonomic range, interacting partners, subcellular localisation or functional annotations (Filters page).
Fig. 3. Navigation menu. 1) Instance page. 2) Conservation page - run evolutionary analysis. 3) Function page - run functional enrichment analysis. 4) Filters page - filter instances based on various information.



Instances


Overview
The hits of the queried motif consensus found in the proteome are listed in a table, sorted by conservation score by default. Each instance is annotated with peptide, motif attribute and feature information. Functional annotations are also shown if the results were filtered by these annotations. In addition, instances are flagged with warnings if they occur in inaccessible regions.
The hits are paginated, with 100 hits shown on each page. To see further hits, use the page navigation in the top right corner above the table, or enter the page number directly in the empty box and press ENTER. The table can be sorted by any column by clicking the ascending order or descending order button above the column name.

Fig. 4. Instances page. 1) Peptide annotations. 2) Motif attribute annotations. 3) Feature annotations. 4) Warnings. Instances with warnings have yellow background colour. 5) Sorters. 6) Download section. 7) Page navigation. 8) Filtering by warnings.

Annotations
Peptide annotations
Column Description Link
Virus Virus species. The column is shown only if the search was against virus proteomes. -
Protein Name Protein and gene name. Information about overlapping instances and warnings. UniProt
Peptide A motif sequence with the flanks. Flanks are displayed as lowercase residues. ProViz
Length Motif length. -
Start Start position of the motif in the protein. -
End Stop position of the motif in the protein. -

Motif attributes annotations
Accessibility is calculated using the IUPred program, and the score for each instance is shown in the Disorder score column. The disorder score is calculated as the mean of the IUPred scores across the residues of the motif.
The relative conservation score is computed across the defined residues of the motif as described for the SLiMPrints motif discovery tool. The conservation score and its variance are shown in the Conservation column(s). Several conservation scores can be calculated, based on the available alignments of differing taxonomic ranges. The taxonomic ranges presented depend on the query species and include: QFO (Quest for Orthologs), Arthropoda, Viridiplantae, Amoebozoa, Fungi, Nematoda, Metazoa, Saccharomycetales and Viruses.
Column Description Link
Disorder score Mean IUPred score. High scoring peptides are less likely to be in a globular region. -
Conservation Conservation score. Lower scores indicate more conserved peptides across the alignment. ProViz

Feature annotations
Feature annotations that overlap the peptide are grouped into 12 different types (see the table below for details). The number in a feature column indicates how many features were found. To see more details, expand the feature column by clicking the button above the column name. To hide this information, click the button again.
All feature columns can be expanded or collapsed at once by clicking the expand/collapse button above the feature column names.
Hover over a feature to see its start and stop positions and further information. Each feature is also provided with distance information. For example, a distance of -2 means that the annotated feature stops 2 residues before the motif start position, in the flanking region. If no distance is given, the annotated feature directly overlaps the motif consensus. To see more details about a feature, click on it and you will be redirected to the source website.
Column Description Source
Domain Regions with domains. Pfam and UniProt
Structure Regions that have structure solved by NMR or X-ray crystallography. PDB
Secondary Structure Regions that have been shown to form secondary structure. UniProt
Motif Regions with experimentally validated short linear motifs. ELM and UniProt
Region Regions with experimental evidence for function. UniProt
Switch Curated experimentally validated motif-based molecular switches. UniProt
Modification Regions with sites of post-translational modifications. PhosphoSite, phospho.ELM and UniProt
Topology Region topology information. UniProt
Isoform Splice variants. UniProt
Mutagenesis Mutated residues which alter function. UniProt
SNP Single nucleotide polymorphism with disease association and genotype information. dbSNP, 1000genomes and UniProt
Other Other features of interest. UniProt

Fig. 5. Feature annotations. 1) Expand or collapse all feature columns. 2) Expand or collapse one feature column.

Functional annotations
Functional annotations are shown in the table only if the instances were filtered based on these annotations (see the Filters section). Annotations are grouped into ontology and interaction annotations (see the table below for details). Each annotation is linked to its source website.
Column Description Source
GO terms Gene ontology terms for the protein containing the peptide. Gene Ontology
Keywords UniProt keywords for the protein containing the peptide. UniProt Keywords
Interactors: Proteins Proteins experimentally shown to interact with the protein containing the peptide. IntAct
Interactors: Families Protein families interacting with the protein containing the peptide. IntAct, UniProt Protein Families
Interactors: Domains Domains found in proteins which interact with the protein containing the peptide. IntAct, Pfam

Calculations
Several motif attributes are calculated for each motif consensus match. Not all of them are shown in the result table, because this would require a very large number of columns; only the most important ones, the disorder and conservation scores, are displayed. All computed scores are available in the JSON and tab separated downloads.
Motif attributes:
Motif attribute Description Range
Disorder score IUPred score computed as the mean of IUPred scores across residues of the motif consensus. Lower score, more globular region. 0-1
Conservation score Relative conservation score computed across the defined residues of the motif consensus as described for the SLiMPrints tool. Lower score, more conserved region. 0-1
Surface accessibility score Proportion of the peptide that is accessible to water molecules in a solved structure of the region. 0-1
Anchor score Anchor score computed as the mean of Anchor scores across residues of the motif consensus. Lower score, higher propensity to fold upon binding. 0-1

Warnings
Instances are flagged to warn the user if a given peptide is inaccessible to intracellular proteins. Instances with warnings are shown with a yellow background colour in the result table and an icon next to the protein name. Hover over the icon to see the warning details.
Warnings are grouped into two types based on background colour in the result table: domains and other.
Domain warnings mean that the motif consensus overlaps one or more domains.
Other warnings include information about:
Disorder: instances with disorder score ≥ 0.4
Surface accessibility: instances with surface accessibility percent score < 50%, i.e. less than 50% of the peptide is accessible to water molecules in a solved structure of the region.
Localisation: instances with Gene Ontology terms which indicate extracellular localisation.
Topology: instances overlapping topology features which exclude intracellular regions.

Fig. 6. Warnings.


Filtering
The instances can be filtered based on these warnings. To filter the results, click the icon above the table in the right corner and switch the slider next to each warning type on or off.

Fig. 7. Warnings. Hide/show instances with warnings. Advanced filters - redirect to Filters page to specify more filtering options.

The same filtering can be done on the Filters page.

Download
The results can be saved in tab separated (tdt) or JSON format by clicking the button in the top left corner of the table. Information about ontologies, interactions and warnings is not provided in the tdt format, but it can easily be found in the JSON format (GOterms, Keywords, Interactors and Warnings fields). All other information is available in both formats.

Tab separated format
The columns and their descriptions are shown in the table below. If a score could not be calculated for a motif attribute, the value -1 appears in the file. For feature annotations, the features in a column are separated by ";".
Column Description
InstanceId Unique instance identifier.
ProteinAcc UniProt protein accession.
ProteinName Protein name.
GeneName Protein gene name.
Hit Motif sequence with flanking regions. Flanks are represented as lowercase residues.
SeqStart Motif start position in protein.
SeqStop Motif stop position in protein.
IUPred Disorder score.
Anchor Anchor score.
SA Surface accessibility score.
Conservation <alignment> (score) Conservation score across <alignment>.
Conservation <alignment> (var) Conservation variance across <alignment>.
Domain Format: <name>|<id>|<start>|<stop>|<distance>
Motif Format: <name>|<id>|<start>|<stop>|<distance>
Modification Format: <name>|<enzymes>|<pmids>|<description>|<id>|<start>|<stop>|<distance>
Structure Format: <name>|<resolution>|<method>|<chain>|<start>|<stop>|<distance>
SNP Format: <name>|<variant>|<id>|<start>|<stop>|<distance>
Mutagenesis Format: <name>|<mutation>|<id>|<start>|<stop>|<distance>
Region Format: <name>|<start>|<stop>|<distance>
Topology Format: <name>|<start>|<stop>|<distance>
Secondary Structure Format: <name>|<start>|<stop>|<distance>
Isoform Format: <name>|<variant>|<start>|<stop>|<distance>
Switch Format: <type>|<subtype>|<mechanism>|<id>|<start>|<stop>|<distance>
Other Format: <name>|<start>|<stop>|<distance>
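
A feature column can be unpacked by splitting first on ";" and then on "|". The sketch below does this for a Domain cell; the cell content is a made-up example that follows the documented field order, the function name is hypothetical, and values are kept as strings. It assumes that names and identifiers do not themselves contain ";" or "|".

```python
DOMAIN_FIELDS = ["name", "id", "start", "stop", "distance"]

def parse_feature_cell(cell, fields):
    """Split one feature column value into a list of dictionaries, one per feature."""
    if not cell:
        return []
    features = []
    for entry in cell.split(";"):
        # Pair each "|"-separated value with its documented field name.
        features.append(dict(zip(fields, entry.split("|"))))
    return features

# Made-up Domain cell with two features.
cell = "Example domain A|PF00001|10|60|-2;Example domain B|PF00002|70|120|-5"
for feature in parse_feature_cell(cell, DOMAIN_FIELDS):
    print(feature["name"], feature["start"], feature["stop"], feature["distance"])
```

The same function can be reused for the other feature columns by passing the field list given for that column in the table above.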

JSON format
The full list of fields for each instance is given in the table below. If a motif attribute score could not be computed, the value -1 is provided. A feature field is present in the results only if at least one feature of that type overlaps the motif consensus.
Field Type Description
instanceId Integer Unique instance identifier.
ProteinAcc String UniProt protein accession.
ProteinName String Protein name.
GeneName String Protein gene name.
Hit String Motif sequence with flanking regions. Flanks are represented as lowercase residues.
SeqStart Integer Motif start position in protein.
SeqStop Integer Motif stop position in protein.
IUPred Float Disorder score.
Anchor Float Anchor score.
SA Float Surface accessibility score.
Conservation <alignment> JSON Conservation score and variance across <alignment>. Format: {"score": <float>, "var": <float>}
GOterms List of JSON List of GO terms. Element: {"id": <GOterm id>, "name": <GOterm name>}.
Keywords List of JSON List of UniProt keywords. Element: {"id": <keyword id>, "name": <keyword name>}.
Interactors List of JSON List of interacting proteins with protein containing motif. Element: {"id": <UniProt protein accession>, "name": <UniProt protein name (gene name)>}.
Domain List of JSON Format: feature format, see below.
Motif List of JSON Format: feature format, see below.
Modification List of JSON Format: feature format, see below.
Structure List of JSON Format: feature format, see below.
SNP List of JSON Format: feature format, see below.
Mutagenesis List of JSON Format: feature format, see below.
Region List of JSON Format: feature format, see below.
Topology List of JSON Format: feature format, see below.
SecondaryStrcuture List of JSON Format: feature format, see below.
Isoform List of JSON Format: feature format, see below.
Switch List of JSON Format: feature format, see below.
Other List of JSON Format: feature format, see below.
Warnings List of JSON List of warnings. Element: {"name": <warning category>, "reason": <warning reason>}

Each feature annotation in a feature field is represented as JSON using the following feature format:
Field Type Description
name String Feature name.
url String Link to source data.
description JSON Format: {"start": <feature start position>, "stop": <feature stop position>, "distance": <distance to motif consensus>, "description": <other specific information in JSON format>}
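
The sketch below shows one way to read a single instance in this format. The record is a made-up example built from the documented field names, the helper names are hypothetical, and the layout of the whole download (for instance a list of such records) is not assumed. It replaces the -1 sentinel with None and tolerates missing feature fields.

```python
import json

SCORE_FIELDS = ("IUPred", "Anchor", "SA")

def load_instance(raw):
    """Parse one instance and turn the -1 'not computed' sentinel into None."""
    instance = json.loads(raw)
    for field in SCORE_FIELDS:
        if instance.get(field) == -1:
            instance[field] = None
    return instance

def feature_spans(instance, feature_type):
    """Return (name, start, stop) per feature; absent feature fields give an empty list."""
    return [(f["name"], f["description"]["start"], f["description"]["stop"])
            for f in instance.get(feature_type, [])]

# Made-up record following the documented fields.
raw = """{"instanceId": 1, "ProteinAcc": "P00000", "Hit": "aaKENll",
          "SeqStart": 4, "SeqStop": 6, "IUPred": 0.62, "Anchor": -1, "SA": -1,
          "Domain": [{"name": "Example domain", "url": "https://example.org",
                      "description": {"start": 10, "stop": 60, "distance": 4,
                                      "description": {}}}]}"""
instance = load_instance(raw)
print(instance["IUPred"], instance["Anchor"], feature_spans(instance, "Domain"))
```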


Conservation


Overview
There are two main evolutionary sections: flank conservation, and taxonomic range for each available clade. In each section the results are provided with peptide, motif attribute and feature annotations. Of the feature annotations, only overlapping domains and motifs are displayed. Additionally, information specific to each section is provided.
The hits are paginated, with 100 hits shown on each page. To see further hits, use the page navigation in the top right corner above the table, or enter the page number directly in the empty box and press ENTER. The table can be sorted by any column by clicking the ascending order or descending order button above the column name. A custom view of the table columns can be set in the sidebar. To change the current view or clade, select one from the navigation menu above the table or in the sidebar.
Fig. 8. Conservation page. 1) Navigation menu. 2) Sidebar. 3) Sorters. 4) Page navigation.
Flank conservation
Instances are annotated with the relative conservation scores for each residue in the motif consensus and the flanking regions (always from -10 to +10 residues). The conservation of each residue is represented by colour intensity, i.e. a more intense red means a more conserved residue. Several scores are computed to compare the conservation of the motif sequence with that of the flanking regions.

Flank conservation specific columns in the result table:
Column Description Shown by default
Con Score Combined Combined conservation score. It is the sum of the conservation score and the conservation variance. +
Sig conserved residues defined positions Proportion of residues in the defined positions of a motif that are significantly conserved (p > 0.05). -
Sig conserved residues Flanks Proportion of residues in the flanking positions of a motif that are significantly conserved (p > 0.05). -
Sig conserved residues Ratio The ratio of Sig conserved residues defined positions to Sig conserved residues Flanks. +
L-10:L-1 Conservation scores and residues for the N-terminal flank. +
P<motif position> Conservation scores and residues for the motif consensus. +
R1:R10 Conservation scores and residues for the C-terminal flank. +
Fig. 9. Conservation - Flank conservation. Specific columns for the flank conservation view.


Taxonomic range
Instances are annotated with information about the conservation of the motif consensus across the species in the alignment. By default, usually only a subset of species is shown in the result table. To see the full list of species or to customize the table columns, use the sidebar.

Taxonomic range specific columns in the result table:
Column Description Shown by default
Con Score Combined Combined conservation score. It is the sum of the conservation score and the conservation variance. +
Conserved Counter Number of species in which the motif consensus is present at the same position as the query species motif. +
Species columns Shows whether the motif is present (C) or absent (N) at the same position as the query species motif in each species of the selected clade. If no data is available (i.e. there is no protein in the alignment for the species), an "X" is supplied. +
Fig. 10. Conservation - Taxonomic range. Specific columns for the taxonomic range view.
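
The relationship between the species columns and the Conserved Counter can be sketched as below. The species codes and calls are invented for the example, and the function names are hypothetical.

```python
def conserved_counter(calls):
    """Count species in which the motif is present at the same position (call 'C')."""
    return sum(1 for call in calls.values() if call == "C")

def species_with_data(calls):
    """Species for which an aligned protein is available (call is not 'X')."""
    return [species for species, call in calls.items() if call != "X"]

# Invented calls: C = conserved, N = non-conserved, X = missing from the alignment.
calls = {"MOUSE": "C", "CHICK": "C", "DANRE": "N", "DROME": "X"}
print(conserved_counter(calls), species_with_data(calls))
```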


Sidebar
The Options panel is located on the left and has three sections: Views, Columns and Save.

Views
In the Views section the view can be changed. Switch from the current view to the Taxonomic range or Flank conservation section and specify the alignment. The same can be done from the navigation menu above the result table.
Columns
In the Columns section, columns can be switched on or off to show or hide them in the result table. The full list of species available in the selected alignment is shown here. To add a column to the result table, tick its checkbox. The table is updated automatically.
Save
In the Save section the results can be downloaded in tab separated (tdt) or JSON format. See Download for more details.

Fig. 11. Sidebar. There are three sections: Views, Columns and Save to change view or download results.


Download
The results can be saved in tab separated (tdt) or JSON format. To download the results, use the Save section of the sidebar.

Tab separated format
The columns and their descriptions are shown in the table below. If a score could not be calculated for a motif attribute, the value -1 appears in the file.
Column Description
InstanceId Unique instance identifier.
ProteinAcc UniProt protein accession.
ProteinName Protein name with gene name.
Hit Motif sequence with flanking regions. Flanks are represented as lowercase residues.
SeqStart Motif start position in protein.
SeqStop Motif stop position in protein.
IUPred Disorder score.
Domain Domain names separated by ";".
Motif Motif classes separated by ";".
<alignment> conservation score Conservation score across <alignment>.
<alignment> conservation var Conservation variance across <alignment>.
<alignment> conservation combined Conservation score combined. Sum of conservation score and variance across <alignment>.
conserved_counter Number of conserved species across <alignment>.
<species> C, N or X. C - the motif consensus is present at the same position as in the query species (conserved). N - the motif consensus is missing at the same position as in the query species (non-conserved). X - the species is not present in the alignment (missing).
mean_flanks Mean of relative conservation scores across residues of flank regions.
var_flanks Variance of relative conservation scores across residues of flank regions.
Sig conserved residues defined positions Proportion of residues in the defined positions of a motif that are significantly conserved (p > 0.05).
Sig conserved residues Flanks Proportion of residues in the flanking positions of a motif that are significantly conserved (p > 0.05).
Sig conserved residues Ratio The ratio of Sig conserved residues defined positions to Sig conserved residues Flanks.
L-10:L-1 The relative conservation scores for residues in the N-terminal flank.
L<position> The relative conservation scores for residues in the motif consensus.
R1:R10 The relative conservation scores for residues in the C-terminal flank.
Alignment Hyperlink to ProViz visualisation tool.

JSON format
The full list of fields for each instance is given in the table below. If a motif attribute score could not be computed, the value -1 is provided.
Field Type Description
instanceId Integer Unique instance identifier.
ProteinAcc String UniProt protein accession.
ProteinName String Protein name.
GeneName String Protein gene name.
Hit String Motif sequence with flanking regions. Flanks are represented as lowercase residues.
SeqStart Integer Motif start position in protein.
SeqStop Integer Motif stop position in protein.
IUPred Float Disorder score.
ConservationScore Float Conservation score.
ConservationVar Float Conservation variance.
ConservationScoreCombined Float Conservation score combined. Sum of conservation score and conservation variance.
Conservation_Scores JSON Conservation score and variance across <alignment>. Format: {"<searchdb>": {"score": <float>, "var": <float>}}.
Domain List of JSON Format: feature format, see description.
Motif List of JSON Format: feature format, see description.
mean_flanks Float Mean of relative conservation scores across residues of flank regions.
var_flanks Float Variance of relative conservation scores across residues of flank regions.
flank_sig Float Proportion of residues in the flanking positions of a motif that are significantly conserved (p > 0.05).
motif_sig Float Proportion of residues in the defined positions of a motif that are significantly conserved (p > 0.05).
ratio_sig Float The ratio of Sig conserved residues defined positions to Sig conserved residues Flanks.
motif_sig_pos List of Integer The defined positions of a motif consensus.
conserved_counter Integer Number of conserved species i.e. motif consensus is at the same position as query species.
Conservation JSON Species conservation. Format: {"species_code": Boolean}. True - motif consensus is present at the same position, False - motif consensus is missing at the same position.
flank_residues JSON Conservation for each residue in flanking regions. Format: {"<flank position>": {"aa": <residue>, "score": <relative conservation score>}}.
peptide_residues JSON Conservation for each residue in motif consensus. Format: {"<motif position>": {"aa": <residue>, "score": <relative conservation score>}}.
proviz_link String Link to ProViz visualisation tool.



Function


Overview
Enrichment analyses of Gene Ontology terms, UniProt keywords and interaction data are performed using three different approaches:
Approach Description Default
Motif search space correction Enrichment analysis corrected for the motif search space, i.e. the search space is limited to disordered regions of the proteome. See details. +
Based on conservation Enrichment analysis using conservation scores as the ranking criterion. See details. -
Classical Classical enrichment analysis based on the hypergeometric distribution. See details. +
All approaches account for the evolutionary relationship of proteins by grouping similar proteins based on sequence and function similarity.

Data management
Enrichment results are divided into two categories: Ontology and Interaction.
The Ontology section provides information about controlled vocabulary terms derived from the GO project and UniProt keywords. The terms are classified into 5 categories:
  • TOP - the most significant 20 terms from ontology section.
  • Biological process
  • Molecular function
  • Localisation - known as cellular component.
  • Disease

The Interaction section provides information about interacting proteins, protein families or domains. Interaction data is retrieved from the IntAct database and divided into 3 categories:
  • Domain - interacting domains found in interacting proteins.
  • Family - interacting proteins grouped into protein families.
  • Protein - interacting proteins.

Result representation
The functional annotations are paginated, with 50 hits shown on each page. To see further hits, use the page navigation in the top right corner above the table, or enter the page number directly in the empty box and press ENTER. The table can be sorted by any column by clicking the ascending order or descending order button above the column name. A custom view of the table columns can be set in the sidebar. To change the current category, select one from the navigation menu above the table or in the sidebar.

The terms are presented in the result table with a background colour. The following variants are possible:
Colour Meaning
Green Significant terms with adjusted p-values < 1e-4.
Grey Enriched terms, i.e. with enrichment score (E) > 1.
Blue Depleted terms, i.e. with enrichment score (E) < 1.
Light yellow Warning. Term is flagged with the repeat or the cluster flag.
Dark yellow Warning. Term is flagged with both the repeat and the cluster flag.


Result table
The enrichment result table has numerous columns describing the relevant data from the enrichment analysis. Most columns are hidden by default. Results from the different enrichment approaches are grouped together, and by default the results from the classical approach are hidden. To customize the table content, use the Columns section in the sidebar.
Columns in result table:
Column Description Shown by default
Category Term category. -
ID Unique term identifier. +
Name Functional annotation name. +
# Number of consensus matches that map to this term. +
# motifs Number of consensus matches in dataset. -
# residues Number of disordered residues* that map to this term. -
# residues proteome Number of disordered residues* in whole proteome. -
# Proteome Number of proteins in proteome that map to this term. +
Enrichment Enrichment (E). If E > 1 the term is enriched, otherwise it is depleted. +
P-value Enrichment significance calculated using the hypergeometric test. Lower scores, more enriched/depleted term. +
Adj pval Adjusted p-values. P-values after BH correction. +
<alignment> P-value Enrichment significance calculated based on conservation. +
*Disordered residue - residue with IUPred score > disorder cut-off.

Filtering
The results in the table can be filtered on enrichment score (E), i.e. depleted or enriched, on warnings and on adjusted p-value.
To limit the terms to those that are enriched, depleted and/or without any warnings, click the icon above the table in the right corner and switch the slider next to each option on or off. By default, depleted terms are hidden.
The p-value cut-off (one of 0.001, 0.01 or 0.05) can be selected from the panel above the table. The cut-off is applied to adjusted p-values. If results from the enrichment analysis based on conservation are present in the table, a conservation cut-off can also be chosen; it returns terms for which at least one of the p-values across all available alignments is below the selected cut-off.

Fig. 12. Function page. 1) Navigation menu. 2) Sidebar. 3) Cut-offs to limit results in table. 4) Filter results to see enriched, depleted or without warnings terms. 5) Page navigation. 6) Search for terms in table. 7) Filter consensus matches based on selected terms (see Filters for details). 8) Result table.

Approaches
Enrichment analysis with motif search space correction
Enrichment analysis with motif search space correction is an improvement on the classical approach. It accounts for the search space, i.e. the analysis is limited to disordered regions of the proteome, in the same way as the search space for motif consensus searches (disorder cut-off). The enrichment calculation tries to answer the question: what is the probability that more than m of all consensus matches in the dataset (M) belong to a given functional term shared by n of the N disordered residues in the entire proteome? Disordered residues are residues with a score ≥ the disorder cut-off.
The enrichment analysis uses the hypergeometric test to define the significance of terms, using the following equations:
$$Enrichment = E ={(m/n) \over (M/N)}$$
$$P(x > m) = f(m,N,n,M)=\sum_{i=m+1}^{\min(M,n)}{\binom{M}{i}\binom{N-M}{n-i} \over \binom{N}{n}}$$
where,
  • m - is the number of consensus matches that map to a given term in dataset
  • n - is the number of residues that map to a given term and with IUPred score ≥ disorder cut-off in entire proteome
  • M - is the number of consensus matches in dataset
  • N - is the number of residues with IUPred score ≥ disorder cut-off in whole proteome.

An enrichment (E) greater than 1 indicates that the term is overrepresented; otherwise it is underrepresented. If the term is depleted, then the significance (p-value) is calculated as: 1 - P(x > m).
P-values are corrected for multiple hypothesis testing using Benjamini-Hochberg correction.

For example, suppose the dataset contains 10 motif consensus matches and 5 of them map to the mitosis term. The background set for H.sapiens is 3345087 residues (the number of residues with an IUPred score above the disorder cut-off of 0.4), of which 58995 residues are annotated to the mitosis term in the whole proteome. The hypergeometric test takes all four values into account, and the p-value for that term is 5.885e-09.
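
The calculation can be sketched in a few lines of Python using exact integer arithmetic. This is an illustration rather than the server's code. Two assumptions are made: the enrichment is computed as the observed fraction over the background fraction, (m/M)/(n/N), and the tail is summed in the algebraically equivalent form C(n,i)C(N-n,M-i)/C(N,M), which avoids very large binomial coefficients when M is small. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def enrichment(m, n, M, N):
    """Observed fraction of matches with the term over the background fraction."""
    return (m / M) / (n / N)

def p_more_than(m, n, M, N):
    """P(x > m) under the hypergeometric distribution with the symbols defined above."""
    upper = min(M, n)
    numerator = sum(comb(n, i) * comb(N - n, M - i) for i in range(m + 1, upper + 1))
    return float(Fraction(numerator, comb(N, M)))

def significance(m, n, M, N):
    """P(x > m) for enriched terms, 1 - P(x > m) for depleted terms."""
    tail = p_more_than(m, n, M, N)
    return tail if enrichment(m, n, M, N) > 1 else 1 - tail

# Values from the mitosis example: m = 5, n = 58995, M = 10, N = 3345087.
print(enrichment(5, 58995, 10, 3345087))
print(significance(5, 58995, 10, 3345087))
```

The printed p-value can be compared with the figure quoted in the example; small differences may arise from how the tail is summed or rounded.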

Enrichment analysis based on conservation
Enrichment analysis based on conservation uses relative conservation scores as the ranking criterion. The conservation scores of consensus matches are assigned to their functional annotations. For each term, the conservation scores assigned to that term are compared with the remaining conservation scores, i.e. those present in the results but not assigned to that term, using the Mann-Whitney U test. Functional annotations assigned to more conserved consensus matches are more likely to be related to the biological function of the motif than annotations assigned to consensus matches with randomly distributed conservation scores. The enrichment analysis is performed on all alignments available for the conservation data. By default, this analysis is not performed. To run it, set Conservation to True in the Options section of the sidebar and click the Search button.
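
To make the comparison concrete, the sketch below computes a Mann-Whitney U statistic and a one-sided p-value with a normal approximation, using only the standard library. It is a simplified illustration: the scores are invented, no tie or continuity correction is applied, and the choice of a one-sided test (term scores lower, i.e. more conserved, than the rest) is an assumption, since the exact variant used by the server is not described here.

```python
from math import erfc, sqrt

def mann_whitney_u(term_scores, other_scores):
    """Count pairs in which the term score is lower (more conserved); ties count 0.5."""
    u = 0.0
    for a in term_scores:
        for b in other_scores:
            if a < b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def conservation_p_value(term_scores, other_scores):
    """One-sided p-value from the normal approximation to the U distribution."""
    n1, n2 = len(term_scores), len(other_scores)
    u = mann_whitney_u(term_scores, other_scores)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return 0.5 * erfc(z / sqrt(2))  # P(Z >= z)

# Invented conservation scores (lower = more conserved).
term = [0.1, 0.2, 0.3]
rest = [0.5, 0.6, 0.7, 0.8]
print(mann_whitney_u(term, rest), conservation_p_value(term, rest))
```

With samples this small an exact test would normally be preferred; the normal approximation is used here only to keep the example short.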

Enrichment analysis based on classical approach
Enrichment analysis based on the classical approach uses the hypergeometric distribution to identify enriched functional annotations. It tries to answer the question: what is the probability that more than m of the M proteins containing consensus matches in the dataset belong to a given functional term, compared to a background distribution, where the background distribution is the proportion of proteins with that term among all proteins in the entire proteome?
The enrichment analysis is performed using the following equations:
$$Enrichment = E = {(m/n) \over (M/N)}$$ $$P(x > m) = f(m,N,n,M)=\sum_{i=m+1}^{\min(M,n)}{\binom{M}{i}\binom{N-M}{n-i} \over \binom{N}{n}}$$ where,
  • m - is the number of proteins containing consensus matches in the dataset that map to a given term
  • n - is the number of proteins in the entire proteome that map to a given term
  • M - is the number of proteins containing consensus matches in the dataset
  • N - is the number of proteins in the whole proteome.
If a consensus match occurs more than once in a protein, then that protein is still counted only once.
An enrichment (E) greater than 1 indicates that the term is overrepresented; otherwise it is underrepresented. If the term is depleted, then the significance (p-value) is calculated as: 1 - P(x > m).
P-values are corrected for multiple hypothesis testing using Benjamini-Hochberg correction.
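The protein-level counting used by the classical approach can be illustrated with a few lines of code. The matches and the annotation set below are hypothetical placeholders; the point is only that a protein with several consensus matches contributes once to M and m.

```python
# Hypothetical consensus matches: (protein identifier, start position).
matches = [("protA", 12), ("protA", 88), ("protB", 301), ("protC", 45)]

# Hypothetical set of proteins annotated to the term of interest.
term_proteins = {"protA", "protC", "protD"}

# Collapse matches to the set of proteins that contain at least one match.
proteins_with_match = {protein for protein, _ in matches}

M = len(proteins_with_match)                   # proteins containing matches
m = len(proteins_with_match & term_proteins)   # of those, annotated to the term
print(M, m)
```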

Benjamini-Hochberg correction for multiple hypothesis testing
The correction is applied within each category (i.e. Biological process, Molecular function, Localisation etc.) and is calculated as: $${p*n} \over {i}$$ where,
  • p - is the p-value
  • n - is the number of terms in the category
  • i - is the rank of the term when the terms in the category are ordered by p-value (smallest first)
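A minimal sketch of the correction for the terms of one category is shown below. The core of the function is the p*n/i formula; the cap at 1 and the running minimum (which keeps adjusted values monotone in rank) belong to the standard Benjamini-Hochberg adjustment and are assumptions here, since the text above gives only the basic formula. The function name and the example p-values are illustrative.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Indices of the terms ordered by ascending p-value (rank 1 = smallest).
    order = sorted(range(n), key=lambda k: pvalues[k])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so that the running minimum
    # enforces monotonicity of the adjusted values.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        value = min(1.0, pvalues[idx] * n / rank)
        running_min = min(running_min, value)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four terms in one category.
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.005])
```

Because the correction is applied per category, the function would be called once for each category's list of p-values.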


Clustering based on evolutionary relationship
The enrichment analysis is corrected for evolutionary relationships based on sequence and function similarity. Proteins containing consensus matches can be grouped together based on different UniProt clusters (UniRef50, UniRef90, UniRef100), UniProt protein families or a corrected cluster. The corrected cluster combines the UniRef50 and UniProt protein family clusters. There is also an option not to cluster the data. The default cluster is UniRef50. The clustering options can be changed in the Options section of the sidebar.

Calculations with clustering options
Each variable described in the equations in the calculation section is normalised based on the chosen cluster option. Proteins containing consensus matches are grouped together, and normalised values of disordered residues and consensus matches are calculated for each cluster. See the Supplementary materials of the paper for details of the calculation with clustering.
In the enrichment analysis based on conservation, the proteins containing consensus matches are grouped together in clusters and the best conservation score (i.e. the lowest) is chosen as the representative for a given cluster.
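Choosing the representative conservation score per cluster can be sketched as below. The match identifiers, cluster names and scores are hypothetical; only the rule itself (keep the lowest score in each cluster) comes from the description above.

```python
def cluster_representatives(scores, cluster_of):
    """Return the lowest (best) conservation score found in each cluster.

    scores     -- maps a match identifier to its conservation score
    cluster_of -- maps a match identifier to the cluster of its protein
    """
    best = {}
    for match_id, score in scores.items():
        cluster = cluster_of[match_id]
        if cluster not in best or score < best[cluster]:
            best[cluster] = score
    return best


# Hypothetical data, for illustration only.
scores = {"match1": 0.03, "match2": 0.2, "match3": 0.001}
cluster_of = {"match1": "c1", "match2": "c1", "match3": "c2"}
representatives = cluster_representatives(scores, cluster_of)
```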


Warnings
Functional annotations are flagged to warn the user when the enrichment of a given term may be overestimated. Terms with warnings are shown with a yellow background in the result table and an icon next to the adjusted p-value. To see details about a warning, hover over the icon. There are two types of warning: the repeat flag and the cluster flag.

Repeat flag
The term can be overestimated when consensus matches occur multiple times in the same protein due to repeated regions in that protein. The term is flagged if the number of repeated consensus matches is significantly greater than expected (i.e. p < 0.001).

Cluster flag
The term can be overestimated when consensus matches occur in related proteins that were not clustered together based on evolutionary relationship. This flag is only calculated when the enrichment analysis is performed on UniRef50 clusters. The term is flagged if the ratio of the number of corrected clusters assigned to the term to the number of UniRef50 clusters assigned to the term is ≤ 0.5.
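The cluster flag rule can be written as a one-line check. This is a sketch with an illustrative function name; the handling of a term with no UniRef50 clusters is an assumption, as the text does not cover that case.

```python
def cluster_flag(n_corrected_clusters, n_uniref50_clusters):
    """True if the corrected-to-UniRef50 cluster ratio for a term is <= 0.5."""
    if n_uniref50_clusters == 0:
        return False  # assumption: nothing to flag without UniRef50 clusters
    return n_corrected_clusters / n_uniref50_clusters <= 0.5


# A term whose 5 UniRef50 clusters collapse into 2 corrected clusters is flagged.
flagged = cluster_flag(2, 5)
```

A low ratio means that many UniRef50 clusters merge into few corrected clusters, i.e. the matches for the term sit in related proteins.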

Fig. 13. Function page - warnings indicate potentially overestimated functional annotations.

Filtering
The functional annotations in the result table can be used to filter instances, i.e. to show only the instances which map to the terms selected in the result table. For details on how to filter instances, see the Filters section.


Sidebar
The Options panel is located on the left. There are four sections: Views, Options, Columns and Save.

Views section
The view can be changed to show enriched terms from a selected category.

Columns section
The columns can be switched on/off by ticking the checkbox next to column name. The table will be updated automatically.

Options section
A new enrichment analysis can be run, e.g. to perform the enrichment analysis based on conservation (set Conservation to True) or to change the clustering option (choose a cluster from the Cluster option). Specify your own options and press the Search button to rerun the analysis.
Parameters:
  • Cluster
    The analysis can group proteins based on their relationships using different UniProt clusters (UniRef50, UniRef90, UniRef100), the UniProt protein families cluster (ProteinFamily), the combined cluster (FamilyUniref (Mix)) or no clustering (None). The default cluster is UniRef50. See the Clustering based on evolutionary relationship section for details.
  • Conservation
    The Conservation option computes enrichment significance based on conservation scores. This is not run by default. Set Conservation to True if you want to run this enrichment analysis and see the outcome in the result table.
  • P-value
    The P-value cut-off limits the number of returned hits.

Save section
The results can be downloaded in tab-separated (tdt) or JSON format.

Fig. 14. Function page - sidebar. There are four sections: Views (1), Options (2), Columns (3) and Save (4) to change current view, rerun job with different options or download results.


Download
The results can be saved in tab-separated (tdt) or JSON format. To download the results, use the Save section of the sidebar.

Tab separated format
The columns and their descriptions are shown in the table below. Only terms from the current category are saved. However, if the current category is TOP, the file contains results from every category.
Column | Description | Approach(es) which use a given value in calculations/annotations
Category | Term category. | all
ID | Unique term identifier. | all
Name | Functional annotation name. | all
No. of motif instances mapped to term in dataset (m) | Number of consensus matches mapped to a given term in dataset (m). | search space correction
No. of motif instances in dataset (M) | Number of consensus matches in dataset (M). | search space correction
No. of disordered residues mapped to term (n) | Number of disordered residues mapped to a given term in proteome (n). | search space correction
No. of disordered residues in proteome (N) | Number of disordered residues in proteome (N). | search space correction
Enrichment | Enrichment score (E). | search space correction
Pvalue | Enrichment significance. | search space correction
Adj pvalue | Corrected p-value for multiple hypothesis testing. | search space correction
No. of proteins mapped to term in dataset | Number of proteins mapped to a given term in dataset. | classical
No. of proteins mapped to term in proteome | Number of proteins mapped to a given term in proteome. | classical
No. of proteins in dataset | Number of proteins in the dataset. | classical
No. of proteins in proteome | Number of proteins in the entire proteome. | classical
Enrichment (Proteins) | Enrichment score (E). | classical
Pvalue (Proteins) | Enrichment significance. | classical
Adj pvalue (Proteins) | Corrected p-value for multiple hypothesis testing. | classical
Repeat flag | Warning. Overestimation of term (True/False). | all
Cluster flag | Warning. Overestimation of term (True/False). | all
Repeat flag (expected) | Expected number of instances to be seen by chance mapped to a given term. | all
Repeat flag (expected p-value) | Significance of repeat flag. | all
Cluster flag (ratio) | Significance of cluster flag. | all
<alignment> | Enrichment significance. | conservation
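A downloaded tab-separated file can be read with Python's standard csv module. The sketch below assumes that the file starts with a header row whose names match the Column entries listed above; this has not been verified against a real download. The sample rows and their values are invented for illustration, and only a few of the columns are included.

```python
import csv
import io

# Invented sample content standing in for a downloaded file.
sample = (
    "Category\tID\tName\tEnrichment\tAdj pvalue\n"
    "Biological process\tGO:0007049\tCell cycle\t12.5\t0.0004\n"
    "Molecular function\tGO:0005515\tProtein binding\t1.1\t0.62\n"
)


def significant_terms(handle, cutoff=0.05):
    """Return the rows whose adjusted p-value is at or below the cutoff."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if float(row["Adj pvalue"]) <= cutoff]


hits = significant_terms(io.StringIO(sample))
hit_names = [row["Name"] for row in hits]
```

For a real file, `io.StringIO(sample)` would be replaced by `open(path, newline="")`, where `path` points to the saved results.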

JSON format
The full list of fields for each term is given in the table below. The results in JSON format are grouped by category, i.e. Biological process, Molecular function etc.
Field Type Description
category String Term category.
id String Unique term identifier.
name String Functional annotation name.
count Float Number of consensus matches mapped to a given term in dataset.
M Float Number of consensus matches in dataset.
n Float Number of disordered residues mapped to a given term in proteome.
N Float Number of disordered residues in proteome.
enrichment Float Enrichment score (E) for motif search space correction approach.
pval String Enrichment significance for motif search space correction.
pvalBH String Adjusted p-value for multiple hypothesis testing for motif search space correction.
proteinTerm Float Number of proteins mapped to a given term in dataset.
occurrence Float Number of proteins mapped to a given term in proteome.
proteinCount Float Number of proteins in the dataset.
proteinBackgroundCount Float Number of proteins in the proteome.
flag Boolean Repeat flag.
expected Float Expected number of instances to be seen by chance mapped to a given term.
exp_pval Float Significance of repeat flag.
flag2 Boolean Cluster flag.
countUniMix Float Significance of cluster flag.
url String Link to source data.
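The JSON download can be processed along the following lines. The exact layout of the file is an assumption: the sketch supposes a top-level object that maps each category name to a list of term records with the fields listed above. The sample records are invented. Note that pval and pvalBH are listed as String fields, so they are converted to float before comparison.

```python
import json

# Invented sample with the assumed layout: category -> list of term records.
sample = json.loads("""
{
  "Biological process": [
    {"id": "GO:0007049", "name": "Cell cycle", "enrichment": 12.5,
     "pval": "1.2e-06", "pvalBH": "0.0004", "flag": false, "flag2": false}
  ],
  "Molecular function": [
    {"id": "GO:0005515", "name": "Protein binding", "enrichment": 1.1,
     "pval": "0.31", "pvalBH": "0.62", "flag": true, "flag2": false}
  ]
}
""")


def unflagged_significant(results, cutoff=0.05):
    """List (category, name, adjusted p-value) for terms without warnings."""
    rows = []
    for category, terms in results.items():
        for term in terms:
            if term["flag"] or term["flag2"]:
                continue  # skip terms with a repeat or cluster flag
            adjusted = float(term["pvalBH"])
            if adjusted <= cutoff:
                rows.append((category, term["name"], adjusted))
    return sorted(rows, key=lambda row: row[2])


selected = unflagged_significant(sample)
```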



Filters


Overview
Consensus matches can be filtered based on containing protein, interactors, taxonomic range, accessibility, localisation and functional annotations. The filters are grouped into the following six categories:
Filter group Description
Hub protein Shared functional annotations and interactors of motif binding-partner.
Hub domain Interacting domains.
Annotation Subcellular localisation and enriched functional annotations in the dataset.
Evolution Taxonomic range. Conservation across different clades/species.
Accessions Containing protein and ontology or interacting annotations.
Accessibility Accessibility to intracellular proteins.

To filter instances, choose one of the filtering options from the navigation menu and follow the instructions provided on the page. All filtering sections have a Description header with a [-] or [+] sign. The short description of the filter can be expanded or collapsed by clicking on these signs. After specifying your filters, click the Filter button and you will be redirected to the Instances page to view the filtered instances. You can save your filter by clicking on the Add button. The filter will be added to the list of filters and you can then specify further filtering options. See details below.
Fig. 15 Filters page - example. 1) Navigation menu. 2) Description header. 3) Add/Filter button.


Using multiple filters
Instances can be filtered by multiple filters. To specify several filters, either use the Add button, or filter instances with the Filter button, return to another filter and click the Filter button again. The next filter will be added and both filters will be used.
The list of active filters is shown just below the navigation menu, under a Filters header with a [-] or [+] sign, on the Filters and Instances pages. You can see the list of your filters by expanding the filter information (click on the [+] sign). The list of filter names will be shown. To see details about a filter, click the details button next to the filter name. Each filter can be removed by clicking on the icon next to the filter name. After removing filters, click the UPDATE button and the consensus matches will be filtered with the updated filters. If you remove all filters, the results will be updated automatically. If you do not want to make any changes to the filters, click on the Instances tab in the navigation menu to see the instances that meet the multiple criteria.
Fig. 16 Filters page - filters panel. 1) Filters header - expanded. 2) Details about filter (to see, click on details button). 3) Remove filter (click on icon). 4) Update results with new filtering options (click UPDATE button - only after removing filter(s)). 5) View consensus matches that meet specified criteria.
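One way to read the combination of filters is as a logical AND: an instance is kept only if it passes every active filter. This reading is an assumption based on the phrase "instances that meet multiple criteria"; the sketch below, with hypothetical instance fields and filter functions, illustrates it.

```python
def apply_filters(instances, filters):
    """Keep only the instances accepted by every filter (logical AND)."""
    return [inst for inst in instances if all(f(inst) for f in filters)]


# Hypothetical instances and field values, for illustration only.
instances = [
    {"protein": "protA", "localisation": "nucleus", "accessible": True},
    {"protein": "protB", "localisation": "cytoplasm", "accessible": True},
    {"protein": "protC", "localisation": "nucleus", "accessible": False},
]

# Two hypothetical active filters: a localisation filter and an accessibility filter.
active_filters = [
    lambda inst: inst["localisation"] == "nucleus",
    lambda inst: inst["accessible"],
]

kept = apply_filters(instances, active_filters)
kept_proteins = [inst["protein"] for inst in kept]
```

With an empty list of filters every instance is kept, which mirrors the behaviour described above when all filters are removed.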

Adding filters using Add button
The Add button adds a filter to the list of filters. This makes it possible to filter instances using more than one filter option.
To add new criteria, specify your new filter and click the Add button. A confirmation that the filter has been added will be shown and the Filters view will be updated.
Fig. 17 Filters page - adding a new filter. 1) Specify your filter options. 2) Add filter (click on Add button). 3) Confirmation of added filter. 4) Updated Filters view.


Instances page after filtering
If instances are filtered, the information about filtering will be shown above the navigation menu on each page. The filters can be customized on Instances page in Filters view above the result table or on Filters page in Filters view below navigation menu as described above.
Important! After filtering, the results on the Conservation and Function pages will be shown only for instances that meet the criteria. For example, the functional enrichment analysis will be recalculated for the new motif dataset, and conservation information will be limited to the instances remaining after filtering.
Fig. 18 Instances page after filtering. 1) Information about filtering. 2) Filters view.

Hub protein
There are two options for filtering instances based on the annotations of a known motif binding partner. Instances can be filtered based on shared functional annotations or on their presence among the interactors of the binding partner.

Shared functional annotations
The functional annotations of the binding partner can be used to filter the instances in the dataset, limiting consensus matches to those which share the same function as the binding partner.
Steps:
  1. Define the binding partner in the input box.
  2. Click search button to see functional annotations of binding partner.
  3. Select the functional annotations which you want to use as filters from the table.
  4. Click Filter button to view instances which share the same annotations or Add button to save filter.
Step 1. The binding partner can be provided as UniProt protein accession or protein name. If you start typing the protein name, the list of possible proteins will show below the box. Click on protein of your choice from the list to define the binding partner.
Fig. 19 Define binding partner. 1) Start typing the protein name. 2) Choose protein from the list below. 3) Click search button.
The functional annotations of the binding partner are provided with information on how likely a given term is to be shared by any two proteins in the proteome. The probabilities are computed based on UniRef50 clusters (Sig (UniRef50) column) and without any clustering (Sig (NoClustering) column). The user can limit the returned functional annotations to the more specific ones by setting the cut-off to a lower score. A lower cut-off excludes general annotations such as metabolic process, single-organism process or localisation. The cut-off can be chosen from the provided values (1e-5, 0.01 or 0.1) or defined by the user. To define your own cut-off, enter it in the empty box next to the cut-offs and press ENTER. The table will be updated. The functional annotations are derived from the Gene Ontology project.
Fig. 20 Filtering. 1) Defined binding partner. 2) Change cut-off (optional). 3) Select the annotations from table. 4) Click Filter button or Add button to save filter.

Interactors
The interactors of the binding partner can be used to limit instances to those which occur in interacting proteins of the binding partner and interact with the binding partner.
Steps:
  1. Define the binding partner in the input box.
  2. Click search button to see interactors of binding partner.
  3. Select the interacting proteins which you want to use as filters from the table.
  4. Click Filter button to view instances which occur in interacting proteins of binding partner and interact with defined binding partner or click Add button to save filter.
Step 1. The binding partner can be provided as UniProt protein accession or protein name. If you start typing the protein name, the list of possible proteins will show below the box. Click on protein of your choice from the list to define the binding partner.
Fig. 21 Define binding partner. 1) Start typing the protein name. 2) Choose protein from the list below. 3) Click search button.
The interacting proteins of the binding partner are provided with information on how likely a given interactor is to be shared by any two proteins in the proteome. The probabilities are computed based on UniRef50 clusters (Sig (UniRef50) column) and without any clustering (Sig (NoClustering) column). The user can limit the returned interactors to the more specific ones by setting the cut-off to a lower score. The cut-off can be chosen from the provided values (1e-5, 0.01 or 0.1) or defined by the user. To define your own cut-off, enter it in the empty box next to the cut-offs and press ENTER. The table will be updated. Interacting proteins are derived from the IntAct database.
Fig. 22 Filtering. 1) Defined binding partner. 2) Change cut-off (optional). 3) Select interactors from table. 4) Click Filter button, or Add button to save filter.

Hub domain
Interacting domains can be used to limit instances to those which could interact with specified binding domains, i.e. domains which occur in known interacting proteins for a given consensus match.
Steps:
  1. Search for domains using input box.
  2. Select the interacting domain(s) which you want to use as filters from the table.
  3. Click Filter button to view instances which interact with selected binding domains, or click Add button if you want to add this filter and specify another one.
Step 1. Search for domains. Start typing domain name or shortcode in the input box and possible domains will show in the table below. The table will be updated automatically whenever you start typing in the input box.
Fig. 23 Filtering. 1) Search for domains. 2) Choose domain(s) from the table. 3) Click Filter button to filter instances or Add button to save filter options.

Annotation
There are two options to filter consensus matches based on motif specific annotations. The instances can be filtered based on possible localisation or ontology and interaction annotations for motif set.

Localisation
The protein cellular component annotations can be used to limit the instances to those occurring in (or outside) specific localisations. All possible subcellular localisations for the consensus matches in the dataset are listed in the table, and each annotation is provided with the number of instances that occur in that cellular component (# column).
Steps:
  1. Select the localisation(s) from table.
  2. Click on Filter in button to view instances which occur in defined localisations or click on Filter out button to view instances which occur outside defined localisations. Click one of Add buttons to save filter.
The table can be searched by using search box above the table. Start typing the localisation and the table will be updated automatically.
Fig. 24 Filtering. 1) Search table (optional). 2) Select localisation(s). 3) Click one of Filter buttons to filter instances or one of Add button to save filter.

Function
The ontology and interaction terms can be used to limit the consensus matches to those assigned to the terms selected by the user. All possible functional annotations for the consensus matches in the motif set are listed in the tables. Each term is provided with enrichment scores and significance from the enrichment analysis.
Steps:
  1. Select the term(s) from the table.
  2. Click on Filter button to view instances which are mapped to selected terms, or Add button to save filter.
The table with functional annotations can be searched by using search box above the table. Start typing name of searched terms and the table will be updated automatically. The content in table can be limited to only enriched, depleted or without warning terms as described here.
The same filtering can be performed when you are on Function page.
Fig. 25 Filtering. 1) Search table (optional). 2) Select annotation(s). 3) Click on Filter button, or Add button to save filter.

Evolution
Taxonomic range can be used to limit the instances to those conserved outside (or inside) a specific clade or species. All possible clades (and species) are listed in the table.
Steps:
  1. Specify the alignment.
  2. Specify whether you want to filter instances based on conservation inside or outside the selected species or clade.
  3. Specify the species/clade to be used as the filter by clicking on one in the table.
  4. Click Filter button to view instances which are conserved inside or outside the selected species or clade, or Add button to save filter.
The table with taxonomic range can be searched by using the search box above the table. Start typing the species or clade name and the table will be updated automatically.
Fig. 26 Filtering. 1) Choose alignment. 2) Specify if you want to filter instances based on conservation inside or outside species. 3) Search table (optional). 4) Select species/clade from table. 5) Click on Filter button, or Add button to save filter.

Accessions
There are two options to filter consensus matches based on provided accessions. The instances can be filtered by protein accessions or ontology and interaction identifiers.

Protein
UniProt protein accessions can be used to limit the instances to those occurring in specific proteins.
Steps:
  1. Enter the UniProt protein accessions, separated by ENTER, in the input box.
  2. Click on Filter button to view instances which occur in provided proteins, or Add button to save filter.

UniProt protein accessions are stable alphanumeric identifiers of 6 or 10 characters. Examples of UniProt protein accessions in human:
UniProt accession Protein Name
P04637 Cellular tumor antigen p53
P11532 Dystrophin
Q8WZ42 Titin
Fig. 27 Filtering. 1) Enter UniProt protein accessions. 2) Click on Filter button, or Add button to save filter.


Annotations
Several annotations, such as Gene Ontology, UniProt Keywords, Pfam and UniProt accessions, can be used to limit the number of consensus matches in the dataset. Gene Ontology and UniProt Keywords filter instances by function and localisation, while Pfam identifiers and UniProt accessions filter instances based on interaction data. The provided UniProt accession(s) indicate binding partner(s), i.e. interacting proteins, and Pfam id(s) describe interacting domain(s), i.e. domains occurring in interacting proteins.
Steps:
  1. Enter accessions separated by ENTER in the input box.
  2. Click on Filter button to view filtered instances assigned to provided annotations, or Add button to save filter.

Examples of accessions:
Source Accession Name
Gene Ontology GO:0007049 Cell cycle
UniProt keyword KW-0498 Mitosis
Pfam domain PF00017 SH2 domain
UniProt protein P04637 Cellular tumor antigen p53 (H.sapiens)
Fig. 28 Filtering. 1) Enter accessions. 2) Click on Filter button, or Add button to save filter.
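Before pasting a list of accessions into the input box, it can help to check which source each one belongs to. The sketch below classifies accessions by pattern. The Gene Ontology, UniProt keyword and Pfam patterns are inferred from the examples in the table above, and the UniProt pattern follows the accession format documented by UniProt; all of them are assumptions about what the input box accepts, and the function name is illustrative.

```python
import re

# (source name, pattern) pairs, checked in order.
PATTERNS = [
    ("Gene Ontology", re.compile(r"^GO:\d{7}$")),
    ("UniProt keyword", re.compile(r"^KW-\d{4}$")),
    ("Pfam domain", re.compile(r"^PF\d{5}$")),
    ("UniProt protein", re.compile(
        r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
        r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$")),
]


def classify(accession):
    """Return the source an accession appears to come from, or None."""
    for source, pattern in PATTERNS:
        if pattern.match(accession):
            return source
    return None


# One accession per line, as entered in the input box.
entered = "GO:0007049\nKW-0498\nPF00017\nP04637\nnot-an-id"
classified = {acc: classify(acc) for acc in entered.splitlines()}
```

Accessions that map to None are likely typos and are worth checking before the filter is submitted.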


Accessibility
Accessibility information can be used to limit the instances to those which are accessible to intracellular proteins. Consensus matches are provided with warnings which indicate that an instance is inaccessible to intracellular proteins. These warnings can be used to filter instances.
Steps:
  1. Switch on/off warning(s) from table. If warning is switched off then instances with that warning will be excluded from results.
  2. Click on Filter button to filter instances, or Add button to save filter.
Fig. 29 Filtering. 1) Switch on/off warning type. 3) Click on Filter button, or Add button to save filter.
The same filtering can be done on the Instances page.



JobID


The user can retrieve previous searches using a unique identifier, the JobID. The JobID can be found in the top right corner of the Instances page.
Fig. 30 JobID location.


References


SLiMSearch integrates data from numerous databases to annotate consensus matches with relevant features and functional annotations. Furthermore, the tool uses several programs to compute motif discriminatory attributes.

Databases
Name Description PMID URL
UniProt Protein accessions, names, sequences, families, UniRef clusters and feature annotations. 25348405 http://www.uniprot.org
ELM Manually curated linear motifs. 26615199 http://elm.eu.org
Pfam Functional regions and binding domains. 24288371 http://pfam.xfam.org
Phospho.ELM Experimentally verified phosphorylation sites. 21062810 http://phospho.elm.eu.org
PhosphoSitePlus Phosphorylation, ubiquitination, acetylation and methylation sites. 22135298 http://www.phosphosite.org/homeAction.do
PDB Experimentally resolved protein tertiary structures. 10592235 http://www.rcsb.org/pdb/home/home.do
DSSP Secondary structure derived from PDB tertiary structures. 25352545 http://swift.cmbi.ru.nl/gv/dssp/
dbSNP Single-nucleotide polymorphism. NCBI Handbook [Internet]. Chapter 5. http://www.ncbi.nlm.nih.gov/SNP
1000genomes Single-nucleotide polymorphism. 23128226 http://www.1000genomes.org
switches.ELM Experimentally validated motif-based molecular switches. 23550212 http://switches.elm.eu.org
Gene Ontology Gene ontology annotations. 25428369 http://geneontology.org
IntAct Experimentally validated protein-protein interactions. 24234451 http://www.ebi.ac.uk/intact/

Programs
Name Description PMID URL
IUPred Intrinsically disordered regions. 15769473 http://iupred.enzim.hu
SLiMPrints Conservation of residues across the alignment. 22977176 http://bioware.ucd.ie
Anchor Binding sites in disordered regions. 19412530 http://anchor.enzim.hu

Additionally, each consensus match is linked to ProViz, a visualisation tool which graphically represents overlapping features and motif attributes.