Sequence Collections are a specific subtype of the general Collection data type that can contain only Sequence objects.
Sequence Collections are mainly used to limit the application of operations and analyses to subsets of sequences, but they can also be used to import sequence definitions from a file.
All Sequence Collections will appear in the "Data Objects" panel in MotifLab's GUI.
You can create new Sequence Collections by pressing the "+" button in this panel and then selecting "Sequence Collection" from the menu that appears. Alternatively, you can go to the "Data" menu in the top menu bar and select "Add New ⇒ Sequence Collection".
MotifLab has a special sequence collection called "AllSequences", which serves as the default sequence collection. This collection is always present and cannot be created or deleted (although it is hidden from the "Data Objects" panel in the GUI when it is empty). Its contents cannot be manipulated directly either. The "AllSequences" collection always contains all the currently defined sequences.
Newly created sequences are automatically added to "AllSequences", and deleted sequences are automatically removed from it. Many operations and analyses let you specify a sequence collection to apply the operation to, but if none is provided, the "AllSequences" collection will normally be assumed, and the operation or analysis will thus be applied to all sequences.
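To make the default-collection behavior concrete, here is a minimal Python sketch of the idea. It is a toy model under stated assumptions, not MotifLab's implementation or API: the class `Workspace` and its methods are hypothetical names. It shows a read-only default list that is updated automatically when sequences are added or deleted, and an `apply` helper that falls back to that list when no collection is given.

```python
from typing import Callable, Dict, List, Optional


class Workspace:
    """Hypothetical toy model of a default collection like "AllSequences" (not MotifLab's API)."""

    def __init__(self) -> None:
        # Ordered list standing in for "AllSequences"; callers never edit it directly.
        self._all_sequences: List[str] = []

    def add_sequence(self, name: str) -> None:
        # New sequences are appended to the default collection automatically.
        if name not in self._all_sequences:
            self._all_sequences.append(name)

    def delete_sequence(self, name: str) -> None:
        # Deleted sequences are removed from the default collection automatically.
        self._all_sequences.remove(name)

    @property
    def all_sequences(self) -> List[str]:
        # Return a copy so the default collection cannot be modified from outside.
        return list(self._all_sequences)

    def apply(
        self,
        operation: Callable[[str], int],
        collection: Optional[List[str]] = None,
    ) -> Dict[str, int]:
        # With no collection given, fall back to every sequence (the "AllSequences" default).
        targets = self.all_sequences if collection is None else collection
        return {name: operation(name) for name in targets}


ws = Workspace()
for name in ["seqA", "seqB", "seqC"]:
    ws.add_sequence(name)
ws.delete_sequence("seqB")
print(ws.all_sequences)         # ['seqA', 'seqC']
print(ws.apply(len))            # applied to all sequences
print(ws.apply(len, ["seqC"]))  # applied to an explicit subset
```

Returning a copy from `all_sequences` mirrors the rule that the contents of "AllSequences" cannot be manipulated directly; only adding or deleting sequences changes it.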
Note that similar default collections do not exist for motifs and modules (i.e. there are no "AllMotifs" or "AllModules" collections).
Hence, if you want to perform motif scanning with all currently defined motifs, you must explicitly create a new Motif Collection containing all the motifs and then refer to that collection in the motif scanning tool.
Although collections are normally regarded as unordered sets, they are actually implemented as ordered lists, even though the order rarely matters.
In fact, "AllSequences" is the only collection whose order MotifLab uses for anything, and it is also the only collection whose order can be manipulated.
When sequences are output, they normally appear in the order they have in AllSequences, and this is also the order in which the sequences are displayed in the GUI.
You can reorder sequences in the GUI (and hence in AllSequences) by pointing at a sequence to give it focus and then pressing CONTROL+ARROW UP or CONTROL+ARROW DOWN to move it, or by right-clicking on a sequence and selecting "Reorder Sequences" from the context menu.
Sorting sequences (with either the sort tool or the sort display setting) will also reorder the sequences in the AllSequences collection.
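The ordering rules above can be sketched in a few lines of Python. This is a hypothetical illustration, not MotifLab code: `output_order`, `move` and `sort_sequences` are made-up names, and the assumption is that a plain list stands in for AllSequences. It shows that a subset collection is emitted in AllSequences order, that moving a sequence swaps it with its neighbor (like CONTROL+ARROW), and that sorting rearranges AllSequences itself.

```python
from typing import Callable, List


def output_order(all_sequences: List[str], collection: List[str]) -> List[str]:
    # Members of a collection are emitted in the order they have in AllSequences.
    members = set(collection)
    return [name for name in all_sequences if name in members]


def move(all_sequences: List[str], name: str, step: int) -> None:
    # Move one sequence up (step=-1) or down (step=+1); moves past either end are ignored.
    i = all_sequences.index(name)
    j = i + step
    if 0 <= j < len(all_sequences):
        all_sequences[i], all_sequences[j] = all_sequences[j], all_sequences[i]


def sort_sequences(
    all_sequences: List[str],
    key: Callable[[str], float],
    descending: bool = False,
) -> None:
    # Sorting rearranges AllSequences in place, so later output and display follow it.
    all_sequences.sort(key=key, reverse=descending)


seqs = ["s1", "s2", "s3", "s4"]
move(seqs, "s3", -1)                     # seqs is now s1, s3, s2, s4
counts = {"s1": 5, "s2": 9, "s3": 1, "s4": 7}
sort_sequences(seqs, counts.__getitem__, descending=True)
print(seqs)
print(output_order(seqs, ["s3", "s1"]))  # subset follows the sorted order
```

Storing only the order in one place (AllSequences) keeps every other collection free to behave like an unordered set while output stays consistent.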
As described in the general section on collections, Sequence Collections can be created by explicitly listing the names of the sequences to include, by using conditions to select sequences based on property values or values in Numeric Maps, or by importing sequence collections from files. In addition, Sequence Collections can be based on sequence statistics, as described below.
Collections based on sequence statistics
The statistic operation can be applied to feature tracks to calculate various statistics, such as counting the number of regions in each sequence, finding the largest value for each sequence in a numeric track, or counting the number of A's in each sequence in a DNA track. This operation returns a Sequence Numeric Map with a value for each individual sequence.
As described in the general section on collections, collections can be created by selecting members based on their values in Numeric Maps, so you can use the statistic operation to create a map and then create a collection from this map. However, it is also possible to do this in one step and create a Sequence Collection directly from sequence statistics. (Behind the scenes, MotifLab actually runs the statistic operation to create a map and then uses this map to create the collection, but this is done automatically and the map is discarded afterwards.)
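The equivalence between the two-step and one-step approaches can be modeled in Python. This is a conceptual sketch under stated assumptions, not MotifLab's implementation: a feature track is represented as a dict from sequence name to a list of `(start, end)` regions, and `region_count`, `collection_from_map`, `collection_from_statistic` and the `COMPARATORS` table are hypothetical names (MotifLab's own set of comparators may differ). The one-step function builds the intermediate map, uses it, and lets it go out of scope.

```python
import operator
from typing import Callable, Dict, List, Tuple

Region = Tuple[int, int]        # hypothetical (start, end) of a feature region
Track = Dict[str, List[Region]]  # hypothetical region track: sequence -> regions
NumericMap = Dict[str, float]    # one value per sequence

# Hypothetical comparator table for illustration only.
COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def region_count(track: Track) -> NumericMap:
    # Step 1: a statistic that yields one value per sequence (like a Sequence Numeric Map).
    return {seq: float(len(regions)) for seq, regions in track.items()}


def collection_from_map(numeric_map: NumericMap, comparator: str, target: float) -> List[str]:
    # Step 2: keep the sequences whose map value satisfies the condition.
    test = COMPARATORS[comparator]
    return [seq for seq, value in numeric_map.items() if test(value, target)]


def collection_from_statistic(
    statistic: Callable[[Track], NumericMap],
    track: Track,
    comparator: str,
    target: float,
) -> List[str]:
    # One-step form: the intermediate map is created, used once, and then discarded.
    return collection_from_map(statistic(track), comparator, target)


tfbs = {
    "seqA": [(i, i + 5) for i in range(25)],
    "seqB": [(1, 5), (10, 15), (20, 25)],
    "seqC": [(i, i + 1) for i in range(21)],
}
two_step = collection_from_map(region_count(tfbs), ">", 20)
one_step = collection_from_statistic(region_count, tfbs, ">", 20)
print(two_step, one_step)  # both select seqA and seqC
```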
In the GUI you can create such collections by selecting "Add New ⇒ Sequence Collection" from the "Data" menu and then going to the "From Statistic" tab. Press the "Select" button to define the statistic function (this brings up the same dialog that is displayed for the statistic operation), and use the other menus to select the comparator function and the target value(s).
In protocols, the general syntax for creating a Sequence Collection based on statistics is:
MyCollection = new Sequence Collection(Statistic: (<statistic function>) <comparator> <target value>)
Examples
# Creates a Sequence Collection containing all sequences with more than 20 regions in the TFBS track
# This approach uses the statistic operation to create a map and then uses the map to create the collection
TFBS_count_map = statistic "region count" in TFBS
Collection1 = new Sequence Collection(Map:TFBS_count_map > 20)
# Same as above but this time using the statistic constructor directly in the collection
Collection2 = new Sequence Collection(Statistic:("region count" in TFBS) > 20)