DataFormat

BED

Applies to:  Region Dataset

Description

The BED format uses one line of TAB-separated fields for each region in a Region Dataset. The first three fields are required; any further fields are optional. By default MotifLab assumes that files are in BED-6 format, but other non-standard layouts can be used by setting the "Format" argument described under Arguments. The fields of the default BED-6 format are, in order:
  1. Chromosome: The name of the chromosome or scaffold.
         (Chromosome names can be given with or without the "chr" prefix)
  2. Chromosome start: Start position of the region in chromosomal coordinates.
  3. Chromosome end: End position of the region in chromosomal coordinates.
  4. Name: The type name of the region.
  5. Score: The score for the region.
  6. Strand: The strand orientation of the region.
         This can be either "+" (direct strand) or "-" (reverse strand), or even "." if the orientation is undetermined.
Note that the default coordinate system of the BED format is 0-based and end-exclusive: the first position of a chromosome is position 0 (rather than position 1, which is commonly used by other formats), and the end coordinate of a region is the first position after the region. The length of a region is therefore simply the end coordinate minus the start coordinate.
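
The coordinate convention described here can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of MotifLab; the sketch only shows the arithmetic for converting between BED coordinates and the 1-based, inclusive coordinates used by formats such as GFF.

```python
def bed_to_one_based(start, end):
    """Convert a BED interval (0-based, end-exclusive) to 1-based, inclusive."""
    # Only the start shifts: the exclusive BED end already equals
    # the inclusive end in 1-based coordinates.
    return start + 1, end


def one_based_to_bed(start, end):
    """Convert a 1-based, inclusive interval back to BED coordinates."""
    return start - 1, end


def bed_length(start, end):
    """Number of bases covered by a BED interval."""
    return end - start


# First region of the example data: chr10 75427001 75427008
print(bed_to_one_based(75427001, 75427008))  # (75427002, 75427008)
print(bed_length(75427001, 75427008))        # 7
```

Because the end coordinate is exclusive, two regions where one ends at position N and the next starts at position N are adjacent, not overlapping.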

Example:
chr10   75427001   75427008   M00101   4.9968641726528125   -
chr10   75427002   75427007   M00028   4.686097666202365    +
chr10   75427002   75427007   M00029   4.486802949517       +
chr10   75427003   75427014   M00472   8.447923342601406    -
chr17    8474690    8474701   M00073   7.7850394311299675   +
chr17    8474710    8474718   M00428   6.149151076269675    +
chr17    8474719    8474730   M00507   8.998892822877837    -
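
As a rough illustration of how a line like those above maps onto the six BED-6 fields, the following Python sketch parses a single line. It is an assumption-laden example, not MotifLab's own parser: the names BedRegion and parse_bed_line are hypothetical, and the sample line is written with explicit TAB characters because the format is TAB-separated.

```python
from typing import NamedTuple, Optional


class BedRegion(NamedTuple):
    """One region in the default BED-6 layout (hypothetical helper type)."""
    chromosome: str
    start: int                    # 0-based start
    end: int                      # exclusive end
    name: Optional[str] = None
    score: Optional[float] = None
    strand: str = "."             # "." means orientation undetermined


def parse_bed_line(line: str) -> BedRegion:
    """Parse one TAB-separated BED line; only the first three fields are required."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        raise ValueError("a BED line needs at least chromosome, start and end")
    name = fields[3] if len(fields) > 3 else None
    score = float(fields[4]) if len(fields) > 4 else None
    strand = fields[5] if len(fields) > 5 else "."
    if strand not in ("+", "-", "."):
        raise ValueError(f"unexpected strand value: {strand!r}")
    return BedRegion(fields[0], int(fields[1]), int(fields[2]), name, score, strand)


# First line of the example, with explicit TAB separators
region = parse_bed_line("chr10\t75427001\t75427008\tM00101\t4.9968641726528125\t-")
print(region.name, region.strand)
```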

Arguments

Name: Description
Add CHR prefix: If selected, the prefix "chr" will be added before the chromosome number (e.g. chromosome 12 will be output as "chr12" rather than just "12").
Start position: Selects the position of the first base in a chromosome. The default is 0 for BED, but it is usually 1 for other data formats, so MotifLab allows the start position to be specified as either 0 or 1. This parameter was used in earlier MotifLab versions and has been replaced by the "Coordinate system" parameter.
Coordinate system: Selects whether the coordinates in the BED-file are in the standard BED-coordinate system (with the chromosome starting at position 0 and end-coordinates being exclusive) or in the system used by e.g. GFF, where the chromosome starts at position 1 and both start- and end-coordinates are inclusive. This parameter replaces the "Start position" parameter that was used in earlier MotifLab versions.
Format: This optional parameter can be used to explicitly define the contents of each line in the BED-file if a non-standard format is used. The format should be defined as a comma-separated list of column names. The default format assumes the BED-file contains the following six columns: "CHROMOSOME, START, END, TYPE, SCORE, STRAND". A line is allowed to contain fewer columns than the format specifies, in which case the missing columns are simply ignored. If the file contains additional columns that the user wants to import, the names of these columns must be included in the format specification. For example, if the file has an additional column containing the property "GeneID" after the STRAND column, the format parameter should be set to "CHROMOSOME, START, END, TYPE, SCORE, STRAND, GeneID". It is also possible to skip a column by replacing its name with an asterisk (*). For example, if a non-standard BED-file contains the columns "CHROMOSOME, START, END, TYPE, SCORE, GeneID" and the user wants to import all of these properties except SCORE, the format parameter can be set to "CHROMOSOME, START, END, TYPE, *, GeneID".
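
The behaviour of the format specification can be sketched in Python as follows. This is only an illustration of the rules described for the "Format" argument (comma-separated column names, asterisk to skip a column, missing trailing columns ignored); the function parse_with_format is hypothetical, it is not how MotifLab is implemented, and the value "GENE1" is a made-up placeholder.

```python
DEFAULT_FORMAT = "CHROMOSOME, START, END, TYPE, SCORE, STRAND"


def parse_with_format(line, format_spec=DEFAULT_FORMAT):
    """Map the TAB-separated fields of one line onto the named columns."""
    columns = [name.strip() for name in format_spec.split(",")]
    fields = line.rstrip("\r\n").split("\t")
    record = {}
    # zip() stops at the shorter sequence, so columns that are named in
    # the format but missing from the line are simply ignored.
    for column, value in zip(columns, fields):
        if column == "*":
            continue  # an asterisk means: skip this column
        record[column] = value
    return record


# A non-standard line with a GeneID column in place of STRAND;
# the SCORE column is skipped with "*".
line = "chr10\t75427001\t75427008\tM00101\t4.9968641726528125\tGENE1"
record = parse_with_format(line, "CHROMOSOME, START, END, TYPE, *, GeneID")
print(record)
```

Values are kept as strings here for simplicity; a real importer would also convert START, END and SCORE to numbers.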

See Also: output, Region Dataset