Generic pipeline reference

Pipeline root properties

experiment_result

type: string

Metric to calculate against the combination of all stages and report in the allStages section of the summary.yaml file after all experiment instances have finished.

Uses the metric name detection mechanism to search for a built-in metric or for a custom function with the same name across project modules.

The metric name may have a val_ prefix or a _holdout suffix to indicate calculation against the validation or holdout set, respectively.

Example:

experiment_result: matthews_correlation_holdout

architecture

type: string

Name of the declaration that will be used as an entry point or root of the main network.

Example:

declarations: 
   utilityDeclaration1:
   utilityDeclaration2:
   mainNetwork:
       - utilityDeclaration1: []
       - dense: [1,"sigmoid"]

architecture: mainNetwork

batch

type: integer

Sets up training batch size.

Example:

batch: 512

callbacks

type: array of callback instances

Sets up training-time callbacks. See individual callback descriptions.

Example:

callbacks:
  EarlyStopping:
    patience: 100
    monitor: val_binary_accuracy
    verbose: 1
  ReduceLROnPlateau:
    patience: 16
    factor: 0.5
    monitor: val_binary_accuracy
    mode: auto
    cooldown: 5
    verbose: 1

copyWeights

type: boolean

Whether to copy saved weights.

Example:

copyWeights: true

clipnorm

type: float

Maximum clip norm of a gradient for an optimizer.

Example:

clipnorm: 1.0

clipvalue

type: float

Clip value of a gradient for an optimizer.

Example:

clipvalue: 0.5

dataset

type: complex object

The key is the name of a Python function in scope that returns the training dataset. The value is an array of parameters to pass to that function.

Example:

dataset:
  getTrain: [false,false]

datasets

type: map containing complex objects

Sets up a list of available datasets to be referred to by other entities.

For each object, the key is the name of a Python function in scope that returns a dataset. The value is an array of parameters to pass to that function.

Example:

datasets:
  test:
    getTest: [false,false]

declarations

type: complex

Sets up network layer building blocks.

Each declaration is an object whose key sets up the declaration name and whose value is a complex object containing a parameters array listing this layer's parameters and a body containing an array of sub-layers or control statements.

If the layer has no parameters, the parameters property may be omitted and the body contents may come directly inside the layer definition.

See Layer types for details regarding building blocks.

Example:

declarations: 
   lstm2: 
      parameters: [count]
      body:
       - bidirectional:  
           - cuDNNLSTM: [count, true]
       - bidirectional:    
           - cuDNNLSTM: [count/2, false]
   net:
       - split-concat: 
          - word_indexes_embedding:  [ embeddings/glove.840B.300d.txt ]
          - word_indexes_embedding:  [ embeddings/paragram_300_sl999.txt ]
          - word_indexes_embedding:  [ embeddings/wiki-news-300d-1M.vec]
       - gaussianNoise: 0.05   
       - lstm2: [300]
       #- dropout: 0.5
       - dense: [1,"sigmoid"]

extra_train_data

type: string

Name of the additional dataset that will be added (per element) to the training dataset before training is launched.

Example:


extra_train_data: more_people

folds_count

type: integer

Number of folds to train. Default is 5.

Example:


folds_count: 3

final_metrics

type: array of strings

Metrics to calculate against every stage and report in the stages section of the summary.yaml file after all experiment instances have finished.

Uses the metric name detection mechanism to search for a built-in metric or for a custom function with the same name across project modules.

The metric name may have a val_ prefix or a _holdout suffix to indicate calculation against the validation or holdout set, respectively.

Example:

final_metrics: [measure]

imports

type: array of strings

Imports Python files from the modules folder of the project and makes their properly annotated contents available to be referred to from YAML.

Example:

imports: [ layers, preprocessors ]

This will import layers.py and preprocessors.py.

inference_batch

type: integer

Batch size used during inference.

Example:
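
An illustrative value:

inference_batch: 1024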


loss

type: string

Sets the loss name.

Uses the loss name detection mechanism to search for a built-in loss or for a custom function with the same name across project modules.

Example:

loss: binary_crossentropy

lr

type: float

Learning rate.

Example:
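
An illustrative value:

lr: 0.001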


metrics

type: array of strings

Array of metrics to track during the training process. Metric calculation results will be printed to the console and written to the metrics folder of the experiment.

Uses the metric name detection mechanism to search for a built-in metric or for a custom function with the same name across project modules.

The metric name may have a val_ prefix or a _holdout suffix to indicate calculation against the validation or holdout set, respectively.

Example:

metrics: #We would like to track some metrics
  - binary_accuracy
  - binary_crossentropy
  - matthews_correlation

num_seeds

type: integer

If set, the training process (for all folds) will be executed num_seeds times, each time resetting the random seeds. The respective folders (like metrics) will get subfolders 0, 1, etc. for each seed.

Example:
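
For instance, to repeat training five times with different seeds:

num_seeds: 5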


optimizer

type: string

Sets the optimizer.

Example:

optimizer: Adam

primary_metric

type: string

Metric to track during the training process. Metric calculation results will be printed to the console and written to the metrics folder of the experiment.

Besides tracking, this metric will also be used by default for metric-related activity, for example, for deciding which epoch's results are better.

Uses the metric name detection mechanism to search for a built-in metric or for a custom function with the same name across project modules.

The metric name may have a val_ prefix or a _holdout suffix to indicate calculation against the validation or holdout set, respectively.

Example:

primary_metric: val_macro_f1

primary_metric_mode

type: enum: auto,min,max

default: auto

When primary metric calculation results are combined across several instances (e.g., batches), this is the mathematical operation used to obtain the final result.

Example:

primary_metric_mode: max

preprocessing

type: complex

Preprocessors are custom Python functions that transform the dataset.

Such functions should be defined in Python files located in the project's modules folder and imported. Preprocessing functions should also be marked with the @preprocessing.dataset_preprocessor annotation.

The preprocessing instruction can then be used to chain preprocessors as needed for a particular experiment, and even to cache the result on disk for reuse between experiments.

Preprocessing chains may also contain preprocessor utility instructions, such as cache and disk-cache (see the Preprocessors section below).

Example:

preprocessing: 
  - binarize_target: 
  - tokenize:  
  - tokens_to_indexes:
       maxLen: 160
  - disk-cache: 

random_state

type: integer

Seed for the random number generator.

Example:
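
An illustrative seed:

random_state: 42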


stages

type: complex

Sets up training process stages. Contains a YAML array of stages, where each stage is a complex type that may contain the properties described in the Stage properties section.

Example:

stages:
  - epochs: 6
  - epochs: 6
    lr: 0.01

stratified

type: boolean

Whether to use a stratified strategy when splitting the training set.

Example:
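
For instance:

stratified: true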


testSplit

type: float 0-1

Splits the training set into two parts, using one part for training and leaving the other untouched for later testing. The split is shuffled.

Example:

testSplit: 0.4

testSplitSeed

type: integer

Seed of randomness for the split of the training set.

Example:
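
An illustrative seed:

testSplitSeed: 42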


testTimeAugmentation

type: string

Test-time augmentation function name. The function must be reachable in project scope and must accept and return a numpy array.

Example:
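
A sketch; rotate_and_average is a hypothetical function defined in one of the project modules:

testTimeAugmentation: rotate_and_average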


validationSplit

type: float

Float 0-1 setting up how much of the training set (after the holdout is cut off) to allocate for validation. This property is only used if the fold count is 1.

Example:
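
For instance, to allocate 20% of the training set for validation:

validationSplit: 0.2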


Callback types

EarlyStopping

Stop training when a monitored metric has stopped improving.

Properties:

  • patience - integer, number of epochs with no improvement after which training will be stopped.
  • verbose - 0 or 1, verbosity mode.
  • monitor - string, name of the metric to monitor
  • mode - auto, min or max; In min mode, training will stop when the quantity monitored has stopped decreasing; in max mode it will stop when the quantity monitored has stopped increasing; in auto mode, the direction is automatically inferred from the name of the monitored quantity.

Example

callbacks:
  EarlyStopping:
    patience: 100
    monitor: val_binary_accuracy
    verbose: 1

ReduceLROnPlateau

Reduce learning rate when a metric has stopped improving.

Properties:

  • patience - integer, number of epochs with no improvement after which the learning rate will be reduced.
  • cooldown - integer, number of epochs to wait before resuming normal operation after lr has been reduced.
  • factor - number, factor by which the learning rate will be reduced. new_lr = lr * factor
  • verbose - 0 or 1, verbosity mode.
  • monitor - string, name of the metric to monitor
  • mode - auto, min or max; In min mode, the learning rate will be reduced when the quantity monitored has stopped decreasing; in max mode it will be reduced when the quantity monitored has stopped increasing; in auto mode, the direction is automatically inferred from the name of the monitored quantity.

Example

callbacks:
  ReduceLROnPlateau:
    patience: 16
    factor: 0.5
    monitor: val_binary_accuracy
    mode: auto
    cooldown: 5
    verbose: 1

CyclicLR

Cycles learning rate across epochs.

Functionally, it defines the cycle amplitude (max_lr - base_lr). The lr at any point in the cycle is the sum of base_lr and some scaling of the amplitude; therefore max_lr may not actually be reached, depending on the scaling function.

Properties:

  • base_lr - number, initial learning rate which is the lower boundary in the cycle.
  • max_lr - number, upper boundary in the cycle.
  • mode - one of triangular, triangular2 or exp_range; scaling function.
  • gamma - number from 0 to 1, constant in 'exp_range' scaling function.
  • step_size - integer > 0, number of training iterations (batches) per half cycle.

Example

callbacks:
  CyclicLR:
    base_lr: 0.001
    max_lr: 0.006
    step_size: 2000
    mode: triangular

LRVariator

Changes the learning rate between two values.

Properties:

  • fromVal - initial learning rate value; defaults to the lr set in the configuration.
  • toVal - final learning rate value.
  • style - one of the following:
      • linear - changes the LR linearly between the two values.
      • const - does not change from the initial value.
      • cos+ - -1 * cos(2x/pi) + 1 for x in [0;1]
      • cos- - cos(2x/pi) for x in [0;1]
      • cos - same as 'cos-'
      • sin+ - sin(2x/pi) for x in [0;1]
      • sin- - -1 * sin(2x/pi) + 1 for x in [0;1]
      • sin - same as 'sin+'
      • any positive float or integer value a - x^a for x in [0;1]
  • absSize - size in batches.
  • relSize - size in fractions of an epoch.
  • periodEpochs - period in epochs.
  • periodSteps - period in batches.
  • then - an LRVariator that should manage the learning rate after this one.

Example

  LRVariator: 
     fromVal: 0
     toVal: 0.00005 
     style: linear     
     relSize: 0.05 # let's go for 1/20 of an epoch
     then:
         LRVariator:
             fromVal: 0.00005
             toVal: 0
             relSize: 2 # let's go for 2 epochs
             style: linear 

TensorBoard

This callback writes a log for TensorBoard, which allows you to visualize dynamic graphs of your training and test metrics, as well as activation histograms for the different layers in your model.

Properties:

  • log_dir - string; the path of the directory where to save the log files to be parsed by TensorBoard.
  • histogram_freq - integer; frequency (in epochs) at which to compute activation and weight histograms for the layers of the model. If set to 0, histograms won't be computed. Validation data (or split) must be specified for histogram visualizations.
  • batch_size - integer; size of batch of inputs to feed to the network for histograms computation.
  • write_graph - boolean; whether to visualize the graph in TensorBoard. The log file can become quite large when write_graph is set to True.
  • write_grads - boolean; whether to visualize gradient histograms in TensorBoard. histogram_freq must be greater than 0.
  • write_images - boolean; whether to write model weights to visualize as image in TensorBoard.
  • embeddings_freq - number; frequency (in epochs) at which selected embedding layers will be saved. If set to 0, embeddings won't be computed. Data to be visualized in TensorBoard's Embedding tab must be passed as embeddings_data.
  • embeddings_layer_names - array of strings; a list of names of layers to keep an eye on. If None or an empty list, all embedding layers will be watched.
  • embeddings_metadata - a dictionary which maps layer names to file names in which metadata for the embedding layer is saved. In case the same metadata file is used for all embedding layers, a string can be passed. See the TensorBoard documentation for details about the metadata file format.
  • embeddings_data - data to be embedded at layers specified in embeddings_layer_names.
  • update_freq - epoch or batch or integer; When using 'batch', writes the losses and metrics to TensorBoard after each batch. The same applies for 'epoch'. If using an integer, let's say 10000, the callback will write the metrics and losses to TensorBoard every 10000 samples. Note that writing too frequently to TensorBoard can slow down your training.

Example

callbacks:
  TensorBoard:
    log_dir: './logs'
    batch_size: 32
    write_graph: True
    update_freq: batch

Layer types

Input

This layer is not intended to be used directly.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • shape - array of integers; input shape


GaussianNoise

Apply additive zero-centered Gaussian noise.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • stddev - float; standard deviation of the noise distribution.

Example:
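
Following the declarations example above:

declarations:
  net:
    - gaussianNoise: 0.05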


Dropout

Applies Dropout to the input.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • rate - float between 0 and 1; fraction of the input units to drop.
  • seed - integer to use as the random seed.

Example:

declarations:
  net:
    - dropout: 0.5

SpatialDropout1D

Spatial 1D version of Dropout.

This version performs the same function as Dropout, however it drops entire 1D feature maps instead of individual elements. If adjacent frames within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout1D will help promote independence between feature maps and should be used instead.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • rate - float between 0 and 1. Fraction of the input units to drop.

Example:
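
A sketch, assuming the rate is passed positionally like dropout in the other examples:

declarations:
  net:
    - spatialDropout1D: 0.3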


LSTM

Long Short-Term Memory layer.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • units: Positive integer, dimensionality of the output space.
  • activation: Activation function to use (see activations). Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
  • recurrent_activation: Activation function to use for the recurrent step (see activations). Default: hard sigmoid (hard_sigmoid). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
  • use_bias: Boolean, whether the layer uses a bias vector.
  • kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs. (see initializers).
  • recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state. (see initializers).
  • bias_initializer: Initializer for the bias vector (see initializers).
  • unit_forget_bias: Boolean. If True, add 1 to the bias of the forget gate at initialization. Setting it to true will also force bias_initializer="zeros". This is recommended in Jozefowicz et al. (2015).
  • kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
  • recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix (see regularizer).
  • bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
  • activity_regularizer: Regularizer function applied to the output of the layer (its "activation"). (see regularizer).
  • kernel_constraint: Constraint function applied to the kernel weights matrix (see constraints).
  • recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix (see constraints).
  • bias_constraint: Constraint function applied to the bias vector (see constraints).
  • dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
  • recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
  • implementation: Implementation mode, either 1 or 2. Mode 1 will structure its operations as a larger number of smaller dot products and additions, whereas mode 2 will batch them into fewer, larger operations. These modes will have different performance profiles on different hardware and for different applications.
  • return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
  • return_state: Boolean. Whether to return the last state in addition to the output. The returned elements of the states list are the hidden state and the cell state, respectively.
  • go_backwards: Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.
  • stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
  • unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed-up a RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences.

Example:
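
A sketch, assuming positional parameters analogous to the cuDNNLSTM examples ([units, return_sequences]):

declarations:
  net:
    - LSTM: [64, true]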


GlobalMaxPool1D

Global max pooling operation for temporal data.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • data_format - A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, steps, features) while channels_first corresponds to inputs with shape (batch, features, steps).

Example:
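
A sketch; the layer takes no required parameters:

declarations:
  net:
    - globalMaxPool1D: []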


GlobalAveragePooling1D

Global average pooling operation for temporal data.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • data_format - A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, steps, features) while channels_first corresponds to inputs with shape (batch, features, steps).

Example:
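
A sketch; the layer takes no required parameters:

declarations:
  net:
    - globalAveragePooling1D: []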


BatchNormalization

Batch normalization layer.

Normalize the activations of the previous layer at each batch, i.e. applies a transformation that maintains the mean activation close to 0 and the activation standard deviation close to 1.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • axis: Integer, the axis that should be normalized (typically the features axis). For instance, after a Conv2D layer with data_format="channels_first", set axis=1 in BatchNormalization.
  • momentum: Momentum for the moving mean and the moving variance.
  • epsilon: Small float added to variance to avoid dividing by zero.
  • center: If True, add offset of beta to normalized tensor. If False, beta is ignored.
  • scale: If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling will be done by the next layer.
  • beta_initializer: Initializer for the beta weight.
  • gamma_initializer: Initializer for the gamma weight.
  • moving_mean_initializer: Initializer for the moving mean.
  • moving_variance_initializer: Initializer for the moving variance.
  • beta_regularizer: Optional regularizer for the beta weight.
  • gamma_regularizer: Optional regularizer for the gamma weight.
  • beta_constraint: Optional constraint for the beta weight.
  • gamma_constraint: Optional constraint for the gamma weight.

Example:
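
A sketch; all parameters are optional:

declarations:
  net:
    - batchNormalization: []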


Concatenate

Layer that concatenates a list of inputs.

Example:

- concatenate: [lstmBranch,textFeatureBranch]

Add

Layer that adds a list of inputs.

It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).

Example:

- add: [first,second]

Substract

Layer that subtracts two inputs.

It takes as input a list of tensors of size 2, both of the same shape, and returns a single tensor, (inputs[0] - inputs[1]), also of the same shape.

Example:

- substract: [first,second]

Mult

Layer that multiplies (element-wise) a list of inputs.

It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).

Example:

- mult: [first,second]

Max

Layer that computes the maximum (element-wise) of a list of inputs.

It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).

Example:

- max: [first,second]

Min

Layer that computes the minimum (element-wise) of a list of inputs.

It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).

Example:

- min: [first,second]

Conv1D

1D convolution layer (e.g. temporal convolution).

This layer creates a convolution kernel that is convolved with the layer input over a single spatial (or temporal) dimension to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.

When using this layer as the first layer in a model, provide an input_shape argument (tuple of integers or None, does not include the batch axis), e.g. input_shape=(10, 128) for time series sequences of 10 time steps with 128 features per step in data_format="channels_last", or (None, 128) for variable-length sequences with 128 features per step.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution).
  • kernel_size: An integer or tuple/list of a single integer, specifying the length of the 1D convolution window.
  • strides: An integer or tuple/list of a single integer, specifying the stride length of the convolution. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.
  • padding: One of "valid", "causal" or "same" (case-insensitive). "valid" means "no padding". "same" results in padding the input such that the output has the same length as the original input. "causal" results in causal (dilated) convolutions, e.g. output[t] does not depend on input[t + 1:]. A zero padding is used such that the output has the same length as the original input. Useful when modeling temporal data where the model should not violate the temporal order. See WaveNet: A Generative Model for Raw Audio, section 2.1.
  • data_format: A string, one of "channels_last" (default) or "channels_first". The ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with shape (batch, steps, channels) (default format for temporal data in Keras) while "channels_first" corresponds to inputs with shape (batch, channels, steps).
  • dilation_rate: an integer or tuple/list of a single integer, specifying the dilation rate to use for dilated convolution. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any strides value != 1.
  • activation: Activation function to use (see activations). If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).
  • use_bias: Boolean, whether the layer uses a bias vector.
  • kernel_initializer: Initializer for the kernel weights matrix (see initializers).
  • bias_initializer: Initializer for the bias vector (see initializers).
  • kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
  • bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
  • activity_regularizer: Regularizer function applied to the output of the layer (its "activation"). (see regularizer).
  • kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
  • bias_constraint: Constraint function applied to the bias vector (see constraints).

Example:
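
As in the transform-concat example later in this document ([filters, kernel_size, activation]):

declarations:
  net:
    - Conv1D: [10, 1, "relu"]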


Conv2D

2D convolution layer (e.g. spatial convolution over images).

This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.

When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers, does not include the batch axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last".

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution).
  • kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
  • strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.
  • padding: one of "valid" or "same" (case-insensitive). Note that "same" is slightly inconsistent across backends with strides != 1.
  • data_format: A string, one of "channels_last" or "channels_first". The ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with shape (batch, height, width, channels) while "channels_first" corresponds to inputs with shape (batch, channels, height, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "channels_last".
  • dilation_rate: an integer or tuple/list of 2 integers, specifying the dilation rate to use for dilated convolution. Can be a single integer to specify the same value for all spatial dimensions. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any stride value != 1.
  • activation: Activation function to use (see activations). If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).
  • use_bias: Boolean, whether the layer uses a bias vector.
  • kernel_initializer: Initializer for the kernel weights matrix (see initializers).
  • bias_initializer: Initializer for the bias vector (see initializers).
  • kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
  • bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
  • activity_regularizer: Regularizer function applied to the output of the layer (its "activation"). (see regularizer).
  • kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
  • bias_constraint: Constraint function applied to the bias vector (see constraints).

Example:
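
A sketch, assuming the same positional order as Conv1D ([filters, kernel_size, activation]):

declarations:
  net:
    - Conv2D: [16, 3, "relu"]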


MaxPool1D

Max pooling operation for temporal data.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • pool_size: Integer, size of the max pooling windows.
  • strides: Integer, or None. Factor by which to downscale. E.g. 2 will halve the input. If None, it will default to pool_size.
  • padding: One of "valid" or "same" (case-insensitive).
  • data_format: A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, steps, features) while channels_first corresponds to inputs with shape (batch, features, steps).

Example:
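
A sketch, assuming pool_size is passed positionally:

declarations:
  net:
    - MaxPool1D: [2]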


MaxPool2D

Max pooling operation for spatial data.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimension. If only one integer is specified, the same window length will be used for both dimensions.
  • strides: Integer, tuple of 2 integers, or None. Strides values. If None, it will default to pool_size.
  • padding: One of "valid" or "same" (case-insensitive).
  • data_format: A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "channels_last".

Example:
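
A sketch, assuming pool_size is passed positionally:

declarations:
  net:
    - MaxPool2D: [2]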


AveragePooling1D

Average pooling for temporal data.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • pool_size: Integer, size of the average pooling windows.
  • strides: Integer, or None. Factor by which to downscale. E.g. 2 will halve the input. If None, it will default to pool_size.
  • padding: One of "valid" or "same" (case-insensitive).
  • data_format: A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, steps, features) while channels_first corresponds to inputs with shape (batch, features, steps).

Example:
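
A sketch, assuming pool_size is passed positionally:

declarations:
  net:
    - AveragePooling1D: [2]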


CuDNNLSTM

Fast LSTM implementation with CuDNN.

Can only be run on GPU, with the TensorFlow backend.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • units: Positive integer, dimensionality of the output space.
  • kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs. (see initializers).
  • recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state. (see initializers).
  • bias_initializer: Initializer for the bias vector (see initializers).
  • unit_forget_bias: Boolean. If True, add 1 to the bias of the forget gate at initialization. Setting it to true will also force bias_initializer="zeros". This is recommended in Jozefowicz et al. (2015).
  • kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
  • recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix (see regularizer).
  • bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
  • activity_regularizer: Regularizer function applied to the output of the layer (its "activation"). (see regularizer).
  • kernel_constraint: Constraint function applied to the kernel weights matrix (see constraints).
  • recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix (see constraints).
  • bias_constraint: Constraint function applied to the bias vector (see constraints).
  • return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
  • return_state: Boolean. Whether to return the last state in addition to the output.
  • stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.

Example:
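
Following the parameter order used in the lstm2 declaration above ([units, return_sequences]):

declarations:
  net:
    - cuDNNLSTM: [128, true]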


Dense

Regular densely-connected NN layer.

Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).

Note: if the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • units: Positive integer, dimensionality of the output space.
  • activation: Activation function to use (see activations). If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).
  • use_bias: Boolean, whether the layer uses a bias vector.
  • kernel_initializer: Initializer for the kernel weights matrix (see initializers).
  • bias_initializer: Initializer for the bias vector (see initializers).
  • kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
  • bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
  • activity_regularizer: Regularizer function applied to the output of the layer (its "activation"). (see regularizer).
  • kernel_constraint: Constraint function applied to the kernel weights matrix (see constraints).
  • bias_constraint: Constraint function applied to the bias vector (see constraints).

Example:
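
Following the declarations example above ([units, activation]):

declarations:
  net:
    - dense: [1, "sigmoid"]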


Flatten

Flattens the input. Does not affect the batch size.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • data_format: A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. The purpose of this argument is to preserve weight ordering when switching a model from one data format to another. channels_last corresponds to inputs with shape (batch, ..., channels) while channels_first corresponds to inputs with shape (batch, channels, ...). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "channels_last".

Example:
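
A sketch; the layer takes no required parameters:

declarations:
  net:
    - flatten: []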


Bidirectional

Bidirectional wrapper for RNNs.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.
  • layer: Recurrent instance.
  • merge_mode: Mode by which outputs of the forward and backward RNNs will be combined. One of {'sum', 'mul', 'concat', 'ave', None}. If None, the outputs will not be combined, they will be returned as a list.
  • weights: Initial weights to load in the Bidirectional model.

Example:
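
Following the lstm2 declaration example above:

declarations:
  net:
    - bidirectional:
        - cuDNNLSTM: [128, true]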


Utility layers

split

Splits the current flow into several new ones. Each child is a separate flow whose input equals the input of the split operation.

The number of outputs equals the number of children.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:
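
A sketch, mirroring the split-concat example; each child becomes a separate output flow:

- split:
    - word_indexes_embedding: [ embeddings/glove.840B.300d.txt ]
    - word_indexes_embedding: [ embeddings/paragram_300_sl999.txt ]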


split-concat

Splits the current flow into several new ones. Each child is a separate flow whose input equals the input of the split operation.

Output is a concatenation of child flows.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:

- split-concat:
         - word_indexes_embedding:  [ embeddings/glove.840B.300d.txt ]
         - word_indexes_embedding:  [ embeddings/paragram_300_sl999.txt ]
         - word_indexes_embedding:  [ embeddings/wiki-news-300d-1M.vec]
- lstm2: [128]

split-concatenate

Splits the current flow into several new ones. Each child is a separate flow whose input equals the input of the split operation.

Output is a concatenation of child flows (equivalent to using the Concatenate layer).

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:

- split-concatenate:
         - word_indexes_embedding:  [ embeddings/glove.840B.300d.txt ]
         - word_indexes_embedding:  [ embeddings/paragram_300_sl999.txt ]
         - word_indexes_embedding:  [ embeddings/wiki-news-300d-1M.vec]
- lstm2: [128]

split-add

Splits the current flow into several new ones. Each child is a separate flow whose input equals the input of the split operation.

Output is an addition of child flows (equivalent to using the Add layer).

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:
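
A structural sketch; both branches produce same-shape outputs, so they can be added:

- split-add:
    - dense: [64, "relu"]
    - dense: [64, "relu"]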


split-substract

Splits the current flow into several new ones. Each child is a separate flow whose input equals the input of the split operation.

Output is a subtraction of child flows (equivalent to using the Substract layer).

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:
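
A structural sketch with exactly two same-shape branches:

- split-substract:
    - dense: [64, "relu"]
    - dense: [64, "relu"]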


split-mult

Splits the current flow into several new ones. Each child is a separate flow whose input equals the input of the split operation.

Output is a multiplication of child flows (equivalent to using the Mult layer).

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:
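
A structural sketch; the same-shape branch outputs are multiplied element-wise:

- split-mult:
    - dense: [64, "relu"]
    - dense: [64, "relu"]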


split-min

Splits the current flow into several new ones. Each child is a separate flow whose input equals the input of the split operation.

Output is a minimum of child flows (equivalent to using the Min layer).

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:
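
A structural sketch; the element-wise minimum of the branch outputs is taken:

- split-min:
    - dense: [64, "relu"]
    - dense: [64, "relu"]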


split-max

Splits the current flow into several new ones. Each child is a separate flow whose input equals the input of the split operation.

Output is a maximum of child flows (equivalent to using the Max layer).

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:
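
A structural sketch; the element-wise maximum of the branch outputs is taken:

- split-max:
    - dense: [64, "relu"]
    - dense: [64, "relu"]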


split-dot

Splits the current flow into several new ones. Each child is a separate flow whose input equals the input of the split operation.

Output is a dot product of child flows.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:
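
A structural sketch with two branches whose outputs are dotted:

- split-dot:
    - dense: [64]
    - dense: [64]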


split-dot-normalize

Splits the current flow into several new ones. Each child is a separate flow whose input equals the input of the split operation.

Output is a normalized dot product of child flows.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:
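
A structural sketch with two branches whose outputs are dotted after normalization:

- split-dot-normalize:
    - dense: [64]
    - dense: [64]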


seq

Executes child elements as a sequence of operations, one by one.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:
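
A sketch; the child layers are applied one after another:

- seq:
    - dense: [64, "relu"]
    - dropout: 0.5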


input

Overrides the current input with the listed layer references.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:

input: [firstRef, secondRef]

pass

Forwards data from this branch unchanged.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:


transform-concat:
  - pass
  - Conv1D: [10,1,"relu"]

transform-concat

Passes input tensors through the child layers, and then concatenates their outputs.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:

transform-concat:
  - Conv1D: [10,1,"relu"]
  - Conv1D: [10,2,"relu"]

transform-add

Passes input tensors through the child layers, and then adds their outputs.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:

transform-add:
  - Conv1D: [10,1,"relu"]
  - Conv1D: [10,2,"relu"]

Stage properties

callbacks

type: array of callback instances

Sets up training-time callbacks. See individual callback descriptions.

Example:

callbacks:
  EarlyStopping:
    patience: 100
    monitor: val_binary_accuracy
    verbose: 1
  ReduceLROnPlateau:
    patience: 16
    factor: 0.5
    monitor: val_binary_accuracy
    mode: auto
    cooldown: 5
    verbose: 1

epochs

type: integer

Number of epochs to train for this stage.

Example:
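
As in the stages example above:

stages:
  - epochs: 6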


extra_callbacks

Allows specifying a list of additional callbacks that should be applied to this stage, as shown in the sketch below.
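
Example (a sketch, reusing the EarlyStopping callback described above):

stages:
  - epochs: 6
    extra_callbacks:
      EarlyStopping:
        patience: 10
        monitor: val_binary_accuracy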

initial_weights

type: string

File path to load the stage's initial neural network weights from.

Example:

initial_weights: /initial.weights

negatives

type: string or integer

Support for binary data balancing of the training set.

The following values are acceptable:

  • none - exclude negative examples from the data
  • real - include all negative examples
  • an integer number (1, 2, or any other) - how many negative examples should be included per positive example

In order for the system to determine whether a particular example is positive or negative, the dataset class defined by the dataset property should declare an isPositive method that accepts a dataset item and returns a boolean.

Example:

stages:
  - epochs: 6 #Train for 6 epochs
    negatives: none #do not include negative examples in your training set 
    validation_negatives: real #validation should contain all negative examples    

  - lr: 0.0001 #let's use different starting learning rate
    epochs: 6
    negatives: real
    validation_negatives: real

  - loss: lovasz_loss #let's override loss function
    lr: 0.00001
    epochs: 6
    initial_weights: ./fpn-resnext2/weights/best-0.1.weights #let's load weights from this file    

loss

type: string

Sets the loss name.

Uses the loss name detection mechanism to search for a built-in loss or for a custom function with the same name across project modules.

Example:

loss: binary_crossentropy

lr

type: float

Learning rate.

Example:

lr: 0.01

validation_negatives

type: string or integer

Support for binary data balancing of the validation set.

The following values are acceptable:

  • none - exclude negative examples from the data
  • real - include all negative examples
  • an integer number (1, 2, or any other) - how many negative examples should be included per positive example

In order for the system to determine whether a particular example is positive or negative, the dataset class defined by the dataset property should declare an isPositive method that accepts a dataset item and returns a boolean.

Example:

stages:
  - epochs: 6 #Train for 6 epochs
    negatives: none #do not include negative examples in your training set 
    validation_negatives: real #validation should contain all negative examples    

  - lr: 0.0001 #let's use different starting learning rate
    epochs: 6
    negatives: real
    validation_negatives: real

  - loss: lovasz_loss #let's override loss function
    lr: 0.00001
    epochs: 6
    initial_weights: ./fpn-resnext2/weights/best-0.1.weights #let's load weights from this file    

Preprocessors

type: complex

Preprocessors are custom Python functions that transform the dataset.

Such functions should be defined in Python files located in the project's modules folder and imported. Preprocessing functions should also be marked with the @preprocessing.dataset_preprocessor annotation.

Preprocessor instructions can then be used to chain preprocessors as needed for a particular experiment, and even to cache the result on disk for reuse between experiments.

Example:

preprocessing: 
  - binarize_target: 
  - tokenize:  
  - tokens_to_indexes:
       maxLen: 160
  - disk-cache: 

cache

Caches its input in memory, including the full flow.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:

preprocessing: 
  - binarize_target: 
  - tokenize:  
  - tokens_to_indexes:
       maxLen: 160
  - cache:

disk-cache

Caches its input on disk, including the full flow. On subsequent launches, if nothing has changed in the flow, it takes its output from disk instead of re-running the previous operations.

Properties:

  • name - string; optionally sets up the layer name so it can be referred to from other layers.
  • inputs - array of strings; lists layer inputs.

Example:

preprocessing: 
  - binarize_target: 
  - tokenize:  
  - tokens_to_indexes:
       maxLen: 160
  - disk-cache: 

split-preprocessor

An analogue of split for preprocessor operations.

Example:
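
A structural sketch; the child preprocessor names are hypothetical:

preprocessing:
  - split-preprocessor:
      - extract_text_features:   # hypothetical preprocessor
      - extract_meta_features:   # hypothetical preprocessor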


split-concat-preprocessor

An analogue of split-concat for preprocessor operations.

Example:
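
A structural sketch; the child preprocessor names are hypothetical, and their outputs are concatenated:

preprocessing:
  - split-concat-preprocessor:
      - extract_text_features:   # hypothetical preprocessor
      - extract_meta_features:   # hypothetical preprocessor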


seq-preprocessor

An analogue of seq for preprocessor operations.

Example:
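
A sketch, reusing preprocessors from the example above:

preprocessing:
  - seq-preprocessor:
      - tokenize:
      - tokens_to_indexes:
          maxLen: 160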


augmentation

A preprocessor instruction whose body only runs during training and is skipped during inference.

Example:
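
A sketch; shuffle_words is a hypothetical augmenting preprocessor that only runs at training time:

preprocessing:
  - tokenize:
  - augmentation:
      - shuffle_words:   # hypothetical preprocessor
  - tokens_to_indexes:
      maxLen: 160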


fit script arguments

fit.py project

type: string

Folder to search for experiments in; typically, the project root.

Example:

-m musket_core.fit --project "path/to/project"

fit.py name

type: string or comma-separated list of strings

Name of the experiment to launch, or a list of names.

Example:

-m musket_core.fit --name "experiment_name"

-m musket_core.fit --name "experiment_name1, experiment_name2"

fit.py num_gpus

type: integer

Default: 1

Number of GPUs to use during experiment launch.

Example: -m musket_core.fit --num_gpus=1

fit.py gpus_per_net

type: integer

Default: 1

Maximum number of GPUs to use per single experiment.

Example: -m musket_core.fit --gpus_per_net=1

fit.py num_workers

type: integer

Default: 1

Number of workers to use.

Example: -m musket_core.fit --num_workers=1

fit.py allow_resume

type: boolean

Default: False

Whether to allow resuming of experiments, which will cause unfinished experiments to start from the best saved weights.

Example: -m musket_core.fit --allow_resume True

fit.py force_recalc

type: boolean

Default: False

Whether to force rebuilding of reports and predictions.

Example: -m musket_core.fit --force_recalc True

fit.py launch_tasks

type: boolean

Default: False

Whether to launch associated tasks.

Example: -m musket_core.fit --launch_tasks True

fit.py only_report

type: boolean

Default: False

Whether to only generate reports for cached data; no training occurs.

Example: -m musket_core.fit --only_report True

fit.py cache

type: string

Path to the cache folder. The cache folder will contain temporary cached data for executed experiments.

Example: -m musket_core.fit --cache "path/to/cache/folder"

fit.py folds

type: integer or comma-separated list of integers

Folds to launch. By default, all folds of the experiment are executed; this argument allows launching only some of them.

Example: -m musket_core.fit --folds 1,2

task script arguments

task.py project

type: string

Folder to search for experiments in; typically, the project root.

Example:

task.py --project "path/to/project"

task.py name

type: string or comma-separated list of strings

Name of the experiment to launch, or a list of names.

Example:

task.py --name "experiment_name"

task.py --name "experiment_name1, experiment_name2"

task.py task

type: string or comma-separated list of strings

Default: all tasks.

Name of the task to launch, or a list of names.

Example:

task.py --task "task_name"

task.py --task "task_name1, task_name2"

task.py --task "all"

task.py num_gpus

type: integer

Default: 1

Number of GPUs to use during experiment launch.

Example: task.py --num_gpus=1

task.py gpus_per_net

type: integer

Default: 1

Maximum number of GPUs to use per single experiment.

Example: task.py --gpus_per_net=1

task.py num_workers

type: integer

Default: 1

Number of workers to use.

Example: task.py --num_workers=1

task.py allow_resume

type: boolean

Default: False

Whether to allow resuming of experiments, which will cause unfinished experiments to start from the best saved weights.

Example: task.py --allow_resume True

task.py force_recalc

type: boolean

Default: False

Whether to force rebuilding of reports and predictions.

Example: task.py --force_recalc True

task.py launch_tasks

type: boolean

Default: False

Whether to launch associated tasks.

Example: task.py --launch_tasks True

task.py cache

type: string

Path to the cache folder. The cache folder will contain temporary cached data for executed experiments.

Example: task.py --cache "path/to/cache/folder"

analyze script arguments

analyze.py inputFolder

type: string

Folder to search for finished experiments in. Typically, project root.

Example:

analyze.py --inputFolder "path/to/project"

analyze.py output

type: string

Default: report.csv in project root.

Output report file path.

Example:

analyze.py --output "path/to/project/report/report.csv"

analyze.py onlyMetric

type: string

Name of the single metric to take into account.

Example:

analyze.py --onlyMetric "metric_name"

analyze.py sortBy

type: string

Name of the metric to sort results by.

Example:

analyze.py --sortBy "metric_name"