Generic pipeline reference
Pipeline root properties
experiment_result
type: string
Metric to calculate against the combination of all stages and report in the allStages section of the summary.yaml file after all experiment instances are finished.
Uses the metric name detection mechanism to search for a built-in metric or a custom function with the same name across project modules.
The metric name may have a val_ prefix or an _holdout postfix to indicate calculation against the validation or holdout set, respectively.
Example:
experiment_result: matthews_correlation_holdout
architecture
type: string
Name of the declaration that will be used as an entry point or root of the main network.
Example:
declarations:
  utilityDeclaration1:
  utilityDeclaration2:
  mainNetwork:
    - utilityDeclaration1: []
    - dense: [1,"sigmoid"]
architecture: mainNetwork
batch
type: integer
Sets the training batch size.
Example:
batch: 512
callbacks
type: array of callback instances
Sets up training-time callbacks. See individual callback descriptions.
Example:
callbacks:
  EarlyStopping:
    patience: 100
    monitor: val_binary_accuracy
    verbose: 1
  ReduceLROnPlateau:
    patience: 16
    factor: 0.5
    monitor: val_binary_accuracy
    mode: auto
    cooldown: 5
    verbose: 1
copyWeights
type: boolean
Whether to copy saved weights.
Example:
copyWeights: true
clipnorm
type: float
Maximum norm for gradient clipping applied by the optimizer.
Example:
clipnorm: 1.0
clipvalue
type: float
Maximum absolute value for gradient clipping applied by the optimizer.
Example:
clipvalue: 0.5
dataset
type: complex object
The key is the name of a Python function in scope that returns the training dataset; the value is an array of parameters to pass to that function.
Example:
dataset:
  getTrain: [false,false]
datasets
type: map containing complex objects
Sets up a list of available data sets to be referred by other entities.
For each entry, the key is the name of a Python function in scope that returns a dataset; the value is an array of parameters to pass to that function.
Example:
datasets:
  test:
    getTest: [false,false]
declarations
type: complex
Sets up network layer building blocks.
Each declaration is an object: the key sets the declaration name, and the value is a complex object containing a parameters array that lists the layer's parameters and a body that contains an array of sub-layers or control statements.
If a layer has no parameters, the parameters property may be omitted and the body contents may be placed directly inside the layer definition.
See Layer types for details regarding building blocks.
Example:
declarations:
  lstm2:
    parameters: [count]
    body:
      - bidirectional:
          - cuDNNLSTM: [count, true]
      - bidirectional:
          - cuDNNLSTM: [count/2, false]
  net:
    - split-concat:
        - word_indexes_embedding: [ embeddings/glove.840B.300d.txt ]
        - word_indexes_embedding: [ embeddings/paragram_300_sl999.txt ]
        - word_indexes_embedding: [ embeddings/wiki-news-300d-1M.vec]
    - gaussianNoise: 0.05
    - lstm2: [300]
    #- dropout: 0.5
    - dense: [1,"sigmoid"]
extra_train_data
type: string
Name of the additional dataset that will be added (per element) to the training dataset before training starts.
Example:
extra_train_data: more_people
folds_count
type: integer
Number of folds to train. Default is 5.
Example:
folds_count: 3
final_metrics
type: array of strings
Metrics to calculate against every stage and report in the stages section of the summary.yaml file after all experiment instances are finished.
Uses the metric name detection mechanism to search for a built-in metric or a custom function with the same name across project modules.
The metric name may have a val_ prefix or an _holdout postfix to indicate calculation against the validation or holdout set, respectively.
Example:
final_metrics: [measure]
imports
type: array of strings
Imports Python files from the modules folder of the project and makes their properly annotated contents available for reference from YAML.
Example:
imports: [ layers, preprocessors ]
This will import layers.py and preprocessors.py.
inference_batch
type: integer
Batch size used during the inference process.
Example:
inference_batch: 32
loss
type: string
Sets the loss name.
Uses the loss name detection mechanism to search for a built-in loss or a custom function with the same name across project modules.
Example:
loss: binary_crossentropy
lr
type: float
Learning rate.
Example:
lr: 0.01
metrics
type: array of strings
Array of metrics to track during the training process. Metric calculation results are printed to the console and written to the metrics folder of the experiment.
Uses the metric name detection mechanism to search for a built-in metric or a custom function with the same name across project modules.
The metric name may have a val_ prefix or an _holdout postfix to indicate calculation against the validation or holdout set, respectively.
Example:
metrics: #We would like to track some metrics
- binary_accuracy
- binary_crossentropy
- matthews_correlation
num_seeds
type: integer
If set, the training process (for all folds) will be executed num_seeds times, each time resetting the random seeds.
The respective folders (like metrics) will obtain subfolders 0, 1, etc. for each seed.
Example:
num_seeds: 3
optimizer
type: string
Sets the optimizer.
Example:
optimizer: Adam
primary_metric
type: string
Metric to track during the training process. Metric calculation results are printed to the console and written to the metrics folder of the experiment.
Besides tracking, this metric is also used by default for metric-related activity, for example, deciding which epoch's results are better.
Uses the metric name detection mechanism to search for a built-in metric or a custom function with the same name across project modules.
The metric name may have a val_ prefix or an _holdout postfix to indicate calculation against the validation or holdout set, respectively.
Example:
primary_metric: val_macro_f1
primary_metric_mode
type: enum: auto,min,max
default: auto
When primary metric calculation results are combined across several instances (e.g. batches), this is the mathematical operation used to obtain the final result.
Example:
primary_metric_mode: max
preprocessing
type: complex
Preprocessors are custom Python functions that transform the dataset.
Such functions should be defined in Python files located in the project modules folder and imported.
Preprocessing functions should also be marked with the @preprocessing.dataset_preprocessor annotation.
The preprocessing instruction can then be used to chain preprocessors as needed for a particular experiment, and even to cache the result on disk for reuse between experiments.
Preprocessing chains may also contain the preprocessor utility instructions described below.
Example:
preprocessing:
  - binarize_target:
  - tokenize:
  - tokens_to_indexes:
      maxLen: 160
  - disk-cache:
random_state
type: integer
Seed for the random number generator.
Example:
random_state: 42
stages
type: complex
Sets up training process stages. Contains YAML array of stages, where each stage is a complex type that may contain properties described in the Stage properties section.
Example:
stages:
  - epochs: 6
  - epochs: 6
    lr: 0.01
stratified
type: boolean
Whether to use stratified strategy when splitting training set.
Example:
stratified: true
testSplit
type: float 0-1
Splits the train set into two parts, using one part for train and leaving the other untouched for a later testing. The split is shuffled.
Example:
testSplit: 0.4
testSplitSeed
type: integer
Seed of randomness for the split of the training set.
Example:
testSplitSeed: 42
testTimeAugmentation
type: string
Test-time augmentation function name. The function must be reachable in project scope and must accept and return a numpy array.
Example:
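A minimal sketch, assuming a project-scope function named flip_augmentation (a hypothetical name; any properly defined function would do):
testTimeAugmentation: flip_augmentation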
validationSplit
type: float
Float 0-1 setting up how much of the training set (after holdout is already cut off) to allocate for validation. This property is only used if fold count is 1.
Example:
validationSplit: 0.2
Callback types
EarlyStopping
Stop training when a monitored metric has stopped improving.
Properties:
- patience - integer, number of epochs with no improvement after which training will be stopped.
- verbose - 0 or 1, verbosity mode.
- monitor - string, name of the metric to monitor
- mode - auto, min or max; In min mode, training will stop when the quantity monitored has stopped decreasing; in max mode it will stop when the quantity monitored has stopped increasing; in auto mode, the direction is automatically inferred from the name of the monitored quantity.
Example
callbacks:
  EarlyStopping:
    patience: 100
    monitor: val_binary_accuracy
    verbose: 1
ReduceLROnPlateau
Reduce learning rate when a metric has stopped improving.
Properties:
- patience - integer, number of epochs with no improvement after which the learning rate will be reduced.
- cooldown - integer, number of epochs to wait before resuming normal operation after lr has been reduced.
- factor - number, factor by which the learning rate will be reduced. new_lr = lr * factor
- verbose - 0 or 1, verbosity mode.
- monitor - string, name of the metric to monitor
- mode - auto, min or max; in min mode, the learning rate will be reduced when the quantity monitored has stopped decreasing; in max mode, when it has stopped increasing; in auto mode, the direction is automatically inferred from the name of the monitored quantity.
Example
callbacks:
  ReduceLROnPlateau:
    patience: 16
    factor: 0.5
    monitor: val_binary_accuracy
    mode: auto
    cooldown: 5
    verbose: 1
CyclicLR
Cycles learning rate across epochs.
Functionally, it defines the cycle amplitude (max_lr - base_lr). The learning rate at any point in the cycle is the sum of base_lr and some scaling of the amplitude; therefore max_lr may not actually be reached, depending on the scaling function.
Properties:
- base_lr - number, initial learning rate which is the lower boundary in the cycle.
- max_lr - number, upper boundary in the cycle.
- mode - one of triangular, triangular2 or exp_range; the scaling function.
- gamma - number from 0 to 1, constant used in the exp_range scaling function.
- step_size - integer > 0, number of training iterations (batches) per half cycle.
Example
callbacks:
  CyclicLR:
    base_lr: 0.001
    max_lr: 0.006
    step_size: 2000
    mode: triangular
LRVariator
Changes the learning rate between two values.
Properties:
- fromVal - initial learning rate value; defaults to the configuration lr setting.
- toVal - final learning rate value.
- style - one of the following:
  - linear - changes the LR linearly between the two values.
  - const - does not change the LR from its initial value.
  - cos+ - -1 * cos(2x/pi) + 1 for x in [0;1]
  - cos- - cos(2x/pi) for x in [0;1]
  - cos - same as 'cos-'
  - sin+ - sin(2x/pi) for x in [0;1]
  - sin- - -1 * sin(2x/pi) + 1 for x in [0;1]
  - sin - same as 'sin+'
  - any positive float or integer value a - x^a for x in [0;1]
- absSize - size in batches
- relSize - size in fractions of an epoch
- periodEpochs - period in epochs
- periodSteps - period in batches
- then - an LRVariator that should manage the learning rate after this one
Example
LRVariator:
  fromVal: 0
  toVal: 0.00005
  style: linear
  relSize: 0.05 # let's go for 1/20 of an epoch
  then:
    LRVariator:
      fromVal: 0.00005
      toVal: 0
      relSize: 2 # let's go for 2 epochs
      style: linear
TensorBoard
This callback writes a log for TensorBoard, which allows you to visualize dynamic graphs of your training and test metrics, as well as activation histograms for the different layers in your model.
Properties:
- log_dir - string; the path of the directory where to save the log files to be parsed by TensorBoard.
- histogram_freq - integer; frequency (in epochs) at which to compute activation and weight histograms for the layers of the model. If set to 0, histograms won't be computed. Validation data (or split) must be specified for histogram visualizations.
- batch_size - integer; size of batch of inputs to feed to the network for histograms computation.
- write_graph - boolean; whether to visualize the graph in TensorBoard. The log file can become quite large when write_graph is set to True.
- write_grads - boolean; whether to visualize gradient histograms in TensorBoard. histogram_freq must be greater than 0.
- write_images - boolean; whether to write model weights to visualize as image in TensorBoard.
- embeddings_freq - number; frequency (in epochs) at which selected embedding layers will be saved. If set to 0, embeddings won't be computed. Data to be visualized in TensorBoard's Embedding tab must be passed as embeddings_data.
- embeddings_layer_names - array of strings; a list of names of layers to keep an eye on. If None or an empty list, all embedding layers will be watched.
- embeddings_metadata - a dictionary mapping layer names to file names in which metadata for the embedding layer is saved. See the details about the metadata file format. If the same metadata file is to be used for all embedding layers, a single string can be passed.
- embeddings_data - data to be embedded at layers specified in embeddings_layer_names.
- update_freq - 'epoch', 'batch' or an integer; when using 'batch', writes the losses and metrics to TensorBoard after each batch. The same applies for 'epoch'. If using an integer, say 10000, the callback will write the metrics and losses to TensorBoard every 10000 samples. Note that writing too frequently to TensorBoard can slow down your training.
Example
callbacks:
  TensorBoard:
    log_dir: './logs'
    batch_size: 32
    write_graph: True
    update_freq: batch
Layer types
Input
This layer is not intended to be used directly.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- shape - array of integers; input shape
Example:
GaussianNoise
Apply additive zero-centered Gaussian noise.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- stddev - float; standard deviation of the noise distribution.
Example:
- gaussianNoise: 0.05
Dropout
Applies Dropout to the input.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- rate - float between 0 and 1; fraction of the input units to drop.
- seed - integer to use as random seed.
Example:
declarations:
  net:
    - dropout: 0.5
SpatialDropout1D
Spatial 1D version of Dropout.
This version performs the same function as Dropout, however it drops entire 1D feature maps instead of individual elements. If adjacent frames within feature maps are strongly correlated (as is normally the case in early convolution layers) then regular dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, SpatialDropout1D will help promote independence between feature maps and should be used instead.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- rate - float between 0 and 1. Fraction of the input units to drop.
Example:
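By analogy with the Dropout example above, a rate-only usage might look as follows (the spatialDropout1D alias spelling is an assumption, not confirmed by this document):
- spatialDropout1D: 0.3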
LSTM
Long Short-Term Memory layer.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- units: Positive integer, dimensionality of the output space.
- activation: Activation function to use (see activations). Default: hyperbolic tangent (tanh). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
- recurrent_activation: Activation function to use for the recurrent step (see activations). Default: hard sigmoid (hard_sigmoid). If you pass None, no activation is applied (i.e. "linear" activation: a(x) = x).
- use_bias: Boolean, whether the layer uses a bias vector.
- kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs (see initializers).
- recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state (see initializers).
- bias_initializer: Initializer for the bias vector (see initializers).
- unit_forget_bias: Boolean. If True, add 1 to the bias of the forget gate at initialization. Setting it to true will also force bias_initializer="zeros". This is recommended in Jozefowicz et al. (2015).
- kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
- recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix (see regularizer).
- bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
- activity_regularizer: Regularizer function applied to the output of the layer (its "activation") (see regularizer).
- kernel_constraint: Constraint function applied to the kernel weights matrix (see constraints).
- recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix (see constraints).
- bias_constraint: Constraint function applied to the bias vector (see constraints).
- dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the inputs.
- recurrent_dropout: Float between 0 and 1. Fraction of the units to drop for the linear transformation of the recurrent state.
- implementation: Implementation mode, either 1 or 2. Mode 1 will structure its operations as a larger number of smaller dot products and additions, whereas mode 2 will batch them into fewer, larger operations. These modes will have different performance profiles on different hardware and for different applications.
- return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- return_state: Boolean. Whether to return the last state in addition to the output. The returned elements of the states list are the hidden state and the cell state, respectively.
- go_backwards: Boolean (default False). If True, process the input sequence backwards and return the reversed sequence.
- stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as the initial state for the sample of index i in the following batch.
- unroll: Boolean (default False). If True, the network will be unrolled, else a symbolic loop will be used. Unrolling can speed up an RNN, although it tends to be more memory-intensive. Unrolling is only suitable for short sequences.
Example:
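A hedged sketch, passing units and return_sequences positionally, mirroring the cuDNNLSTM usage elsewhere in this document (the exact alias spelling is an assumption):
- LSTM: [64, true]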
GlobalMaxPool1D
Global max pooling operation for temporal data.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- data_format - A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, steps, features) while channels_first corresponds to inputs with shape (batch, features, steps).
Example:
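A minimal sketch; the alias spelling and the no-argument form are assumptions:
- globalMaxPool1D: []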
GlobalAveragePooling1D
Global average pooling operation for temporal data.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- data_format - A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, steps, features) while channels_first corresponds to inputs with shape (batch, features, steps).
Example:
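A minimal sketch; the alias spelling and the no-argument form are assumptions:
- globalAveragePooling1D: []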
BatchNormalization
Batch normalization layer.
Normalizes the activations of the previous layer at each batch, i.e. applies a transformation that maintains the mean activation close to 0 and the activation standard deviation close to 1.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- axis: Integer, the axis that should be normalized (typically the features axis). For instance, after a Conv2D layer with data_format="channels_first", set axis=1 in BatchNormalization.
- momentum: Momentum for the moving mean and the moving variance.
- epsilon: Small float added to variance to avoid dividing by zero.
- center: If True, add offset of beta to the normalized tensor. If False, beta is ignored.
- scale: If True, multiply by gamma. If False, gamma is not used. When the next layer is linear (also e.g. nn.relu), this can be disabled since the scaling will be done by the next layer.
- beta_initializer: Initializer for the beta weight.
- gamma_initializer: Initializer for the gamma weight.
- moving_mean_initializer: Initializer for the moving mean.
- moving_variance_initializer: Initializer for the moving variance.
- beta_regularizer: Optional regularizer for the beta weight.
- gamma_regularizer: Optional regularizer for the gamma weight.
- beta_constraint: Optional constraint for the beta weight.
- gamma_constraint: Optional constraint for the gamma weight.
Example:
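A minimal sketch with default parameters; the alias spelling and the no-argument form are assumptions:
- batchNormalization: []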
Concatenate
Layer that concatenates a list of inputs.
Example:
- concatenate: [lstmBranch,textFeatureBranch]
Add
Layer that adds a list of inputs.
It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Example:
- add: [first,second]
Substract
Layer that subtracts two inputs.
It takes as input a list of tensors of size 2, both of the same shape, and returns a single tensor, (inputs[0] - inputs[1]), also of the same shape.
Example:
- substract: [first,second]
Mult
Layer that multiplies (element-wise) a list of inputs.
It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Example:
- mult: [first,second]
Max
Layer that computes the maximum (element-wise) of a list of inputs.
It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Example:
- max: [first,second]
Min
Layer that computes the minimum (element-wise) of a list of inputs.
It takes as input a list of tensors, all of the same shape, and returns a single tensor (also of the same shape).
Example:
- min: [first,second]
Conv1D
1D convolution layer (e.g. temporal convolution).
This layer creates a convolution kernel that is convolved with the layer input over a single spatial (or temporal) dimension to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.
When using this layer as the first layer in a model, provide an input_shape argument (tuple of integers or None, does not include the batch axis), e.g. input_shape=(10, 128) for time series sequences of 10 time steps with 128 features per step in data_format="channels_last", or (None, 128) for variable-length sequences with 128 features per step.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution).
- kernel_size: An integer or tuple/list of a single integer, specifying the length of the 1D convolution window.
- strides: An integer or tuple/list of a single integer, specifying the stride length of the convolution. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.
- padding: One of "valid", "causal" or "same" (case-insensitive). "valid" means "no padding". "same" results in padding the input such that the output has the same length as the original input. "causal" results in causal (dilated) convolutions, e.g. output[t] does not depend on input[t + 1:]. A zero padding is used such that the output has the same length as the original input. Useful when modeling temporal data where the model should not violate the temporal order. See WaveNet: A Generative Model for Raw Audio, section 2.1.
- data_format: A string, one of "channels_last" (default) or "channels_first". The ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with shape (batch, steps, channels) (default format for temporal data in Keras) while "channels_first" corresponds to inputs with shape (batch, channels, steps).
- dilation_rate: An integer or tuple/list of a single integer, specifying the dilation rate to use for dilated convolution. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any strides value != 1.
- activation: Activation function to use (see activations). If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).
- use_bias: Boolean, whether the layer uses a bias vector.
- kernel_initializer: Initializer for the kernel weights matrix (see initializers).
- bias_initializer: Initializer for the bias vector (see initializers).
- kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
- bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
- activity_regularizer: Regularizer function applied to the output of the layer (its "activation"). (see regularizer).
- kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
- bias_constraint: Constraint function applied to the bias vector (see constraints).
Example:
- Conv1D: [10,1,"relu"]
Conv2D
2D convolution layer (e.g. spatial convolution over images).
This layer creates a convolution kernel that is convolved with the layer input to produce a tensor of outputs. If use_bias is True, a bias vector is created and added to the outputs. Finally, if activation is not None, it is applied to the outputs as well.
When using this layer as the first layer in a model, provide the keyword argument input_shape (tuple of integers, does not include the batch axis), e.g. input_shape=(128, 128, 3) for 128x128 RGB pictures in data_format="channels_last".
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution).
- kernel_size: An integer or tuple/list of 2 integers, specifying the height and width of the 2D convolution window. Can be a single integer to specify the same value for all spatial dimensions.
- strides: An integer or tuple/list of 2 integers, specifying the strides of the convolution along the height and width. Can be a single integer to specify the same value for all spatial dimensions. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.
- padding: One of "valid" or "same" (case-insensitive). Note that "same" is slightly inconsistent across backends with strides != 1, as described here.
- data_format: A string, one of "channels_last" or "channels_first". The ordering of the dimensions in the inputs. "channels_last" corresponds to inputs with shape (batch, height, width, channels) while "channels_first" corresponds to inputs with shape (batch, channels, height, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "channels_last".
- dilation_rate: An integer or tuple/list of 2 integers, specifying the dilation rate to use for dilated convolution. Can be a single integer to specify the same value for all spatial dimensions. Currently, specifying any dilation_rate value != 1 is incompatible with specifying any stride value != 1.
- activation: Activation function to use (see activations). If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).
- use_bias: Boolean, whether the layer uses a bias vector.
- kernel_initializer: Initializer for the kernel weights matrix (see initializers).
- bias_initializer: Initializer for the bias vector (see initializers).
- kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
- bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
- activity_regularizer: Regularizer function applied to the output of the layer (its "activation"). (see regularizer).
- kernel_constraint: Constraint function applied to the kernel matrix (see constraints).
- bias_constraint: Constraint function applied to the bias vector (see constraints).
Example:
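By analogy with the Conv1D usage elsewhere in this document, a hedged sketch passing filters, kernel_size and activation positionally:
- Conv2D: [32, 3, "relu"]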
MaxPool1D
Max pooling operation for temporal data.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- pool_size: Integer, size of the max pooling windows.
- strides: Integer, or None. Factor by which to downscale. E.g. 2 will halve the input. If None, it will default to pool_size.
- padding: One of "valid" or "same" (case-insensitive).
- data_format: A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, steps, features) while channels_first corresponds to inputs with shape (batch, features, steps).
Example:
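A minimal sketch passing pool_size positionally; the alias spelling is an assumption:
- maxPool1D: [2]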
MaxPool2D
Max pooling operation for spatial data.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimension. If only one integer is specified, the same window length will be used for both dimensions.
- strides: Integer, tuple of 2 integers, or None. Strides values. If None, it will default to pool_size.
- padding: One of "valid" or "same" (case-insensitive).
- data_format: A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, height, width, channels) while channels_first corresponds to inputs with shape (batch, channels, height, width). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "channels_last".
Example:
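A minimal sketch passing pool_size positionally; the alias spelling is an assumption:
- maxPool2D: [2]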
AveragePooling1D
Average pooling for temporal data.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- pool_size: Integer, size of the average pooling windows.
- strides: Integer, or None. Factor by which to downscale. E.g. 2 will halve the input. If None, it will default to pool_size.
- padding: One of "valid" or "same" (case-insensitive).
- data_format: A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. channels_last corresponds to inputs with shape (batch, steps, features) while channels_first corresponds to inputs with shape (batch, features, steps).
Example:
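A minimal sketch passing pool_size positionally; the alias spelling is an assumption:
- averagePooling1D: [2]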
CuDNNLSTM
Fast LSTM implementation with CuDNN.
Can only be run on GPU, with the TensorFlow backend.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- units: Positive integer, dimensionality of the output space.
- kernel_initializer: Initializer for the kernel weights matrix, used for the linear transformation of the inputs (see initializers).
- recurrent_initializer: Initializer for the recurrent_kernel weights matrix, used for the linear transformation of the recurrent state (see initializers).
- bias_initializer: Initializer for the bias vector (see initializers).
- unit_forget_bias: Boolean. If True, add 1 to the bias of the forget gate at initialization. Setting it to true will also force bias_initializer="zeros". This is recommended in Jozefowicz et al. (2015).
- kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
- recurrent_regularizer: Regularizer function applied to the recurrent_kernel weights matrix (see regularizer).
- bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
- activity_regularizer: Regularizer function applied to the output of the layer (its "activation") (see regularizer).
- kernel_constraint: Constraint function applied to the kernel weights matrix (see constraints).
- recurrent_constraint: Constraint function applied to the recurrent_kernel weights matrix (see constraints).
- bias_constraint: Constraint function applied to the bias vector (see constraints).
- return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.
- return_state: Boolean. Whether to return the last state in addition to the output.
- stateful: Boolean (default False). If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
Example:
- cuDNNLSTM: [64, true]
Dense
Regular densely-connected NN layer.
Dense implements the operation: output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
Note: if the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- units: Positive integer, dimensionality of the output space.
- activation: Activation function to use (see activations). If you don't specify anything, no activation is applied (i.e. "linear" activation: a(x) = x).
- use_bias: Boolean, whether the layer uses a bias vector.
- kernel_initializer: Initializer for the kernel weights matrix (see initializers).
- bias_initializer: Initializer for the bias vector (see initializers).
- kernel_regularizer: Regularizer function applied to the kernel weights matrix (see regularizer).
- bias_regularizer: Regularizer function applied to the bias vector (see regularizer).
- activity_regularizer: Regularizer function applied to the output of the layer (its "activation") (see regularizer).
- kernel_constraint: Constraint function applied to the kernel weights matrix (see constraints).
- bias_constraint: Constraint function applied to the bias vector (see constraints).
Example:
- dense: [1,"sigmoid"]
Flatten
Flattens the input. Does not affect the batch size.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- data_format: A string, one of channels_last (default) or channels_first. The ordering of the dimensions in the inputs. The purpose of this argument is to preserve weight ordering when switching a model from one data format to another. channels_last corresponds to inputs with shape (batch, ..., channels) while channels_first corresponds to inputs with shape (batch, channels, ...). It defaults to the image_data_format value found in your Keras config file at ~/.keras/keras.json. If you never set it, then it will be "channels_last".
Example:
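A minimal sketch; the alias spelling and the no-argument form are assumptions:
- flatten: []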
Bidirectional
Bidirectional wrapper for RNNs.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
- layer: Recurrent instance.
- merge_mode: Mode by which outputs of the forward and backward RNNs will be combined. One of {'sum', 'mul', 'concat', 'ave', None}. If None, the outputs will not be combined; they will be returned as a list.
- weights: Initial weights to load in the Bidirectional model.
Example:
- bidirectional:
    - cuDNNLSTM: [64, true]
Utility layers
split
Splits current flow into several ones. Each child is a separate flow with an input equal to the input of the split operation.
The number of outputs is equal to the number of children.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
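A hedged sketch patterned on the split-concat example below; each child branch receives the split's input and yields its own output:
- split:
    - dense: [16,"relu"]
    - dense: [16,"tanh"]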
split-concat
Splits current flow into several ones. Each child is a separate flow with an input equal to the input of the split operation.
Output is a concatenation of child flows.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
- split-concat:
    - word_indexes_embedding: [ embeddings/glove.840B.300d.txt ]
    - word_indexes_embedding: [ embeddings/paragram_300_sl999.txt ]
    - word_indexes_embedding: [ embeddings/wiki-news-300d-1M.vec]
- lstm2: [128]
split-concatenate
Splits current flow into several ones. Each child is a separate flow with an input equal to the input of the split operation.
Output is a concatenation of child flows (equal to the usage of Concatenate layer).
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
- split-concatenate:
    - word_indexes_embedding: [ embeddings/glove.840B.300d.txt ]
    - word_indexes_embedding: [ embeddings/paragram_300_sl999.txt ]
    - word_indexes_embedding: [ embeddings/wiki-news-300d-1M.vec]
- lstm2: [128]
split-add
Splits current flow into several ones. Each child is a separate flow with an input equal to the input of the split operation.
Output is an addition of child flows (equal to the usage of Add layer).
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
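A hedged sketch patterned on the transform-add example later in this document; each branch receives the same input and the branch outputs are added:
- split-add:
    - Conv1D: [10,1,"relu"]
    - Conv1D: [10,2,"relu"]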
split-substract
Splits current flow into several ones. Each child is a separate flow with an input equal to the input of the split operation.
Output is a subtraction of child flows (equal to the usage of the Substract layer).
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
split-mult
Splits current flow into several ones. Each child is a separate flow with an input equal to the input of the split operation.
Output is a multiplication of child flows (equal to the usage of Mult layer).
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
split-min
Splits current flow into several ones. Each child is a separate flow with an input equal to the input of the split operation.
Output is a minimum of child flows (equal to the usage of Min layer).
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
split-max
Splits current flow into several ones. Each child is a separate flow with an input equal to the input of the split operation.
Output is a maximum of child flows (equal to the usage of Max layer).
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
split-dot
Splits current flow into several ones. Each child is a separate flow with an input equal to the input of the split operation.
Output is a dot product of child flows.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
split-dot-normalize
Splits current flow into several ones. Each child is a separate flow with an input equal to the input of the split operation.
Output is a dot product with normalization of child flows.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
seq
Executes child elements as a sequence of operations, one by one.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
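A hedged sketch; seq simply applies its children one after another:
- seq:
    - dense: [64,"relu"]
    - dropout: 0.5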
input
Overrides current input with what is listed.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
input: [firstRef, secondRef]
pass
Forwards data from this branch unchanged.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
- transform-concat:
    - pass
    - Conv1D: [10,1,"relu"]
transform-concat
Passes input tensors through its child layers, then concatenates the outputs.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
- transform-concat:
    - Conv1D: [10,1,"relu"]
    - Conv1D: [10,2,"relu"]
transform-add
Passes input tensors through its child layers, then adds the outputs.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
- transform-add:
    - Conv1D: [10,1,"relu"]
    - Conv1D: [10,2,"relu"]
Stage properties
callbacks
type: array of callback instances
Sets up training-time callbacks. See individual callback descriptions.
Example:
callbacks:
  EarlyStopping:
    patience: 100
    monitor: val_binary_accuracy
    verbose: 1
  ReduceLROnPlateau:
    patience: 16
    factor: 0.5
    monitor: val_binary_accuracy
    mode: auto
    cooldown: 5
    verbose: 1
epochs
type: integer
Number of epochs to train for this stage.
Example:
epochs: 6
extra_callbacks
type: array of callback instances
Allows specifying a list of additional callbacks to apply during this stage.
initial_weights
type: string
File path to load the stage's initial network weights from.
Example:
initial_weights: /initial.weights
negatives
type: string or integer
Controls binary data balancing for the training set.
The following values are acceptable:
- none - exclude negative examples from the data
- real - include all negative examples
- an integer (1, 2, or any other number) - how many negative examples to include per positive example
In order for the system to determine whether a particular example is positive or negative, the dataset class defined by the dataset property should declare an isPositive method that accepts a dataset item and returns a boolean.
Example:
stages:
  - epochs: 6 #Train for 6 epochs
    negatives: none #do not include negative examples in your training set
    validation_negatives: real #validation should contain all negative examples
  - lr: 0.0001 #let's use different starting learning rate
    epochs: 6
    negatives: real
    validation_negatives: real
  - loss: lovasz_loss #let's override loss function
    lr: 0.00001
    epochs: 6
    initial_weights: ./fpn-resnext2/weights/best-0.1.weights #let's load weights from this file
loss
type: string
Sets the loss name.
Uses the loss name detection mechanism to search for a built-in loss or a custom function with the same name across project modules.
Example:
loss: binary_crossentropy
lr
type: float
Learning rate.
Example:
lr: 0.01
validation_negatives
type: string or integer
Controls binary data balancing for the validation set.
The following values are acceptable:
- none - exclude negative examples from the data
- real - include all negative examples
- an integer (1, 2, or any other number) - how many negative examples to include per positive example
In order for the system to determine whether a particular example is positive or negative, the dataset class defined by the dataset property should declare an isPositive method that accepts a dataset item and returns a boolean.
Example:
stages:
  - epochs: 6 #Train for 6 epochs
    negatives: none #do not include negative examples in your training set
    validation_negatives: real #validation should contain all negative examples
  - lr: 0.0001 #let's use different starting learning rate
    epochs: 6
    negatives: real
    validation_negatives: real
  - loss: lovasz_loss #let's override loss function
    lr: 0.00001
    epochs: 6
    initial_weights: ./fpn-resnext2/weights/best-0.1.weights #let's load weights from this file
Preprocessors
type: complex
Preprocessors are custom Python functions that transform the dataset.
Such functions should be defined in Python files located in the project modules folder and imported.
Preprocessing functions should also be marked with the @preprocessing.dataset_preprocessor annotation.
The preprocessing instruction can then be used to chain preprocessors as needed for a particular experiment, and even to cache the result on disk for reuse between experiments.
Example:
preprocessing:
  - binarize_target:
  - tokenize:
  - tokens_to_indexes:
      maxLen: 160
  - disk-cache:
cache
Caches its input in memory, including the full flow.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
preprocessing:
  - binarize_target:
  - tokenize:
  - tokens_to_indexes:
      maxLen: 160
  - cache:
disk-cache
Caches its input on disk, including the full flow. On subsequent launches, if nothing has changed in the flow, it takes its output from disk instead of re-running the previous operations.
Properties:
- name - string; optionally sets up layer name to refer it from other layers.
- inputs - array of strings; lists layer inputs.
Example:
preprocessing:
  - binarize_target:
  - tokenize:
  - tokens_to_indexes:
      maxLen: 160
  - disk-cache:
split-preprocessor
An analogue of split for preprocessor operations.
Example:
split-concat-preprocessor
An analogue of split-concat for preprocessor operations.
Example:
seq-preprocessor
An analogue of seq for preprocessor operations.
Example:
augmentation
Preprocessor instruction whose body only runs during training and is skipped during inference.
Example:
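A hedged sketch, assuming a project-scope preprocessor named augment_tokens (a hypothetical name) that is defined and imported as described above; its body runs only during training:
preprocessing:
  - tokenize:
  - augmentation:
      - augment_tokens:
  - tokens_to_indexes:
      maxLen: 160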
fit script arguments
fit.py project
type: string
Folder to search for experiments, project root.
Example:
-m musket_core.fit --project "path/to/project"
fit.py name
type: string or comma-separated list of strings
Name of the experiment to launch, or a list of names.
Example:
-m musket_core.fit --name "experiment_name"
-m musket_core.fit --name "experiment_name1, experiment_name2"
fit.py num_gpus
type: integer
Default: 1
Number of GPUs to use during experiment launch.
Example:
-m musket_core.fit --num_gpus=1
fit.py gpus_per_net
type: integer
Default: 1
Maximum number of GPUs to use per single experiment.
Example:
-m musket_core.fit --gpus_per_net=1
fit.py num_workers
type: integer
Default: 1
Number of workers to use.
Example:
-m musket_core.fit --num_workers=1
fit.py allow_resume
type: boolean
Default: False
Whether to allow resuming of experiments, which will cause unfinished experiments to start from the best saved weights.
Example:
-m musket_core.fit --allow_resume True
fit.py force_recalc
type: boolean
Default: False
Whether to force rebuilding of reports and predictions.
Example:
-m musket_core.fit --force_recalc True
fit.py launch_tasks
type: boolean
Default: False
Whether to launch associated tasks.
Example:
-m musket_core.fit --launch_tasks True
fit.py only_report
type: boolean
Default: False
Whether to only generate reports for cached data, no training occurs.
Example:
-m musket_core.fit --only_report True
fit.py cache
type: string
Path to the cache folder. Cache folder will contain temporary cached data for executed experiments.
Example:
-m musket_core.fit --cache "path/to/cache/folder"
fit.py folds
type: integer or comma-separated list of integers
Folds to launch. By default, all folds of the experiment are executed; this argument allows launching only some of them.
Example:
-m musket_core.fit --folds 1,2
task script arguments
task.py project
type: string
Folder to search for experiments, project root.
Example:
task.py --project "path/to/project"
task.py name
type: string or comma-separated list of strings
Name of the experiment to launch, or a list of names.
Example:
task.py --name "experiment_name"
task.py --name "experiment_name1, experiment_name2"
task.py task
type: string or comma-separated list of strings
Default: all tasks.
Name of the task to launch, or a list of names.
Example:
task.py --task "task_name"
task.py --task "task_name1, task_name2"
task.py --task "all"
task.py num_gpus
type: integer
Default: 1
Number of GPUs to use during experiment launch.
Example:
task.py --num_gpus=1
task.py gpus_per_net
type: integer
Default: 1
Maximum number of GPUs to use per single experiment.
Example:
task.py --gpus_per_net=1
task.py num_workers
type: integer
Default: 1
Number of workers to use.
Example:
task.py --num_workers=1
task.py allow_resume
type: boolean
Default: False
Whether to allow resuming of experiments, which will cause unfinished experiments to start from the best saved weights.
Example:
task.py --allow_resume True
task.py force_recalc
type: boolean
Default: False
Whether to force rebuilding of reports and predictions.
Example:
task.py --force_recalc True
task.py launch_tasks
type: boolean
Default: False
Whether to launch associated tasks.
Example:
task.py --launch_tasks True
task.py cache
type: string
Path to the cache folder. Cache folder will contain temporary cached data for executed experiments.
Example:
task.py --cache "path/to/cache/folder"
analyze script arguments
analyze.py inputFolder
type: string
Folder to search for finished experiments in. Typically, project root.
Example:
analyze.py --inputFolder "path/to/project"
analyze.py output
type: string
Default: report.csv in the project root.
Output report file path.
Example:
analyze.py --output "path/to/project/report/report.csv"
analyze.py onlyMetric
type: string
Name of the single metric to take into account.
Example:
analyze.py --onlyMetric "metric_name"
analyze.py sortBy
type: string
Name of the metric to sort results by.
Example:
analyze.py --sortBy "metric_name"