You have just found Musket

Musket is a family of high-level frameworks written in Python and capable of running on top of Keras.

It was developed with a focus on enabling fast, simply declared experiments that can be easily stored, reproduced, and compared to each other.

Use Musket if you need a deep learning framework that:

  • Allows you to describe experiments in a compact and expressive way
  • Provides a way to store and compare experiments in order to methodically find the best deep learning solution
  • Makes it easy to share experiments and their results when working in a team
  • Provides an IDE and visual tooling to make experimentation faster

There are some videos to check out here.

Main framework website: musket-ml.com

Goals and principles

Compactness and declarative description

A declarative description is usually more compact and human-readable than an imperative one.

All experiments are declared in a YAML dialect with lots of defaults, allowing you to describe an initial experiment in a few lines and then add more detail as needed.

Here is a simple classification experiment; half of these instructions can actually be omitted:

#%Musket Classification 1.0
architecture: Xception #pre-trained model we are going to use
classes: 101 #the number of output classes
activation: softmax #activation of the last layer
weights: imagenet #start from weights pretrained on imagenet
shape: [512, 512, 4] #desired input shape, everything will be resized to fit
optimizer: Adam #Adam optimizer is a good default choice
batch: 8 #our batch size will be 8
lr: 0.001 #learning rate
primary_metric: val_macro_f1 #the most interesting metric
primary_metric_mode: max
dataset:
  combinations_train: []
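
With the defaults in play, a minimal sketch of the same experiment could be as short as this (exactly which fields have usable defaults is an assumption here):

#%Musket Classification 1.0
architecture: Xception
classes: 101
dataset:
  combinations_train: []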

Reproducibility and ease of sharing

As each experiment is simply a folder with a YAML file inside, experiments are easy to store and run.

Putting the YAML files into git or sharing them in some other way gives other team members an easy way to reproduce the same experiments locally. Anyone can check your experiments and add their own to the storage, as the storage is simply a folder.
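
For illustration, a shared project might look like this (folder and file names are hypothetical):

experiments/
  first_experiment/
    config.yaml    #the experiment declaration
    summary.yaml   #final metrics, produced by a run
  second_experiment/
    config.yaml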

Established way to store and compare results

Musket is lazy by nature. Each experiment starts with a simple YAML description. Training and prediction may involve many stages, from computing datasets and preprocessing to inference and calculating statistics, and for each stage Musket saves its results in sub-folders of the experiment folder.

When an experiment is launched, Musket checks which result files are already in place and only runs what is needed. It is up to team members what to share: pure YAML descriptions, YAML plus final metrics (to compare experiment effectiveness), or even the potentially heavier intermediate results, so other team members can run experiments faster locally.

It is easy to compare two experiments by running any text-comparison tool, as experiments are just YAML text:

(Screenshot: YAML comparison)
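
For example, running diff first_experiment/config.yaml second_experiment/config.yaml with any standard diff tool does the job (the paths are illustrative).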

As all experiment statistics are also saved as files, it is easy to compare experiment results and find the best ones with the same text-comparison tooling.

The IDE helps here too, by adding result-visualization tooling.

Flexibility and extensibility

The declarative approach is good and compact, but sometimes custom functionality is needed.

Musket supports many kinds of custom entities: dataset definitions, preprocessors, custom network layers, visualizers, and so on.

Most of the time, defining a custom entity is as simple as putting a Python file into a top-level folder and defining a function with an appropriate decorator, like this:

import numpy as np

# the preprocessing module is provided by the framework
@preprocessing.dataset_preprocessor
def splitInput(input, parts: int):
    # split the input along its first axis into the requested number of parts
    return np.array_split(input, parts, axis=0)
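
Once defined, such a preprocessor can be referenced from YAML by name, in the same way the built-in preprocessors are used in the Generic Pipeline example below (passing the parts value directly is an assumption based on that example):

preprocess:
  - splitInput: 4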

Here is a more involved example, a custom dataset visualizer:

import os
import numpy as np
import matplotlib.pyplot as plt

# dataset_visualizer, context, PredictionItem and preprocessors
# are provided by the framework
@dataset_visualizer
def visualize(val: PredictionItem):
    cache_path = context().path
    path = cache_path + "/" + str(val.id) + ".png"
    if os.path.exists(path):  # the rendered image is cached on disk
        return path
    # remove the low-frequency trend by subtracting a moving average
    ma = val.x / 128 - preprocessors.moving_average(val.x / 128, 8000)
    std = np.std(ma)
    # zero out everything within two standard deviations of zero
    ma[np.where(np.abs(ma) - 2 * std < 0)] = 0
    v = ma
    fig, axs = plt.subplots(1, 1, constrained_layout=True, figsize=(15, 10))
    # shift the outer phases apart so the three curves do not overlap
    v[:, 0] += 1
    v[:, 2] -= 1
    plt.ylim(-2, 2)
    axs.plot(v[:, 0], label='Phase 0')
    axs.plot(v[:, 1], label='Phase 1')
    axs.plot(v[:, 2], label='Phase 2')
    axs.legend()
    title = 'bad wire:' if sum(val.y) > 0 else 'normal wire:'
    axs.set_title(title + str(val.id))
    plt.savefig(path)
    try:
        plt.close()
    except Exception:
        pass
    return path
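
Dataset definitions follow the same pattern. Here is a minimal sketch, assuming, based on the visualizer example above, that a dataset is an object with __len__ and __getitem__ whose items are the framework's PredictionItem instances (the class name RandomWires and the exact dataset contract are illustrative assumptions):

import numpy as np

class RandomWires:  # hypothetical dataset for illustration
    def __len__(self):
        return 100

    def __getitem__(self, i):
        x = np.random.rand(800000, 3)        # fake three-phase signal
        y = np.random.randint(0, 2, size=3)  # fake per-phase labels
        # PredictionItem comes from the framework, as in the visualizer above
        return PredictionItem(str(i), x, y)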

Pipelines and IDE

Musket is a family of frameworks, not a single framework, for a reason.

There is a core part, a pipeline called the Generic Pipeline, which is quite universal and can handle any type of task.

Besides it, there are also specialized pipelines with a YAML domain syntax better suited to a particular task, like the Segmentation Pipeline or the Classification Pipeline. Such specialized pipelines have reduced flexibility, but offer more rapid prototyping and a whole set of useful built-ins.

All of these pipelines are supported by the Musket IDE, which simplifies experiment running and result analysis.


Generic Pipeline

The Generic Pipeline has the most universal YAML-based domain-specific syntax of all the pipelines. Its main feature is the ability to define custom neural networks declaratively, by declaring blocks based on built-in blocks and then referencing those custom blocks from other custom blocks.

There is also a rich set of declarative instructions that control the dataflow inside the network. Most elements, such as datasets, preprocessors, network blocks, loss functions and metrics, can be custom-defined in Python code and later reused from YAML:

imports: [ layers, preprocessors ]
declarations:
  collapseConv:
    parameters: [ filters,size, pool]
    body:
      - conv1d: [filters,size,relu ]
      - conv1d: [filters,size,relu ]
      - batchNormalization: {}
      - collapse: pool
  net:
    #- gaussianNoise: 0.0001
    - repeat(2):
      - collapseConv: [ 20, 7, 10 ]

    - cudnnlstm: [40, true ]
    - cudnnlstm: [40, true ]
    - attention: 718
    - dense: [3, sigmoid]
  preprocess:
     - rescale: 10
     - get_delta_from_average
     - cache
preprocessing: preprocess
testSplit: 0.4
architecture: net
optimizer: Adam #Adam optimizer is a good default choice
batch: 12 #Our batch size will be 12
metrics: #We would like to track some metrics
  - binary_accuracy
  - matthews_correlation
primary_metric: val_binary_accuracy #and the most interesting metric is val_binary_accuracy
callbacks: #Let's configure some minimal callbacks
  EarlyStopping:
    patience: 100
    monitor: val_binary_accuracy
    verbose: 1
  ReduceLROnPlateau:
    patience: 8
    factor: 0.5
    monitor: val_binary_accuracy
    mode: auto
    cooldown: 5
    verbose: 1
loss: binary_crossentropy #We use simple binary_crossentropy loss
stages: #Three consecutive training stages of 100 epochs each
  - epochs: 100
  - epochs: 100
  - epochs: 100
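
Stages run one after another, which makes it possible to vary settings between training phases. A sketch of the idea (treat the per-stage loss override shown here as an assumption):

stages:
  - epochs: 50                 #initial training
  - epochs: 100
    loss: binary_crossentropy  #stages may override settings such as the loss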

Segmentation Pipeline

The Segmentation Pipeline shares a lot with the Generic Pipeline, but defining the network architecture is much easier: just name it:

backbone: mobilenetv2 #let's select classifier backbone for our network 
architecture: DeepLabV3 #let's select segmentation architecture that we would like to use
augmentation:
 Fliplr: 0.5 #let's define some minimal augmentations on images
 Flipud: 0.5 
classes: 1 #we have just one class (mask or no mask)
activation: sigmoid #one class means that our last layer should use sigmoid activation
encoder_weights: pascal_voc #we would like to start from network pretrained on pascal_voc dataset
shape: [320, 320, 3] #This is our desired input image and mask size, everything will be resized to fit.
optimizer: Adam #Adam optimizer is a good default choice
batch: 16 #Our batch size will be 16
metrics: #We would like to track some metrics
  - binary_accuracy 
  - iou
primary_metric: val_binary_accuracy #and the most interesting metric is val_binary_accuracy
callbacks: #Let's configure some minimal callbacks
  EarlyStopping:
    patience: 15
    monitor: val_binary_accuracy
    verbose: 1
  ReduceLROnPlateau:
    patience: 4
    factor: 0.5
    monitor: val_binary_accuracy
    mode: auto
    cooldown: 5
    verbose: 1
loss: binary_crossentropy #We use simple binary_crossentropy loss
stages:
  - epochs: 100 #Let's go for 100 epochs

Classification Pipeline

The Classification Pipeline also shares a lot with the Generic Pipeline, and as in the Segmentation Pipeline it is easy to define the network architecture: just name it and set the number of output classes:

architecture: DenseNet201 #pre-trained model we are going to use
pooling: avg
augmentation: #define some minimal augmentations on images
 Fliplr: 0.5
 Flipud: 0.5
classes: 28 #define the number of classes
activation: sigmoid #as we have multilabel classification, the activation for last layer is sigmoid
weights: imagenet #we would like to start from network pretrained on imagenet dataset
shape: [224, 224, 3] #our desired input image size, everything will be resized to fit
optimizer: Adam #Adam optimizer is a good default choice
batch: 16 #our batch size will be 16
lr: 0.005
copyWeights: true
metrics: #we would like to track some metrics
  - binary_accuracy
  - macro_f1
primary_metric: val_binary_accuracy #the most interesting metric is val_binary_accuracy
primary_metric_mode: max
callbacks: #configure some minimal callbacks
  EarlyStopping:
    patience: 3
    monitor: val_macro_f1
    mode: max
    verbose: 1
  ReduceLROnPlateau:
    patience: 2
    factor: 0.3
    monitor: val_binary_accuracy
    mode: max
    cooldown: 1
    verbose: 1
loss: binary_crossentropy #we use binary_crossentropy loss
stages:
  - epochs: 10 #let's go for 10 epochs