Getting the dataset from Kaggle
Installing the Kaggle CLI
This should be done only once. Run `pip install kaggle` in the console.
1. Log into Kaggle.
2. Click on your profile in the top-right corner and choose "My Account".
3. On the account page, find the "API" section and click "Create New API Token". This will download the `kaggle.json` token file.
4. Put the file into `~/.kaggle/kaggle.json` (Linux/macOS) or `C:\Users\<Windows-username>\.kaggle\kaggle.json` (Windows).
Note: Windows Explorer may have trouble creating the `C:\Users\<Windows-username>\.kaggle` folder. To create it from the console, run `cmd` and execute `cd C:\Users\<Windows-username>`, then `mkdir .kaggle`.

Consult the Kaggle API documentation in case of other troubles.
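On Linux/macOS, the token setup can be finished from a terminal. A minimal sketch, assuming the browser saved the token to `~/Downloads`:

```bash
mkdir -p ~/.kaggle
mv ~/Downloads/kaggle.json ~/.kaggle/kaggle.json   # adjust if your browser saved it elsewhere
chmod 600 ~/.kaggle/kaggle.json                    # keep the token unreadable to other users
kaggle competitions list                           # quick check that the CLI picks up the token
```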
Downloading the TGS Salt competition dataset
Go to the TGS Salt Identification competition page and accept the rules on the "Rules" tab.

Make a `salt` folder somewhere and create a `data` subdirectory inside it. Open a console with the `salt/data` folder as the current directory and invoke the `kaggle competitions download -c tgs-salt-identification-challenge` command. This will download the dataset files.
Then invoke `unzip train.zip -d train` to unzip the contents of `train.zip` into the `train` folder.
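Put together, the download steps look roughly like this in a Linux/macOS shell (where you place the `salt` folder is up to you):

```bash
mkdir -p salt/data
cd salt/data
kaggle competitions download -c tgs-salt-identification-challenge
unzip train.zip -d train
```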
Adding an experiment
1. Create an `experiments` folder inside the `salt` folder.
2. Create an `exp01` folder inside `experiments`.
3. Create a `config.yaml` file inside `exp01`.

Put the following code inside `config.yaml`:
```yaml
#%Musket Segmentation 1.0
backbone: resnet34 # let's select the classifier backbone for our network
architecture: Unet # the model architecture we are going to use
augmentation: # define some minimal augmentations on images
  Fliplr: 0.5
  Flipud: 0.5
classes: 1 # define the number of classes
activation: sigmoid # for binary segmentation the last-layer activation is sigmoid
shape: [224, 224, 3] # our desired input image size, everything will be resized to fit
optimizer: Adam # Adam optimizer is a good default choice
batch: 8 # our batch size will be 8
lr: 0.001
metrics: # we would like to track some metrics
  - binary_accuracy
  - dice
primary_metric: val_dice # the most interesting metric is val_dice
primary_metric_mode: max
folds_count: 5
testSplit: 0.2
dumpPredictionsToCSV: true
callbacks: # configure some minimal callbacks
  EarlyStopping:
    patience: 10
    monitor: val_dice
    mode: max
    verbose: 1
  ReduceLROnPlateau:
    patience: 2
    factor: 0.3
    monitor: val_binary_accuracy
    mode: max
    cooldown: 1
    verbose: 1
loss: binary_crossentropy # we use binary_crossentropy loss
stages:
  - epochs: 50
dataset:
  getTrain: []
final_metrics: [ dice_with_custom_treshold_true_negative_is_one ] # you may use more than one metric here
experiment_result: dice_with_custom_treshold_true_negative_is_one
testTimeAugmentation: Horizontal_and_vertical
```
You can find the details regarding this configuration in the User guide.

You can greatly speed up the training process by reducing the number of folds from 5 to 1: replace `folds_count: 5` with `folds_count: 1`, but note that only a single fold will then be trained. Reducing the number of epochs will also speed things up at the cost of quality: replace `- epochs: 50` with `- epochs: 20` if you wish.
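For reference, after both changes the relevant lines of `config.yaml` would read:

```yaml
folds_count: 1
stages:
  - epochs: 20
```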
Adding a dataset
Note the following instruction in our experiment YAML:

```yaml
dataset:
  getTrain: []
```

This instruction expects a Python function named `getTrain` to be in scope, and that function should return the dataset. Let's add it:
1. Create a `modules` folder inside the `salt` folder.
2. In the `modules` folder, create a file `datasets.py` (the file name can really be anything).
3. Put the following code in the file:
```python
from musket_core import image_datasets

def getTrain():
    return image_datasets.BinarySegmentationDataSet(["train/images"], "train.csv", "id", "rle_mask")
```
The first argument sets the images folder inside `data`. The second argument points to the CSV file, the third to the CSV column with image IDs, and the fourth to the CSV column with the RLE-encoded mask.
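For orientation, a hypothetical excerpt of `train.csv` might look like this (the IDs and RLE values below are made up purely for illustration):

```csv
id,rle_mask
0a1b2c3d4e,5051 9 5152 10
1f2e3d4c5b,1 10201
```

Each `rle_mask` value is a run-length encoding of the mask as space-separated start/length pairs.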
Running the experiment
Launch the console and run the following command, replacing `...salt` with the path to the top-level `salt` project directory:
```
musket fit --project "...salt" --name "exp01" --num_gpus=1 --gpus_per_net=1 --num_workers=1 --cache "...salt\data\cache"
```
This will launch the training process.
Checking experiment results
When the training process completes, the `exp01` experiment folder will contain a new `summary.yaml` file with contents similar to the following:
```yaml
allStages:
  binary_accuracy: {max: 0.94829979903931, mean: 0.9432402460543085, min: 0.9348129166258212,
    std: 0.004766299735936636}
  binary_accuracy_holdout: 0.9447703656504265
  dice: {max: 0.8946749116884245, mean: 0.8799628551168702, min: 0.8512279018375117,
    std: 0.015545121737934034}
  dice_holdout: 0.8792530163407674
  dice_with_custom_treshold_true_negative_is_one: {max: 0.7921144300184988, mean: 0.7906020228394381,
    min: 0.7885881826799147, std: 0.0012594068050284218}
  dice_with_custom_treshold_true_negative_is_one_holdout: 0.7918140261625017
  dice_with_custom_treshold_true_negative_is_one_treshold: {max: 0.5700000000000001,
    mean: 0.5680000000000001, min: 0.56, std: 0.0040000000000000036}
  dice_with_custom_treshold_true_negative_is_one_treshold_holdout: 0.56
cfgName: config.yaml
completed: true
folds: [0, 1, 2, 3, 4]
stages:
- binary_accuracy: {max: 0.94829979903931, mean: 0.9432402460543085, min: 0.9348129166258212,
    std: 0.004766299735936636}
  binary_accuracy_holdout: 0.9447703656504265
  dice: {max: 0.8946749116884245, mean: 0.8799628551168702, min: 0.8512279018375117,
    std: 0.015545121737934034}
  dice_holdout: 0.8792530163407674
  dice_with_custom_treshold_true_negative_is_one: {max: 0.7924899348384953, mean: 0.7882949615678969,
    min: 0.7825054457176929, std: 0.003500868603109435}
  dice_with_custom_treshold_true_negative_is_one_holdout: 0.7918140261625017
  dice_with_custom_treshold_true_negative_is_one_treshold: {max: 0.6, mean: 0.5680000000000001,
    min: 0.51, std: 0.03310589071449368}
  dice_with_custom_treshold_true_negative_is_one_treshold_holdout: 0.56
  subsample: 1.0
```
Let's take a closer look:
`completed: true` indicates that the training was completed. The `allStages` and `stages` sections differ when there is more than a single stage, but in our case the data inside them are the same.
The `binary_accuracy` and `dice` values indicate metric results on the validation set. They appear in the summary because these metrics were listed in the `metrics` section of `config.yaml`. As we have multiple folds and each fold has its own results, every metric lists its max, min, mean, and std values across folds.
Due to the `testSplit` instruction in `config.yaml` we have a holdout set, so there is a `*_holdout` value for each metric, indicating the metric result on the holdout dataset. As the `dice` metric was referred to in the `primary_metric` instruction in `config.yaml`, there are lots of other values for this metric; their names speak for themselves.
Besides the `summary.yaml` file, which contains the top-level results and may be omitted if something fails during the training process, there are more detailed and precise logs inside the `metrics` subfolder of the `exp01` folder. The files there are named `metrics-X.Y.csv`, where `X` is the fold number and `Y` is the stage number.
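For a five-fold, single-stage run like ours, and assuming zero-based numbering as in the `folds` list above, you would expect files along these lines:

```
metrics-0.0.csv
metrics-1.0.csv
metrics-2.0.csv
metrics-3.0.csv
metrics-4.0.csv
```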
Let's take a look at the first few lines of one such file:
```csv
epoch,binary_accuracy,dice,loss,lr,val_binary_accuracy,val_dice,val_loss
0,0.8144591341726481,0.5552861321416458,0.4154038934037089,0.001,0.7323275666683913,0.026507374974244158,0.7329991146922111
1,0.8632630387321114,0.6899019529577345,0.331550125265494,0.001,0.7443207234144211,0.012612525901568005,0.5845782220363617
```
The first column is the epoch number. It is followed by all tracked metrics on the training set, then the loss and the learning rate, and finally the same metrics and the loss on the validation set. These values let you see how the training progressed epoch by epoch.
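If you want to inspect these logs programmatically, here is a minimal sketch using pandas; the column names come from the header above, while the file path is an assumption based on the folder layout described earlier:

```python
import pandas as pd

# Path is an assumption: the per-epoch log for fold 0, stage 0 of exp01.
log = pd.read_csv("salt/experiments/exp01/metrics/metrics-0.0.csv")

# Epoch with the best validation dice.
best = log.loc[log["val_dice"].idxmax()]
print(f"Best val_dice {best['val_dice']:.4f} at epoch {int(best['epoch'])}")

# See how ReduceLROnPlateau lowered the learning rate over time.
print(log.groupby("lr")["epoch"].agg(["min", "max"]))
```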