Classification of Heatmaps
Introduction
In the last notebook we extracted geometrical features from our heatmaps and saved them in CSV files. Now we will use these features to train a simple classifier to predict the labels of the slides (negative, itc, micro, macro).
Requirements
Python-Modules
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_validate
from sklearn.impute import SimpleImputer  # replaces sklearn.preprocessing.Imputer, removed in scikit-learn 0.22
from sklearn import tree, naive_bayes, ensemble
from io import StringIO  # sklearn.externals.six is no longer shipped with newer scikit-learn releases
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix
from graphviz import Source
Exercises
Before we start, adjust the path of CAM_BASE_DIR (and also other variables as needed).
### EDIT THIS CELL:
### Assign the path to your CAMELYON16 data and create the directories
### if they do not exist yet.
CAM_BASE_DIR = '/path/to/CAMELYON/data/'
# example:
CAM_BASE_DIR = '/media/klaus/2612FE3171F55111/'
GENERATED_DATA = CAM_BASE_DIR + 'tutorial/'
HEATMAP_DIR = CAM_BASE_DIR + 'c16traintest_c17traintest_heatmaps_grey/'
# CAMELYON16 and 17 ground truth labels
PATH_C16_LABELS = CAM_BASE_DIR + 'CAMELYON16/test/Ground_Truth/reference.csv'
PATH_C17_LABELS = CAM_BASE_DIR + 'CAMELYON17/training/stage_labels.csv'
FEATURES_C16TEST = GENERATED_DATA + 'features_c16_test.csv'
FEATURES_C17TRAIN = GENERATED_DATA + 'features_c17_train.csv'
FEATURES_C17TEST = GENERATED_DATA + 'features_c17_test.csv'
# Here we will save our predictions
PATH_C17TRAIN_PREDICITONS = CAM_BASE_DIR + 'CAMELYON17/c17_train_predictions.csv'
PATH_C17TEST_PREDICITONS = CAM_BASE_DIR + 'CAMELYON17/c17_test_predictions.csv'
Load the Data
Now we read in the CSV files as pandas DataFrame objects.
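The loading step can be sketched as follows. This is only an illustration: the in-memory CSV and its column names (patient, stage, highest_probability) are assumptions standing in for the real feature files, which would be read by passing FEATURES_C16TEST, FEATURES_C17TRAIN and FEATURES_C17TEST to pd.read_csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for one of the feature files. In the notebook you
# would pass FEATURES_C16TEST, FEATURES_C17TRAIN or FEATURES_C17TEST instead.
# The column names below are assumptions for illustration only.
example_csv = io.StringIO(
    "patient,stage,highest_probability\n"
    "test_001.tif,negative,0.12\n"
    "test_002.tif,macro,0.98\n"
)

c16_test = pd.read_csv(example_csv)

# A quick sanity check of what was loaded.
print(c16_test.shape)
print(c16_test.dtypes)
```

Checking shape and dtypes right after loading makes it easier to spot missing rows or columns that were parsed as strings.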
Prepare the Data
Task:
Concatenate c16_test and c17_train in a new DataFrame variable c1617_train. That is the dataset we will use for hyperparameter optimization.
c1617_train = None ### Exercise
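One possible way to do the concatenation is pd.concat. The sketch below uses two tiny made-up frames; the column names are assumptions, and both frames are assumed to share the same columns.

```python
import pandas as pd

# Tiny made-up frames standing in for the real c16_test and c17_train.
c16_test = pd.DataFrame({
    "patient": ["test_001.tif"],
    "stage": ["negative"],
    "highest_probability": [0.10],
})
c17_train = pd.DataFrame({
    "patient": ["patient_000_node_0.tif", "patient_000_node_1.tif"],
    "stage": ["macro", "itc"],
    "highest_probability": [0.99, 0.60],
})

# ignore_index=True gives the result a fresh 0..n-1 index instead of
# repeating the row labels of the two inputs.
c1617_train = pd.concat([c16_test, c17_train], ignore_index=True)
print(c1617_train)
```

Without ignore_index=True the combined frame would contain duplicate index labels, which makes later .loc lookups ambiguous.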
Now we split the data into labels and features.
Task:
- Load the stage column into a variable y (1D-numpy array)
- Load the six features highest_probability, ... into a variable x
# Exercise
x = None # features
y = None # labels
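A minimal sketch of the split, assuming the features live in named columns of c1617_train. Only highest_probability is named in this notebook; placeholder_feature is a hypothetical name standing in for the remaining feature columns.

```python
import numpy as np
import pandas as pd

# Made-up data; "placeholder_feature" is a hypothetical column name that
# stands in for the other features extracted in the previous notebook.
c1617_train = pd.DataFrame({
    "patient": ["a.tif", "b.tif", "c.tif"],
    "stage": ["negative", "macro", "itc"],
    "highest_probability": [0.10, 0.99, 0.60],
    "placeholder_feature": [3.0, 250.0, np.nan],
})

feature_columns = ["highest_probability", "placeholder_feature"]

x = c1617_train[feature_columns].to_numpy(dtype=float)  # shape (n_slides, n_features)
y = c1617_train["stage"].to_numpy()                     # shape (n_slides,)

print(x.shape, y.shape)
```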
Remove Invalid Values
Some heatmaps (~2 or 3) could not be created by the CNN, so the feature values for those slides are missing.
Task:
Replace the missing values using the sklearn.impute.SimpleImputer class (called sklearn.preprocessing.Imputer in scikit-learn versions before 0.22).
Hint:
For better results look at the labels of the missing heatmaps and replace the values for the features with the label mean.
### Exercise
# imp =
# x =
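The two imputation strategies mentioned above can be sketched like this, on made-up data with a single assumed feature column. Option A fills a gap with the mean of the whole column; option B follows the hint and fills it with the mean of the slides sharing the same label, which is only possible where the labels are known.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up data with one missing feature value (second row).
df = pd.DataFrame({
    "stage": ["negative", "negative", "macro", "macro"],
    "highest_probability": [0.1, np.nan, 0.9, 0.7],
})
feature_columns = ["highest_probability"]

# Option A: replace NaN with the mean of the whole column.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")
x_global = imp.fit_transform(df[feature_columns])

# Option B: replace NaN with the mean of the rows that share the same label.
label_means = df.groupby("stage")[feature_columns].transform("mean")
x_by_label = df[feature_columns].fillna(label_means).to_numpy()

print(x_global[1, 0], x_by_label[1, 0])
```

In this toy example the global mean pulls the missing negative slide towards the macro slides, while the per-label mean keeps it close to the other negative slide.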
Train and Visualize a Simple Decision Tree
Now we are ready to define and train a decision tree. We use the scikit-learn decision tree module.
Task:
- Define and train a decision tree for visualization first
- Define and train a decision tree for validation with cross validation
- Hint: Search for good hyperparameters using the CAMELYON16 test set and the CAMELYON17 training set
- Define and train a decision tree for predicting the CAMELYON17 training set
- Hint1: Use cross_val_predict
- Hint2: For optimal results always use the complete CAMELYON16 test set and all but one slide of the CAMELYON17 training set for training the classifier and only predict the one slide left.
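The hyperparameter search from the task list could look roughly like the sketch below. The random data, the parameter grid and the number of folds are all assumptions chosen for illustration, not recommended values.

```python
import numpy as np
from sklearn import tree
from sklearn.model_selection import GridSearchCV, cross_val_score

# Synthetic stand-in for the real features: 120 slides, 6 features, 4 labels.
rng = np.random.RandomState(0)
x = rng.rand(120, 6)
y = np.repeat(["negative", "itc", "micro", "macro"], 30)

# Hypothetical grid; the useful ranges depend on the real data.
param_grid = {"max_depth": [2, 3, 5], "min_samples_leaf": [1, 5, 10]}

search = GridSearchCV(tree.DecisionTreeClassifier(random_state=0),
                      param_grid, cv=5)
search.fit(x, y)
print(search.best_params_)

# Cross-validated accuracy of the selected configuration.
scores = cross_val_score(search.best_estimator_, x, y, cv=5)
print(scores.mean())
```

Because the same data is used to pick the parameters and to score them, the reported accuracy tends to be somewhat optimistic.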
clf = None # Exercise
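Hint2 describes a leave-one-out scheme in which the CAMELYON16 slides always stay in the training set. As far as we are aware, cross_val_predict expects every sample it is given to appear in a test fold exactly once, so the sketch below writes the scheme as an explicit loop instead. The data is synthetic and max_depth=3 is an arbitrary choice.

```python
import numpy as np
from sklearn import tree

# Synthetic stand-ins for the two datasets (6 features each).
rng = np.random.RandomState(0)
x_c16 = rng.rand(20, 6)
y_c16 = np.repeat(["negative", "macro"], 10)
x_c17 = rng.rand(10, 6)
y_c17 = np.tile(["negative", "macro"], 5)

preds = np.empty(len(y_c17), dtype=object)
for i in range(len(y_c17)):
    # Train on all C16 slides plus every C17 slide except slide i.
    keep = np.arange(len(y_c17)) != i
    x_train = np.vstack([x_c16, x_c17[keep]])
    y_train = np.concatenate([y_c16, y_c17[keep]])

    clf = tree.DecisionTreeClassifier(max_depth=3, random_state=0)
    clf.fit(x_train, y_train)

    # Predict only the single held-out C17 slide.
    preds[i] = clf.predict(x_c17[i:i + 1])[0]

print(preds)
```

This trains one tree per CAMELYON17 slide, which is affordable for a few hundred slides and a small tree.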
clf is an instance of a trained decision tree classifier.
The decision tree can be visualized. For this we must write a graphviz dot file. It should look like:
graph = Source(tree.export_graphviz(clf, out_file=None,
                                    feature_names=columns,
                                    filled=True))
graph
### To open in a separate window and save it
#graph.format = 'png'
#graph.render('dtree_render',view=True)
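If graphviz is not installed, the tree can also be inspected as plain text. The sketch below trains a small tree on synthetic data and produces both the dot source and a text view via tree.export_text (available in scikit-learn 0.21 and later). The feature names are assumptions; placeholder_feature is hypothetical.

```python
import numpy as np
from sklearn import tree

# Synthetic data in which the label depends only on the first feature.
rng = np.random.RandomState(0)
x = rng.rand(40, 2)
y = np.where(x[:, 0] > 0.5, "macro", "negative")
columns = ["highest_probability", "placeholder_feature"]  # assumed names

clf = tree.DecisionTreeClassifier(max_depth=2, random_state=0).fit(x, y)

# Dot source as a string; this is what graphviz.Source would render.
dot_source = tree.export_graphviz(clf, out_file=None,
                                  feature_names=columns, filled=True)

# Plain-text view of the same tree, no graphviz needed.
text_view = tree.export_text(clf, feature_names=columns)
print(text_view)
```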
Save as CSV
First prepare the DataFrame:
- make a deep copy of c17_train
- replace the stage values with your predictions
- remove all columns except the patient and stage columns
- save as csv
print(len(names_preds))
c17_train_copy = c17_train.copy(deep=True)
print(c17_train_copy.loc[0].values[2])

# Replace each ground-truth stage with the predicted stage for that slide
for i in range(500):
    c17_train_copy.loc[i, 'stage'] = names_preds[c17_train_copy.loc[i, 'patient']]

# Fraction of slides whose predicted stage matches the ground truth
count = 0
for i in range(500):
    if c17_train_copy.loc[i].values[2] == c17_train.loc[i].values[2]:
        count += 1
print(count / 500)

# Drop everything except columns 1 and 2 (patient and stage), then save
c17_train_copy = c17_train_copy.drop(c17_train_copy.columns[3:], axis=1)
c17_train_copy = c17_train_copy.drop(c17_train_copy.columns[0], axis=1)
c17_train_copy.to_csv(PATH_C17TRAIN_PREDICITONS)
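The same steps can be written without explicit loops, and a confusion matrix shows which stages get mixed up. This is a sketch on made-up data: names_preds is assumed to be a dict mapping the slide name in the patient column to the predicted stage, and the CSV is written to an in-memory buffer instead of PATH_C17TRAIN_PREDICITONS.

```python
import io

import pandas as pd
from sklearn.metrics import confusion_matrix

# Made-up ground truth and predictions.
c17_train = pd.DataFrame({
    "patient": ["a.tif", "b.tif", "c.tif"],
    "stage": ["negative", "macro", "itc"],
    "highest_probability": [0.10, 0.99, 0.60],
})
names_preds = {"a.tif": "negative", "b.tif": "macro", "c.tif": "negative"}

# Keep only patient and stage, then overwrite stage with the predictions.
out = c17_train[["patient", "stage"]].copy()
out["stage"] = out["patient"].map(names_preds)

# Fraction of slides predicted correctly.
accuracy = (out["stage"] == c17_train["stage"]).mean()

# Rows are true labels, columns are predicted labels, in this order.
labels = ["negative", "itc", "micro", "macro"]
cm = confusion_matrix(c17_train["stage"], out["stage"], labels=labels)
print(accuracy)
print(cm)

# Written to a buffer here; pass a file path to save to disk.
buffer = io.StringIO()
out.to_csv(buffer, index=False)
```

Selecting columns by name avoids relying on their position in the frame, and len(c17_train) no longer has to be hard-coded.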
Summary and Outlook
Congratulations. If you worked through all notebooks from the beginning you just completed 99% of the complete CAMELYON challenge (16 and 17).
If you want to improve your classifier, you can try replacing the decision tree with another algorithm, e.g. naive Bayes, a support vector machine, or a random forest. Depending on the classification algorithm, you might also want to try feature selection beforehand.
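Swapping classifiers needs very little code, because scikit-learn estimators share the same fit/predict interface. The sketch below compares three candidates with cross_val_score on synthetic data; the hyperparameters are arbitrary assumptions, and on random data all scores will be near chance level.

```python
import numpy as np
from sklearn import ensemble, naive_bayes, tree
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real features and labels.
rng = np.random.RandomState(0)
x = rng.rand(120, 6)
y = np.repeat(["negative", "itc", "micro", "macro"], 30)

candidates = {
    "decision_tree": tree.DecisionTreeClassifier(max_depth=3, random_state=0),
    "naive_bayes": naive_bayes.GaussianNB(),
    "random_forest": ensemble.RandomForestClassifier(n_estimators=50,
                                                     random_state=0),
}

# Mean 5-fold cross-validated accuracy per candidate.
results = {name: cross_val_score(model, x, y, cv=5).mean()
           for name, model in candidates.items()}

for name, score in results.items():
    print(f"{name}: {score:.3f}")
```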
In the next notebook you will determine the patient's pN stage based on your predictions for the slides, to finally calculate the kappa score and compare your results with others on the CAMELYON website.
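As a preview, a kappa score can be computed with sklearn.metrics.cohen_kappa_score. The sketch assumes the quadratic weighting and the five pN stages used by the CAMELYON17 evaluation; the patient-level labels below are made up. The stages are mapped to integers so that the weighting respects their order.

```python
from sklearn.metrics import cohen_kappa_score

# Ordered pN stages, mapped to 0..4 so that distance between stages matters.
stage_order = ["pN0", "pN0(i+)", "pN1mi", "pN1", "pN2"]
to_int = {stage: i for i, stage in enumerate(stage_order)}

# Made-up patient-level ground truth and predictions.
true_stages = ["pN0", "pN1", "pN2", "pN1mi", "pN0(i+)", "pN0"]
pred_stages = ["pN0", "pN1", "pN1", "pN1mi", "pN0", "pN0"]

kappa = cohen_kappa_score(
    [to_int[s] for s in true_stages],
    [to_int[s] for s in pred_stages],
    labels=list(range(len(stage_order))),
    weights="quadratic",
)
print(kappa)
```

With quadratic weights, confusing pN0 with pN2 is penalized more heavily than confusing pN0 with pN0(i+).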
Licenses
Notebook License (CC-BY-SA 4.0)
The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).
exercise-classify-heatmaps by Klaus Strohmenger is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/deep.TEACHING.
Code License (MIT)
The following license only applies to code cells of the notebook.
Copyright 2018 Klaus Strohmenger
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.