The diabetic retinopathy dataset
The dataset for building the diabetic retinopathy detection application is obtained from Kaggle and can be downloaded from the following link: https://www.kaggle.com/c/classroom-diabetic-retinopathy-detection-competition/data.
Both the training and the holdout test datasets are present within the train_dataset.zip file, which is available at the preceding link.
We will use the labeled training data to build the model through cross-validation. We will evaluate the model on the holdout dataset.
Since we are dealing with class prediction, accuracy will be a useful validation metric. Accuracy is defined as follows:
\[ \text{accuracy} = \frac{c}{N} \]
Here, c is the number of correctly classified samples, and N is the total number of evaluated samples.
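As a minimal sketch of this definition, the following computes accuracy as the number of correctly classified samples, c, divided by the number of evaluated samples, N. The function name and the label lists are hypothetical and are used for illustration only; they are not taken from the dataset.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class (c / N)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # c
    return correct / len(y_true)  # c / N


# Hypothetical labels for illustration only (0 = healthy ... 3 = severe).
y_true = [0, 1, 2, 3, 0, 2, 1, 0]
y_pred = [0, 1, 1, 3, 0, 2, 3, 0]

print(accuracy(y_true, y_pred))  # 6 of the 8 predictions match, so 0.75
```

Note that accuracy treats every mistake the same way, regardless of how far the predicted class is from the actual one.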
We will also use the quadratic weighted kappa statistic to determine the quality of the model, and to have a benchmark for how good the model is compared to Kaggle standards. The quadratic weighted kappa is defined as follows:
\[ \kappa = 1 - \frac{\sum_{i,j} w_{i,j} O_{i,j}}{\sum_{i,j} w_{i,j} E_{i,j}} \]
The weight (wi,j) in the expression for quadratic weighted kappa is as follows:
\[ w_{i,j} = \frac{(i - j)^2}{(N - 1)^2} \]
In the preceding formula, the following applies:
- N represents the number of classes (not the number of evaluated samples, as in the accuracy formula)
- Oij represents the number of images that have been predicted as class i and whose actual class is j
- Eij represents the expected number of images with predicted class i and actual class j, assuming independence between the predicted class and the actual class
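The pieces listed above can be put together in a short from-scratch sketch. It assumes the standard form of the statistic, kappa = 1 - sum(w * O) / sum(w * E), with w = (i - j)^2 / (N - 1)^2, and integer class labels from 0 to N - 1. The function name and the sample labels are hypothetical, and the sketch does not handle the degenerate case in which the denominator is zero (for example, when every label and every prediction is the same class).

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic weighted kappa for integer labels in 0..n_classes-1."""
    total = len(y_true)

    # observed[i][j]: number of samples predicted as class i whose actual class is j.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        observed[predicted][actual] += 1

    # Marginal totals: how often each class is predicted, and how often it occurs.
    predicted_totals = [sum(row) for row in observed]
    actual_totals = [sum(row[j] for row in observed) for j in range(n_classes)]

    numerator = 0.0
    denominator = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            # Expected count if predictions were independent of the actual class.
            expected = predicted_totals[i] * actual_totals[j] / total
            numerator += weight * observed[i][j]
            denominator += weight * expected
    return 1.0 - numerator / denominator


# Hypothetical labels for illustration only.
y_true = [0, 1, 2, 3, 0, 2, 1, 0]
y_pred = [0, 1, 1, 3, 0, 2, 3, 0]
kappa = quadratic_weighted_kappa(y_true, y_pred, n_classes=4)
print(round(kappa, 4))
```

A value of 1 means perfect agreement, while a value near 0 means the model does no better than predictions made independently of the actual class.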
To better understand the components of the kappa metric, let's look at a binary classification of apples and oranges. Let's assume that the confusion matrix of the predicted and actual classes is as shown in the following diagram:
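The expected count of predicting Apple when the true label is Orange, assuming independence between the labels, is given by the following formula:
\[ E_{\text{Apple},\text{Orange}} = \frac{(\text{number of samples predicted as Apple}) \times (\text{number of samples whose true label is Orange})}{\text{total number of samples}} \]
This expected count is the error that you would expect to make with no model at all, that is, if the predicted labels were independent of the true labels.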
If you're familiar with the chi-square test for independence between two categorical variables, the expected count in each cell of the contingency table is computed based on the same formula, assuming independence between the categorical variables.
The observed count of the model predicting Apple when the true label is Orange can be read directly from the confusion matrix, and is equal to 5:
\[ O_{\text{Apple},\text{Orange}} = 5 \]
Hence, we can see that the error the model makes in predicting Orange as Apple is less than the error we would obtain if we were to use no model. Kappa generally measures how well we are doing in comparison to predictions made without a model.
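The comparison between observed and expected counts can be sketched in a few lines. Only the observed count of 5 (predicted Apple, true label Orange) comes from the text; the other three cells of the confusion matrix below are hypothetical values chosen for illustration and are not the values from the original diagram.

```python
labels = ["Apple", "Orange"]

# confusion[i][j]: number of samples predicted as labels[i] whose true label is labels[j].
# Only the 5 is taken from the text; the other three counts are hypothetical.
confusion = [
    [40, 5],   # predicted Apple:  40 truly Apple, 5 truly Orange
    [10, 45],  # predicted Orange: 10 truly Apple, 45 truly Orange
]

total = sum(sum(row) for row in confusion)
predicted_totals = [sum(row) for row in confusion]
actual_totals = [sum(row[j] for row in confusion) for j in range(len(labels))]

# Expected counts under independence: (row total * column total) / grand total.
expected = [[p * a / total for a in actual_totals] for p in predicted_totals]

observed_apple_orange = confusion[0][1]
expected_apple_orange = expected[0][1]
print(observed_apple_orange, expected_apple_orange)  # 5 22.5
```

With these assumed counts, the model predicts Apple for a true Orange 5 times, whereas independent predictions with the same marginal totals would be expected to do so 22.5 times.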
If we observe the expression for the quadratic weights, (wi,j), we can see that the value of the weights is higher when the difference between the actual and the predicted label is greater. This makes sense, due to the ordinal nature of classes. For example, let's denote an eye in perfect condition with the class label zero; a mild diabetic retinopathy condition with one; a moderate diabetic retinopathy condition with two; and a severe condition of diabetic retinopathy with three. This quadratic term weight, (wi,j), is going to be higher when a mild diabetic retinopathy condition is wrongly classified as severe diabetic retinopathy, rather than as moderate diabetic retinopathy. This makes sense, since we want to predict a condition as close to the actual condition as possible, even if we don't manage to predict the actual class.
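To make the effect of the weights concrete, the following sketch builds the weight matrix for the four classes used in this example, assuming the standard quadratic weight w = (i - j)^2 / (N - 1)^2. The function and variable names are hypothetical.

```python
def quadratic_weights(n_classes):
    """Matrix of weights (i - j)^2 / (n_classes - 1)^2 for all class pairs."""
    scale = (n_classes - 1) ** 2
    return [[(i - j) ** 2 / scale for j in range(n_classes)]
            for i in range(n_classes)]


weights = quadratic_weights(4)
mild, moderate, severe = 1, 2, 3

print(weights[mild][mild])      # 0.0: a correct prediction carries no penalty
print(weights[mild][moderate])  # 1/9, about 0.111
print(weights[mild][severe])    # 4/9, about 0.444
```

Confusing mild with severe is therefore penalized four times as heavily as confusing mild with moderate.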
We will be using sklearn.metrics.cohen_kappa_score with weights="quadratic" to compute the kappa score. The higher the weights attached to the errors that the model makes, the lower the kappa score will be.
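The following is a minimal usage sketch of cohen_kappa_score with weights="quadratic". The label lists are hypothetical and chosen so that both sets of predictions have the same accuracy, but one makes errors that are one grade away from the actual class and the other makes errors that are two grades away.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical labels for illustration only (0 = healthy ... 3 = severe).
y_true = [0, 1, 2, 3, 0, 2, 1, 0]
y_near = [0, 1, 1, 3, 0, 2, 2, 0]  # two errors, each off by one grade
y_far = [0, 1, 0, 3, 0, 2, 3, 0]   # two errors, each off by two grades

# Both sets of predictions get 6 of 8 samples right.
acc_near = accuracy_score(y_true, y_near)
acc_far = accuracy_score(y_true, y_far)

# The quadratic weights penalize the larger errors more heavily.
kappa_near = cohen_kappa_score(y_true, y_near, weights="quadratic")
kappa_far = cohen_kappa_score(y_true, y_far, weights="quadratic")

print(acc_near, acc_far)
print(round(kappa_near, 4), round(kappa_far, 4))
```

Although accuracy cannot tell the two sets of predictions apart, the kappa score is lower for the predictions whose errors are further from the actual class.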