Chest X-rays are used to diagnose multiple diseases. From pneumonia to lung nodules, many conditions can be diagnosed from this one modality using Deep Learning. The ChestX-ray14 dataset, recently released by the NIH, contains over 100,000 X-ray images, each tagged with one or more of 14 diseases or as normal. This has started a race to build Computer Aided Diagnosis (CAD) systems that can learn to discern thoracic diseases from X-rays. If you have been following developments since the release of the dataset, you will have noticed work on it coming out of various research labs. CheXNet, released by StanfordML, is probably the most famous; it claims to outperform radiologists at diagnosing pneumonia. Multiple other papers (such as "Chest Pathology Classification in X-Rays Using Generative Adversarial Networks", "Thoracic Disease Identification and Localization with Limited Supervision" and "Learning to diagnose from scratch by exploiting dependencies among labels") also use Deep Learning based methods to diagnose chest X-rays.
We at ParallelDots have also worked on this dataset and developed methods that achieve competitive performance in diagnosing chest X-rays using Deep Learning. Our paper can be checked out here. We achieve better results than the ChestX-ray14 baselines and results competitive with the state of the art (the Stanford paper). In fact, our algorithm is better than the CheXNet paper at diagnosing at least one disease.
Unlike most other methods, ours does not rely on transfer learning; instead, it trains a deep dense neural network from scratch. Our work provides quantitative results that answer the following research questions for the dataset:
1) Which loss functions should be used to train a DCNN from scratch on the ChestX-ray14 dataset, which exhibits high class imbalance and label co-occurrence?
2) How can cascading be used to model label dependency and improve the accuracy of the deep learning model?
We suggest techniques that would help train a Neural Network well on such a dataset where disease labels are neither independent nor exclusive. We experiment with two types of loss functions:
1) Binary Relevance [BR], which effectively means training the neural network as an independent +/- classifier for each disease label rather than as a 1-out-of-N classifier.
2) Pairwise Error [PWE], which tries to maximize the margin between the +/- classes for each disease.
On top of this, we use cascading in a way that exploits the dependencies between the disease labels to improve the output.
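To make the two loss functions concrete, here is a minimal sketch in plain Python. It is an illustration, not the code from our paper: the function names are hypothetical, binary relevance is written as the mean of per-label binary cross-entropy terms, and the pairwise error uses one common form from the multi-label literature (an exponential penalty on every positive/negative label pair within one image), which is an assumption about the exact formula.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_relevance_loss(logits, labels):
    """Mean binary cross-entropy over labels, each label treated as its own +/- classifier.

    logits: raw network outputs, one per disease label
    labels: 0/1 ground truth, one per disease label
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(labels)


def pairwise_error_loss(scores, labels):
    """Assumed pairwise form: average of exp(-(s_pos - s_neg)) over all
    (positive label, negative label) pairs of one image.

    The loss shrinks as positive labels are scored further above negative ones.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0  # no pairs to compare (e.g. a completely normal X-ray)
    total = sum(math.exp(-(p - n)) for p in pos for n in neg)
    return total / (len(pos) * len(neg))
```

In this sketch, binary relevance never compares labels with each other, while the pairwise loss only looks at how labels are ranked relative to one another for the same image.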
The cascading method feeds the output of one stage of a Machine Learning algorithm into the next stage, which in a way lets the algorithm learn from the mistakes it made on the training set. The following diagram explains our boosting algorithm in detail:
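Alongside the diagram, the general idea of cascading can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not our actual pipeline: `LogisticBR`, `train_cascade` and `predict_cascade` are hypothetical names, each stage is a tiny per-label logistic regression instead of a deep network, and the way stages are wired (input features concatenated with the previous stage's label scores) is one common choice that we assume here for illustration.

```python
import math


class LogisticBR:
    """Toy stage model: one logistic regression per label (binary relevance)."""

    def __init__(self, lr=0.5, epochs=200):
        self.lr, self.epochs = lr, epochs
        self.weights = []  # per label: feature weights followed by a bias

    @staticmethod
    def _sigmoid(z):
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def _score(self, w, x):
        # zip stops at len(x), so the trailing bias w[-1] is added separately
        return self._sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, x)))

    def fit(self, X, Y):
        n_features, n_labels = len(X[0]), len(Y[0])
        self.weights = [[0.0] * (n_features + 1) for _ in range(n_labels)]
        for _ in range(self.epochs):
            for x, y in zip(X, Y):
                for k, w in enumerate(self.weights):
                    err = self._score(w, x) - y[k]  # cross-entropy gradient w.r.t. the logit
                    for j, xj in enumerate(x):
                        w[j] -= self.lr * err * xj
                    w[-1] -= self.lr * err
        return self

    def predict(self, X):
        return [[self._score(w, x) for w in self.weights] for x in X]


def train_cascade(make_model, X, Y, n_stages):
    """Train n_stages models; every stage after the first also sees the
    previous stage's scores for all labels as extra input features."""
    stages = []
    features = [list(x) for x in X]
    for _ in range(n_stages):
        model = make_model().fit(features, Y)
        stages.append(model)
        preds = model.predict(features)
        features = [list(x) + list(p) for x, p in zip(X, preds)]
    return stages


def predict_cascade(stages, X):
    """Run the stages in order and return the last stage's label scores."""
    features = [list(x) for x in X]
    preds = None
    for model in stages:
        preds = model.predict(features)
        features = [list(x) + list(p) for x, p in zip(X, preds)]
    return preds


# Toy data: two input features, two labels that always co-occur.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[0, 0], [0, 0], [1, 1], [1, 1]]
stages = train_cascade(LogisticBR, X, Y, n_stages=2)
preds = predict_cascade(stages, X)
```

Because the second stage receives the first stage's scores for every label, it can pick up co-occurrence between labels that a single stage of independent classifiers cannot see.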
The ROC AUC scores of the different methods on the various diseases are listed in the following table. Please note that BR, PWE, C-BR and C-PWE are the four methods we tried out (Binary Relevance, Pairwise Error, and the cascaded versions of both, respectively). Rajpurkar et al. is the Stanford paper.
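For readers who want to reproduce this kind of per-disease comparison, the following sketch shows one way to compute a ROC score per label, assuming the reported values are areas under the ROC curve (AUC). The helper names `roc_auc` and `per_label_auc` are hypothetical, and the pairwise-counting formulation is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_label_auc(score_rows, label_rows, label_names):
    """Compute one AUC per disease label.

    score_rows: per-image lists of predicted scores, one score per label
    label_rows: per-image lists of 0/1 ground truth, one value per label
    """
    return {
        name: roc_auc([row[k] for row in score_rows], [row[k] for row in label_rows])
        for k, name in enumerate(label_names)
    }
```

Computing the score separately for each disease matters on a dataset like this one, because a single averaged number can hide weak performance on rare labels.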
This work provides optimistic results for the automatic diagnosis of thoracic diseases using chest X-rays. It is just a stepping stone for further research that will help doctors speed up the detection of multiple diseases, giving them valuable additional time to concentrate on treating those diseases.
At ParallelDots, we are working on challenging problems in Healthcare, NLP and Image Classification using Deep Learning. Please watch this space for more updates on the research work at ParallelDots.