Today we are witnessing an era of smart AI-driven solutions that empower humans to automate tasks requiring a certain level of cognition. One major reason for this shift toward AI-driven products is the availability of large amounts of data. Just as we humans learn through experiences throughout our lives, machines learn and automate tasks based on the data fed to them.
From our experience in developing vertical-agnostic, AI-first products, we are well aware of how important it is to have quality data and, in turn, a smart data tagging process. In this series of blog posts, we’re going to talk about the importance of data tagging in medical imaging, where we are developing computer vision technologies to better assist doctors. This particular piece gives an overview of the important things to consider when you are tagging medical data for an AI-powered algorithm to use.
Important aspects of Data Tagging
Before you begin tagging medical images, it is important to plan the different aspects of the work. We are talking about handling terabytes of medical imaging data, so all the processes, workflows and other specs need to be finalized before you begin tagging. Let us walk you through the aspects that should be taken into consideration when developing a smart data tagging process.
Deciding what to tag and how to tag it
This is the most important task: you should know what you want to tag. Considerable effort needs to go into understanding the data you want to capture through tagging. Determining the exact list of tags, the levels of tagging and the properties of each tag is critical and, if done properly, can save you a lot of money and time. For example, to build a dataset of hemorrhage subtypes, one needs to list all types of intracranial hemorrhage, such as subarachnoid hemorrhage and subdural hemorrhage.
Additionally, it is important to understand each tag in your list carefully, as each comes with properties of its own. The tagging engine should be flexible enough to capture the properties of every tag. For example, intraparenchymal hemorrhage can be easily tagged by drawing an outline, whereas a pathology like cerebral atrophy is nearly impossible to tag with a mere outline. Such an annotation would instead be recorded as a property of a slice in the CT scan. This exercise has to be done for the entire list so as to maintain the quality of tagging.
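To make the idea of tags with their own properties concrete, here is a minimal sketch of how such a tag list might be written down. The names `AnnotationKind`, `TagSpec`, `TAGS` and `tags_of_kind` are hypothetical and are not taken from our tagging engine; only the two example pathologies and how they are annotated come from the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AnnotationKind(Enum):
    OUTLINE = "outline"          # a region drawn on a slice
    SLICE_PROPERTY = "slice"     # a label attached to a whole slice


@dataclass(frozen=True)
class TagSpec:
    name: str
    kind: AnnotationKind
    parent: Optional[str] = None  # coarser tag this one refines (assumed hierarchy)


# Illustrative list only; a real list would be drawn up with medical professionals.
TAGS = [
    TagSpec("intraparenchymal hemorrhage", AnnotationKind.OUTLINE,
            parent="intracranial hemorrhage"),
    TagSpec("cerebral atrophy", AnnotationKind.SLICE_PROPERTY),
]


def tags_of_kind(kind):
    """Return the names of all tags annotated in the given way."""
    return [tag.name for tag in TAGS if tag.kind is kind]


print(tags_of_kind(AnnotationKind.OUTLINE))
print(tags_of_kind(AnnotationKind.SLICE_PROPERTY))
```

Writing the list down in this form early on forces a decision, for every tag, on how it will be annotated before any tagging starts.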
Training the professionals:
Processes: It is important to define all the processes right at the beginning and then make sure that all the stakeholders understand them. These processes will enable you to build a fast, scalable and accurate tagging engine. Each aspect, such as data flow, quality control and training, is a broad topic in its own right, and we will discuss them in detail in the next few blogs.
Handshakes: Data flow and synchronization need to work smoothly among the parties involved: the Data Science team, the Medical Professionals and the Software Development team.
- Data Science team: They have a critical role to play in the engine. In the end, the output of the tagging engine is meant for them to train the algorithms. They need to explain to the Software Development team the format in which they need the data. The team also helps in figuring out which tag should be annotated in which way.
- Medical Professionals: They should be provided with all the information they need to tag as accurately as possible. For example, if you want to tag ‘bone fracture’ in NCCT head, a bone window is a must for them. The specifications of the software on which they will annotate the data should be planned after taking into account all the requirements of the medical professionals who will use it.
- Software Development team: They should know all the processes involved, as well as the requirements of both the Data Science team and the Medical Professionals, so as to make sure the tagging engine is developed according to their needs.
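As an illustration of the bone window requirement mentioned for the Medical Professionals, the sketch below shows the standard CT windowing operation: Hounsfield unit (HU) values are clipped to a range defined by a window center and width, then rescaled for display. The function name `apply_window` and the specific center and width values are assumptions for illustration; actual presets vary between viewers and should be agreed with the medical professionals.

```python
def apply_window(hu_values, center, width):
    """Map Hounsfield units to 0-255 display intensities for one window."""
    low = center - width / 2
    high = center + width / 2
    display = []
    for hu in hu_values:
        clipped = min(max(hu, low), high)  # values outside the window saturate
        display.append(round((clipped - low) / (high - low) * 255))
    return display


# Assumed bone window preset; not a value taken from this post.
BONE_WINDOW = {"center": 500, "width": 2000}

row_of_hu = [-1000, 40, 700, 3000]  # made-up HU samples
print(apply_window(row_of_hu, **BONE_WINDOW))
```

A wide window like this one keeps dense structures such as bone from saturating to white, which is why a tagging tool meant for fracture annotation needs to offer it.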
Quality Control: Ensuring high quality of the tagged data is the most important aspect of the data tagging process. The AI you train will only be as accurate as the data it is fed. Several levels of quality control should be in place, such as double verification of each tag and privilege levels for the medical professionals. The software design and development then need to align with all the Quality Control measures.
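One simple way to picture double verification is sketched below: a tag is accepted only once at least two distinct reviewers have applied it to the same image. This is an assumption about what double verification could mean in code, not a description of our engine; the function `verified_tags`, the record layout and the reviewer names are all hypothetical.

```python
from collections import defaultdict


def verified_tags(reviews, min_agree=2):
    """Keep only tags confirmed by at least `min_agree` distinct reviewers.

    `reviews` is an iterable of (image_id, reviewer_id, tag) tuples.
    Returns a dict mapping image_id to the set of accepted tags.
    """
    voters = defaultdict(set)
    for image_id, reviewer_id, tag in reviews:
        voters[(image_id, tag)].add(reviewer_id)  # duplicates by one reviewer count once

    accepted = defaultdict(set)
    for (image_id, tag), reviewers in voters.items():
        if len(reviewers) >= min_agree:
            accepted[image_id].add(tag)
    return dict(accepted)


# Made-up review records for illustration.
reviews = [
    ("scan1", "dr_a", "subdural hemorrhage"),
    ("scan1", "dr_b", "subdural hemorrhage"),
    ("scan1", "dr_a", "bone fracture"),
]
print(verified_tags(reviews))
```

Tags that fail the check are not necessarily wrong; in practice they would more likely be routed to a reviewer with a higher privilege level than simply dropped.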
Technology: Developing a cloud-based, user-friendly tagging engine is necessary for fast and scalable operations. It also ensures easy coordination among all the stakeholders. Creating a tagging engine is complex, since it has various stakeholders with very different requirements and has to process terabytes of data. Hence, it requires a well-thought-out database and storage architecture.
At ParallelDots, we have built a scalable, accurate and fast tagging engine. For developing products powered by AI, fine control over data acquisition, processing and deployment is absolutely necessary. In this post, we gave you an overview of the important aspects of data tagging. In the next few posts of this series, we will discuss each of them in detail. Watch this space for more.
ParallelDots AI APIs is a Deep Learning powered web service by ParallelDots Inc that can comprehend huge amounts of unstructured text and visual content to empower your products. You can check out some of our text analysis APIs and reach out to us by filling out this form here, or write to us at email@example.com.