Big Data Annotation Project

 

Adequately labeled training data is essential for a successful machine learning project. However, data often reaches researchers unlabeled, so there is a need for tools that can label it in a coordinated and effective way.

The team at the Nell Hodgson Woodruff School of Nursing Center for Data Science, led by Dr. Xiao Hu and Del Bold, developed a web platform to host individual data annotation projects. These projects share common underlying data structures and a consistent graphical user interface, so they can be launched quickly. AWS at Emory enabled the team to deploy an internet-facing web application, which supports recruiting data annotators from outside Emory to use the tool.

As an example, the team hosts a project to annotate cardiac arrhythmia using electrocardiogram and photoplethysmography signals. Using the platform, the team can engage multiple clinical experts, from any location, to annotate tens of thousands of signal strips and adjudicate the results as a group. In addition, they can readily expand the data to be annotated if the need arises.

As another example, the platform hosts an annotation project in which clinical experts highlight and annotate trigger words for various clinical concepts. The annotations are used to develop a machine learning model that can tokenize a clinical note with properly tagged concepts. This project also leverages built-in features of the web platform to support multiple annotators and group-level adjudication. Efforts are underway to expand the platform to support image annotation as well as multimodal annotation of more complex clinical events.

The team at the Center for Data Science welcomes research studies that could take advantage of its big data annotation/labeling tool and is ready to engage in collaborative studies to expand the application's use cases. Feel free to contact Dr. Xiao Hu or Del Bold for a demo and to learn more about how they are leveraging AWS at Emory to host their web application. If you are interested in hosting your own web application on AWS at Emory, please start by following the instructions on this wiki page.

To learn about other faculty use cases, click here.