Abstract:
Medical professionals and researchers currently struggle to identify important, subject-specific documents for medical research. This is mainly because no common platform exists that can parse and link complex medical terminology, which leaves a gap in the pipeline for finding essential documents. To address this problem, a model is developed that creates a semi-automated ontology and knowledge graph for link prediction from unstructured medical documents. Extracting entities from a document is a tedious task, but it can be done with resources such as DBpedia and the pretrained statistical English models provided by spaCy. For biomedical data, however, these two resources are not sufficient: biomedical text frequently contains abbreviated names and noun compounds containing punctuation, which can lead to misidentification. To overcome this problem and make the model more specific to the medical domain, the [bionlp13cg] model provided by scispaCy is used. It is a spaCy NER model trained on the BIONLP13CG corpus. This model not only extracts entities that are more specific to the medical domain but also labels them according to the category they belong to. All entities are classified into a subclass and a main class using ontologies, and a link between two entities is then predicted according to the categories they belong to. The ontology is created using Medical Subject Headings (MeSH) RDF, a linked-data representation of the MeSH biomedical vocabulary produced by the National Library of Medicine, which helps provide precise results. Because some entities are not identified by MeSH, they are first categorized using the labels obtained from scispaCy, provided scispaCy recognizes them. These categories are then mapped to the MeSH ontology to improve the precision of the linked entities. Because the model is graph-enabled, it gives users a specific relation between medical terms.
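The classification and linking steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the entity list stands in for the output of a scispaCy NER model, the `LABEL_TO_CLASS` table is a hypothetical mapping from BIONLP13CG-style labels to top-level MeSH categories, and the functions `classify` and `predict_links` are names invented here. Linking every pair of classified entities and naming the edge after their main classes is only one plausible reading of "link prediction according to categories".

```python
from itertools import combinations

# Hypothetical mapping: NER label -> (subclass, main class).
# The main classes borrow the names of top-level MeSH categories;
# the mapping itself is an assumption made for this sketch.
LABEL_TO_CLASS = {
    "SIMPLE_CHEMICAL": ("Simple chemical", "Chemicals and Drugs"),
    "CANCER": ("Cancer", "Diseases"),
    "ORGAN": ("Organ", "Anatomy"),
}


def classify(entities):
    """Attach a subclass and main class to each (text, label) pair.

    Entities whose label has no entry in LABEL_TO_CLASS are skipped.
    """
    classified = []
    for text, label in entities:
        if label in LABEL_TO_CLASS:
            subclass, main_class = LABEL_TO_CLASS[label]
            classified.append(
                {"text": text, "subclass": subclass, "main_class": main_class}
            )
    return classified


def predict_links(classified):
    """Create one edge per entity pair, labeled with the two main classes."""
    edges = []
    for a, b in combinations(classified, 2):
        relation = f'{a["main_class"]} -> {b["main_class"]}'
        edges.append((a["text"], relation, b["text"]))
    return edges


# Stand-in for NER output: (entity text, label) pairs.
entities = [
    ("tamoxifen", "SIMPLE_CHEMICAL"),
    ("breast cancer", "CANCER"),
    ("liver", "ORGAN"),
    ("xyz", "UNRECOGNIZED"),  # dropped: no category available
]

edges = predict_links(classify(entities))
for edge in edges:
    print(edge)
```

In a fuller pipeline, the `(text, label)` pairs would presumably come from the entities of a document processed by the scispaCy model, and the category table would be derived from the MeSH RDF data rather than written by hand.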
Date of Conference: 10-13 December 2020
Date Added to IEEE Xplore: 05 February 2021