We extend the natural language processing abilities of Snorkel, a system for rapidly creating training data, to include
cross-sentence n-ary relation extraction. The vast majority of work on relation extraction, the task of extracting
semantic relationships between two or more entities, has concentrated on single sentences. To address this gap, we use a drug-disease
causation dataset with labels for multi-sentence relations to test how a weakly supervised model
could perform against one trained on hand-labeled data. We propose a novel heuristic that looks for keywords within a custom
multi-sentence dependency tree. By searching along the path defined by the syntactic relationships (e.g., subject,
determiner, classifier) between the drug and the disease, our model is able to perform as well as models trained with hand-labeled data.
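The keyword-on-path heuristic described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the keyword list, the function names, the toy two-sentence example, and in particular the way the two sentence trees are joined (a single hypothetical `next_sentence` edge between their roots) are all assumptions made for illustration. The sketch finds the shortest path between the drug and disease tokens in the joined dependency graph and checks whether any token strictly between them is a causal keyword.

```python
from collections import deque

# Hypothetical keyword list; the keywords actually used are not given here.
CAUSAL_KEYWORDS = {"induce", "induced", "cause", "caused", "associated"}


def build_graph(edges):
    """Build an undirected adjacency map from (head, dependent, relation) triples."""
    graph = {}
    for head, dep, _rel in edges:
        graph.setdefault(head, set()).add(dep)
        graph.setdefault(dep, set()).add(head)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search. Returns token indices from start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def keyword_on_path(tokens, edges, drug, disease, keywords=CAUSAL_KEYWORDS):
    """True if a keyword lies strictly between drug and disease on the dependency path."""
    path = shortest_path(build_graph(edges), drug, disease)
    if path is None:
        return False
    return any(tokens[i].lower() in keywords for i in path[1:-1])


# Toy example: "Patients received cisplatin. This treatment induced nephrotoxicity."
tokens = ["Patients", "received", "cisplatin",
          "This", "treatment", "induced", "nephrotoxicity"]
edges = [
    (1, 0, "nsubj"), (1, 2, "obj"),                    # sentence 1, root = "received"
    (5, 4, "nsubj"), (4, 3, "det"), (5, 6, "obj"),     # sentence 2, root = "induced"
    (1, 5, "next_sentence"),                           # assumed cross-sentence link
]

found = keyword_on_path(tokens, edges, drug=2, disease=6)
```

In a weak supervision setting, a function like `keyword_on_path` would typically be wrapped as a labeling function that votes for a causal relation when it returns true and abstains otherwise. Without the cross-sentence edge, the drug and disease in this example are disconnected and the heuristic finds nothing, which is the case the multi-sentence tree is meant to handle.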