Distant supervision for relation extraction from text
Abstract
"Relation Extraction (RE) is an important part of Information Extraction (IE) that helps extract facts from unstructured textual data. Supervised relation extraction is challenged by domain dependence and high labeling cost. Supervised approaches also do not scale to the huge amount of textual data currently available on the web. These issues have driven an evolving trend toward alternative approaches: semi-supervised approaches and distant supervision.
Distant supervision automatically labels a corpus using freely available knowledge bases. The intuition is that if the knowledge base contains a relation between two entities, then a sentence containing both entities is assumed to express that relation. However, distant supervision has two issues. First, not all sentences containing two entities express the relation between them. This results in false positives, since a sentence is labeled with a relation it does not actually express. Second, some or all relations between the entities may be missing from the knowledge base, so a sentence would be labeled with no relation even though it expresses one. This increases false negatives.
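A minimal Python sketch of the labeling heuristic described above, showing how both error types arise. The toy knowledge base, the example sentences, the `distant_label` helper and the `NA` label name are hypothetical illustrations, not the dissertation's implementation; entity matching here is naive substring search.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy knowledge base: (subject, object) -> relation.
# It deliberately lacks the Curies' spouse fact to mimic KB incompleteness.
KB: Dict[Tuple[str, str], str] = {
    ("Marie Curie", "Warsaw"): "born_in",
}


def distant_label(sentence: str, subj: str, obj: str,
                  kb: Dict[Tuple[str, str], str]) -> Optional[str]:
    """Label a sentence with the KB relation for (subj, obj) if both entities
    appear in it; pairs absent from the KB get the 'NA' (no relation) label."""
    if subj in sentence and obj in sentence:
        return kb.get((subj, obj), "NA")
    return None  # entity pair not mentioned, so no training example


corpus: List[Tuple[str, str, str]] = [
    # Expresses born_in: a correct label.
    ("Marie Curie was born in Warsaw.", "Marie Curie", "Warsaw"),
    # Mentions both entities but not born_in: a false positive.
    ("Marie Curie left Warsaw to study in Paris.", "Marie Curie", "Warsaw"),
    # Expresses a spouse relation the KB lacks: a false negative.
    ("Pierre Curie was married to Marie Curie.", "Pierre Curie", "Marie Curie"),
]

labels = [distant_label(s, a, b, KB) for s, a, b in corpus]
for (sentence, _, _), label in zip(corpus, labels):
    print(f"{label!s:8} {sentence}")
```

The second sentence inherits `born_in` although it says nothing about a birthplace, and the third receives `NA` only because the toy KB is missing the spouse fact: these are the false positive and false negative cases described above.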
Most recent works have neglected the issue of false negatives; some have addressed it concurrently with the issue of false positives. Since the two issues, as described above, are independent, we believe that dealing with them separately should improve performance. We propose a strategy that first addresses false positives and then false negatives. Our results using this intuition show improvements in both precision and recall. We also introduce some improvements to the feature set for relation extraction, which resulted in noticeable gains."
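For reference, the standard metric definitions show why the two error types map onto the two reported metrics: with $TP$, $FP$ and $FN$ counting true positives, false positives and false negatives, false positives lower precision and false negatives lower recall.

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
```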
Collections
- M Tech Dissertations [923]