\prefacesection{Abstract}
The increasing availability of large text corpora holds the promise of acquiring an
unprecedented amount of knowledge from text.
However, current techniques are either specialized to particular domains or do not
scale to large corpora.
This dissertation develops a new technique for learning open-domain knowledge from
unstructured web-scale text corpora.
A first application aims to capture common sense facts: given a candidate statement
about the world and a large corpus of known facts, is the statement likely to be true?
We appeal to a probabilistic relaxation of natural logic -- a logic which uses the syntax of
natural language as its logical formalism -- to define a search problem from the query
statement to its appropriate support in the knowledge base over valid (or approximately
valid) logical inference steps.
We show a 4$\times$ improvement in recall over lemmatized lookup for querying common sense facts,
while maintaining above 90\% precision.
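To make the search formulation concrete, the sketch below casts the query statement as the
start state and single-word mutations as weighted edges, terminating when a known fact is
reached. It is an illustrative toy, not the system described in this dissertation: the
knowledge base, mutation lexicon, and costs are placeholder assumptions.
\begin{verbatim}
# Toy sketch of fact checking as search: mutate the query one word at a
# time until a known fact is reached. The facts, mutation lexicon, and
# costs below are illustrative placeholders, not the real system's data.
import heapq

KNOWN_FACTS = {("cats", "have", "tails")}       # toy knowledge base
MUTATIONS = {                                   # word -> [(substitute, cost)]
    "felines": [("cats", 0.2)],                 # hypernym/hyponym-style edge
    "possess": [("have", 0.1)],                 # synonym-style edge
}

def support_cost(query, max_cost=1.0):
    """Cheapest chain of mutations from `query` to a known fact,
    or None if no supporting premise is found within `max_cost`."""
    frontier = [(0.0, query)]
    seen = {query}
    while frontier:
        cost, fact = heapq.heappop(frontier)
        if fact in KNOWN_FACTS:
            return cost                         # found a (near-)valid proof
        for i, word in enumerate(fact):
            for substitute, step in MUTATIONS.get(word, []):
                successor = fact[:i] + (substitute,) + fact[i + 1:]
                if successor not in seen and cost + step <= max_cost:
                    seen.add(successor)
                    heapq.heappush(frontier, (cost + step, successor))
    return None

print(support_cost(("felines", "possess", "tails")))   # -> 0.3 (likely true)
\end{verbatim}
In this sketch the accumulated edge cost stands in for the (negative log) probability of the
inference chain: a cheap path to a known fact corresponds to a high-confidence, approximately
valid proof.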
We extend this approach to handle longer, more complex premises by segmenting these
utterances into a set of atomic statements entailed through natural logic.
We evaluate this system in isolation by using it as the main component in an Open Information
Extraction system, and show that it achieves a 3\% absolute improvement in F$_1$ compared to
prior work on a competitive knowledge base population task.
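As a toy illustration of this segmentation step, the sketch below emits one short statement
per predicate attached to a shared subject; the subject/predicate representation and the
example sentence are assumptions made for exposition only, whereas the actual system operates
over dependency parses.
\begin{verbatim}
# Toy sketch of splitting a long utterance into atomic statements.
# A real system walks a dependency parse; here the "parse" is a
# hand-built subject plus a list of predicates (an assumption).
def atomic_statements(subject, predicates):
    """Each (verb, object) predicate attached to the subject becomes
    its own short statement, suitable for entailment checks."""
    return [f"{subject} {verb} {obj}" for verb, obj in predicates]

clauses = atomic_statements(
    "Heinz Fischer",
    [("was born in", "Austria"), ("visited", "the United States")],
)
print(clauses)
# ['Heinz Fischer was born in Austria',
#  'Heinz Fischer visited the United States']
\end{verbatim}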
A remaining challenge is elegantly handling cases where no supporting premise can be
found for the query.
To address this, we create an analogue of an evaluation function in game-playing search:
a shallow lexical classifier is folded into the search procedure to serve as a heuristic
that assesses how likely we would have been to find a supporting premise.
Results on answering 4\nth grade science questions show that this method improves over
both the classifier in isolation and a strong IR baseline, and outperforms prior work
on the task.
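A minimal sketch of this evaluation-function idea is given below; the lexical-overlap
feature, blending weight, and example premises are illustrative assumptions rather than the
classifier used in this work.
\begin{verbatim}
# Toy sketch of an "evaluation function" for the search: when no proof
# is found, fall back on a shallow lexical score over the premises the
# search visited. The feature and weights are illustrative assumptions.
def lexical_overlap(statement, premise):
    """Fraction of query tokens that also appear in the premise."""
    q = set(statement.lower().split())
    p = set(premise.lower().split())
    return len(q & p) / max(len(q), 1)

def truth_score(statement, premises, proof_cost=None, weight=0.75):
    """Blend a completed proof (low cost -> high score) with the
    shallow classifier's guess over the closest visited premise."""
    classifier = max((lexical_overlap(statement, p) for p in premises),
                     default=0.0)
    if proof_cost is None:            # no proof found: heuristic only
        return classifier
    return weight * (1.0 - proof_cost) + (1.0 - weight) * classifier

premises = ["an animal eats food", "cats drink water"]
print(truth_score("cats eat food", premises))                  # ~0.33
print(truth_score("cats eat food", premises, proof_cost=0.2))  # ~0.68
\end{verbatim}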