[deprecated - too many problems w dataset] Kylel/semeval2017 #16
base: master
Conversation
…generate NER data for conll2003
…bigger than mention
A few comments. Otherwise looks good.
spacy_text = nlp(text)

# split sentences & tokenize
sent_token_spans = TokenSpan.find_sent_token_spans(text=text, sent_tokens=[[token.text for sent in spacy_text.sents for token in sent if token.text.strip() != '']])
BERT will probably complain about long sequences (entire abstract). So we need to do this at sentence level.
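For illustration, a minimal sketch of the sentence-level split suggested here (the spaCy model name and the sentence_level_tokens helper are assumptions for the example, not code from this PR): build one token list per sentence so each sequence passed to BERT stays short.

```python
from typing import List
import spacy

nlp = spacy.load("en_core_web_sm")  # model name is an assumption; the PR may use a different pipeline

def sentence_level_tokens(text: str) -> List[List[str]]:
    """Return one token list per sentence, rather than a single flattened list."""
    doc = nlp(text)
    return [
        [token.text for token in sent if token.text.strip() != ""]
        for sent in doc.sents
    ]

# each inner list can then be fed to the model as its own short sequence, e.g.
# TokenSpan.find_sent_token_spans(text=text, sent_tokens=sentence_level_tokens(text))
```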
sci_bert/common/span.py
Outdated
return spans

@classmethod
def find_sent_token_spans(cls, text: str, sent_tokens: List[List[str]]) -> List[List['TokenSpan']]:
Seems like a duplicate of the first function.
sci_bert/common/span.py
Outdated
temp = []
index_sent += 1

# append remaining mentions
Add a comment that this handles the last sentence only.
sci_bert/common/span.py
Outdated
assert len(sent_mention_spans) == len(sent_token_spans)
return sent_mention_spans

#
Consider removing unused code.
You can comment these out as they are not related to entities.
…be too long for bert
I have trained the model with the semeval2017 data and evaluated it afterwards with the semeval2017 evaluation script. I'm confused why I get such different results. Results reported by the semeval2017 evaluation script on the test set (precision / recall / F1 / support): Material 0.44 / 0.43 / 0.44 / 904; avg / total 0.40 / 0.37 / 0.39 / 2051. So F1 is 0.39. Do you maybe have a reasonable explanation for this difference?
hey @arthurbra thanks for your interest! i've been looking into this difference as well, and it looks like the task definitions are different between what we've implemented here & the original semeval2017 task. specifically, the 3 tasks in semeval2017 are (1) entity identification, (2) entity type classification, and (3) relation extraction. what we've implemented here combines both (1) and (2) into a single task (we're doing both extraction & tagging w/ the type at the same time). this will affect how f1 scoring is performed. instead of using the sequence tagging model we have here, my plan is to adapt an existing model made for the semeval2017 task, and substitute in the various bert variants to replace glove/w2v embeddings.
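As a toy illustration of why the combined task changes F1 (the spans below are made up, this is not the actual evaluation code): a predicted span with correct boundaries but the wrong type counts as a hit under identification-only scoring, but as a miss when extraction and typing are scored jointly.

```python
# toy gold/predicted spans as (start, end, type); values are made up for illustration
gold = {(0, 5, "Material"), (10, 18, "Process"), (25, 30, "Task")}
pred = {(0, 5, "Material"), (10, 18, "Task"), (40, 45, "Process")}

def f1(gold, pred):
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# joint scoring: boundaries AND type must match (what the combined tagger is judged on)
joint_f1 = f1(gold, pred)                        # 0.33

# identification-only scoring: drop the type before comparing
strip = lambda spans: {(s, e) for s, e, _ in spans}
ident_f1 = f1(strip(gold), strip(pred))          # 0.67

print(f"joint F1 = {joint_f1:.2f}, identification-only F1 = {ident_f1:.2f}")
```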
Hey @kyleclo thanks for your reply and explanation. I am really impressed by your results and want to learn more. You are right, you perform entity identification and classification in one task; in my understanding this is task B in semeval 2017. In calculateMeasures of the semeval2017 evaluation script you can pass the parameter remove_anno="rel", which should ignore relations during evaluation (as far as I understand the code). I already used this parameter in the evaluation of my prev. post, so I assume there should be another explanation. It would be great if you could apply the code of the semeval 2017 winner with BERT. Unfortunately I was not able to find it (the AI2 system of Ammar et al.).
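For reference, a sketch of how that evaluation call might look; only calculateMeasures and remove_anno="rel" come from the comment above, while the module name and folder arguments are assumptions about the script's interface.

```python
# assumed interface of the semeval2017 (ScienceIE) evaluation script;
# only calculateMeasures and remove_anno="rel" are confirmed in this thread
from eval import calculateMeasures  # assumes the script is importable as eval.py

# compare gold .ann files against predicted ones, ignoring relation annotations
calculateMeasures("data/gold/", "data/pred/", remove_anno="rel")
```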
Hi Kyle, I think I have found the problem: in the prediction script model.eval() is not called, so dropout is active during prediction. Regards
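A small self-contained PyTorch example of the effect (a generic illustration, not the prediction script itself): with dropout layers, a model left in train mode gives non-deterministic outputs, while model.eval() disables dropout.

```python
import torch
import torch.nn as nn

# tiny model with dropout, just to show why model.eval() matters at prediction time
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5), nn.Linear(4, 2))
x = torch.ones(1, 4)

model.train()                  # dropout active: repeated forward passes differ
print(model(x), model(x))

model.eval()                   # dropout disabled: outputs are deterministic
with torch.no_grad():          # gradients are not needed for prediction
    print(model(x), model(x))
```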
just the NER portion