The Impact of Word Representations on Sequential Neural MWE Identification (Nicolas Zampieri, Carlos Ramisch, Geraldine Damnati, 2019) Paper Review/MultiWordExpression 2021. 8. 7. 15:48
Nicolas Zampieri, Carlos Ramisch, Geraldine Damnati. The Impact of Word Representations on Sequential Neural MWE Identification. Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), Aug 2019, Florence, Italy. pp. 169-175. DOI: 10.18653/v1/W19-5121
<Prior Work>
1. Finding MWEs in running text (Constant et al., 2017)
2. PARSEME shared task 1.1 (Ramisch et al., 2018)
3. fastText, built on character n-grams (Bojanowski et al., 2017); see the sketch after this list
4. "Character-based embeddings have been shown useful to predict MWE compositionality out of text" (Hakimi Parizi and Cook, 2018)
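To make item 3 concrete, here is a minimal, hand-rolled sketch of the subword decomposition behind fastText; it is not the library's code, and the 3-6 n-gram range is simply fastText's documented default. Each fastText word vector is the sum of the vectors of these character n-grams (plus the full word itself).

```python
# Simplified fastText-style subword decomposition (illustrative, not the library's code).
def char_ngrams(word, n_min=3, n_max=6):
    token = f"<{word}>"          # fastText wraps words in boundary markers
    grams = []
    for n in range(n_min, n_max + 1):
        grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams

print(char_ngrams("prendre", 3, 4))
# ['<pr', 'pre', 'ren', 'end', 'ndr', 'dre', 're>', '<pre', 'pren', 'rend', 'endr', 'ndre', 'dre>']
```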
<Research Scope>
● Verbal MWE (VMWE) identification
- Lemmas vs. surface forms
- Traditional word embeddings vs. subword representations
● Target languages: French, Polish, Basque (morphologically richest: Basque)
<Experimental Method>
1. Corpora
● PARSEME shared task 1.1 VMWE-annotated corpora (a minimal reader for their .cupt format is sketched below)
- Basque: 117,000 tokens; highest morphological richness (2.32)
- French: 420,000 tokens; high proportion of discontinuous VMWEs (42.12%)
- Polish: 220,000 tokens
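The PARSEME 1.1 corpora are distributed in the .cupt format, i.e. CoNLL-U Plus with an extra PARSEME:MWE column ("*" = outside any VMWE, "1:VID" = start of VMWE #1 of category VID, "1" = continuation). The reader below is a rough illustration of that layout, not code from the paper.

```python
# Minimal .cupt reader sketch (assumes the standard CoNLL-U Plus column order).
def read_cupt(path):
    sentences, sent = [], []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:                     # blank line ends a sentence
                if sent:
                    sentences.append(sent)
                    sent = []
            elif not line.startswith("#"):   # skip sentence-level comments
                cols = line.split("\t")
                # cols: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC, PARSEME:MWE
                sent.append({"form": cols[1], "lemma": cols[2],
                             "upos": cols[3], "mwe": cols[-1]})
    if sent:
        sentences.append(sent)
    return sentences
```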
2. Experimental architecture
● Veyn: sequence tagging with RNNs (a minimal sketch follows after this list)
- Concatenates the embeddings of the words' features (lemmas, POS, ...)
- Output: CRF layer
- Tagging scheme: BIOG + VMWE category
- Trained on the shared task training corpora
- Validation: the dev corpus
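As an illustration of this setup, here is a minimal PyTorch sketch of a Veyn-like tagger; it is not the authors' code. Two feature embeddings (forms and lemmas here) are concatenated and fed to a bidirectional GRU, and each token is scored over the BIOG+cat tagset; the CRF output layer of the real Veyn is left out and replaced by plain per-token scores for brevity.

```python
# Veyn-like sequence tagger sketch (illustrative; the real system adds a CRF on top).
import torch
import torch.nn as nn

class VeynLikeTagger(nn.Module):
    def __init__(self, n_forms, n_lemmas, n_tags, emb_dim=100, hidden=128):
        super().__init__()
        self.form_emb = nn.Embedding(n_forms, emb_dim)    # surface-form embeddings
        self.lemma_emb = nn.Embedding(n_lemmas, emb_dim)  # lemma embeddings
        self.rnn = nn.GRU(2 * emb_dim, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_tags)          # scores over BIOG+cat tags

    def forward(self, forms, lemmas):
        x = torch.cat([self.form_emb(forms), self.lemma_emb(lemmas)], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)                                # (batch, seq_len, n_tags)

# Toy usage: one sentence of 5 tokens, 9 hypothetical tags
model = VeynLikeTagger(n_forms=1000, n_lemmas=800, n_tags=9)
scores = model(torch.randint(0, 1000, (1, 5)), torch.randint(0, 800, (1, 5)))
print(scores.shape)  # torch.Size([1, 5, 9])
```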
● Embeddings: two input types, surface forms and lemmas
- word2vec and fastText are used (see the gensim sketch below)
● No contextual embeddings (ELMo, BERT): they would have fallen under a different shared-task track
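The contrast between the two embedding types can be sketched with gensim (assumed setup; the hyperparameters and toy corpus are placeholders, not the paper's configuration): one model is trained on surface forms, one on lemmas, and fastText additionally covers out-of-vocabulary words through its character n-grams.

```python
# Toy embedding training sketch with gensim >= 4 (illustrative, not the paper's setup).
from gensim.models import Word2Vec, FastText

form_sents = [["Il", "a", "pris", "une", "décision"]]         # surface forms
lemma_sents = [["il", "avoir", "prendre", "un", "décision"]]  # same sentence, lemmatized

w2v_forms = Word2Vec(sentences=form_sents, vector_size=100, window=5, min_count=1)
ft_lemmas = FastText(sentences=lemma_sents, vector_size=100, window=5, min_count=1)

# fastText can build a vector for an unseen word from its character n-grams
print(ft_lemmas.wv["reprendre"].shape)  # (100,)
```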
● Evaluation metrics (a simplified sketch follows below)
- MWE-based measure: F1 score over fully predicted VMWEs
- Token-based measure: F1 score over tokens belonging to a VMWE
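The two measures can be approximated as follows; this is a simplified sketch that ignores the exact matching procedure of the official PARSEME evaluation script. MWE-based F1 counts a VMWE as correct only when its full token set is predicted, while token-based F1 gives partial credit for individual tokens inside VMWEs.

```python
# Simplified MWE-based vs. token-based F1 (not the official PARSEME evaluation script).
def f1(prec, rec):
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

def mwe_based_f1(gold, pred):
    # gold/pred: sets of frozensets of token indices, one frozenset per VMWE
    tp = len(gold & pred)
    return f1(tp / len(pred) if pred else 0.0, tp / len(gold) if gold else 0.0)

def token_based_f1(gold, pred):
    gold_tok = set().union(*gold) if gold else set()
    pred_tok = set().union(*pred) if pred else set()
    tp = len(gold_tok & pred_tok)
    return f1(tp / len(pred_tok) if pred_tok else 0.0,
              tp / len(gold_tok) if gold_tok else 0.0)

gold = {frozenset({2, 4}), frozenset({7})}
pred = {frozenset({2, 4}), frozenset({6, 7})}
print(mwe_based_f1(gold, pred))    # 0.5  : only one VMWE fully matched
print(token_based_f1(gold, pred))  # ~0.86: most VMWE tokens were found
```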
<Conclusions>
● word2vec: struggles to find full VMWE boundaries, but finds their parts well; particularly strong at single-token VMWEs
● fastText gives better results in terms of the metric scores and identifies whole expressions better
● The more morphologically rich the language, the more lemmas help; for the morphologically rich setting, the best performance comes from forms + lemmas
● Overall, subword representations help MWE identification, and the more morphologically rich the language, the more the lemmas + forms combination helps