The Impact of Word Representations on Sequential Neural MWE Identification (Nicolas Zampieri, Carlos Ramisch, Geraldine Damnati, 2019)
Nicolas Zampieri, Carlos Ramisch, Geraldine Damnati. The Impact of Word Representations on Sequential Neural MWE Identification. Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), Aug 2019, Florence, Italy. pp. 169-175. doi:10.18653/v1/W19-5121
<Prior work>
1. Finding MWEs in running text (Constant et al., 2017)
2. PARSEME shared task 1.1 (Ramisch et al., 2018)
3. fastText: character n-gram subword embeddings (Bojanowski et al., 2017)
4. 'Character-based embeddings have been shown useful to predict MWE compositionality out of text' (Hakimi Parizi and Cook, 2018)
<Research focus>
● Verbal MWE (VMWE) identification
- lemmas vs. surface forms
- traditional word embeddings vs. subword representations
● Target languages: French, Polish, Basque (most morphologically rich: Basque)
<Experimental setup>
1. Corpora
● PARSEME shared task 1.1 VMWE-annotated corpora
- Basque: 117,000 tokens, highest morphological richness (2.32)
- French: 420,000 tokens, high proportion of discontinuous VMWEs (42.12%)
- Polish: 220,000 tokens
2. Architecture
● Veyn: sequence tagging with recurrent neural networks (a minimal sketch is given after this section)
- Input: concatenated embeddings of word features (surface form or lemma, POS, ...)
- Output: CRF layer
- Tagging scheme: BIOG + VMWE category
- Trained on the shared task training corpora
- Validation: dev corpora
● Embeddings: two input types, surface forms and lemmas
- Trained with word2vec and fastText (see the embedding sketch after this section)
● Contextual embeddings (ELMo, BERT) not used: the supported shared task track differed
● Evaluation metrics (a rough illustration follows after this section)
- MWE-based measure: F1 score over fully predicted VMWEs
- Token-based measure: F1 score over tokens belonging to a VMWE
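A minimal sketch of the kind of recurrent tagger described under "2. Architecture" (not the authors' Veyn code: class and parameter names are invented, and the paper's CRF output layer is replaced by a plain per-token softmax for brevity):

```python
import torch
import torch.nn as nn

class RecurrentVMWETagger(nn.Module):
    """Veyn-style sketch: concatenated feature embeddings -> BiGRU -> tag scores."""

    def __init__(self, n_forms, n_lemmas, n_pos, n_tags,
                 emb_dim=100, hidden_dim=128):
        super().__init__()
        self.form_emb = nn.Embedding(n_forms, emb_dim)
        self.lemma_emb = nn.Embedding(n_lemmas, emb_dim)
        self.pos_emb = nn.Embedding(n_pos, emb_dim)
        self.rnn = nn.GRU(3 * emb_dim, hidden_dim,
                          batch_first=True, bidirectional=True)
        # The real system decodes with a CRF layer; here a linear layer
        # scores each BIOG+category tag independently per token.
        self.out = nn.Linear(2 * hidden_dim, n_tags)

    def forward(self, forms, lemmas, pos):
        # Concatenate the embeddings of the word's features (form, lemma, POS).
        x = torch.cat([self.form_emb(forms),
                       self.lemma_emb(lemmas),
                       self.pos_emb(pos)], dim=-1)
        h, _ = self.rnn(x)          # (batch, seq_len, 2 * hidden_dim)
        return self.out(h)          # unnormalised tag scores per token

# Illustrative BIOG+category tag inventory:
# "O" (outside), "B-LVC.full", "I-LVC.full", "g" (gap inside a discontinuous VMWE), ...
```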
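For the two embedding types, a hedged sketch of training word2vec and fastText once on surface forms and once on lemmas; gensim is my assumption, the paper does not prescribe a toolkit, and all parameter values are illustrative:

```python
from gensim.models import Word2Vec, FastText

# Toy tokenised sentences; the paper trains on the shared task corpora,
# once using surface forms and once using lemmas.
form_sents = [["il", "prend", "la", "fuite"], ["elles", "prennent", "part"]]
lemma_sents = [["il", "prendre", "le", "fuite"], ["elle", "prendre", "part"]]

# word2vec: one vector per observed word, no subword information.
w2v_forms = Word2Vec(form_sents, vector_size=100, window=5, min_count=1)
w2v_lemmas = Word2Vec(lemma_sents, vector_size=100, window=5, min_count=1)

# fastText: vectors built from character n-grams (3 to 6 characters here),
# so unseen inflected forms still get a vector composed from their n-grams.
ft_forms = FastText(form_sents, vector_size=100, window=5,
                    min_count=1, min_n=3, max_n=6)

print(ft_forms.wv["prendrait"][:5])  # out-of-vocabulary form, still embeddable
```

(gensim 4.x argument names are assumed.)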
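The two evaluation measures, shown as one simplified self-contained function; this is a didactic approximation of my own, not the official PARSEME evaluation script, which handles partial and cross-sentence cases more carefully:

```python
def mwe_and_token_f1(gold, pred):
    """Simplified MWE-based and token-based F1 for one sentence.

    `gold` and `pred` are sets of VMWEs, each VMWE a frozenset of token
    indices. Illustration only, not the official PARSEME scorer.
    """
    def f1(tp, n_pred, n_gold):
        p = tp / n_pred if n_pred else 0.0
        r = tp / n_gold if n_gold else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    # MWE-based measure: a VMWE counts only if all its tokens are predicted.
    mwe_f1 = f1(len(gold & pred), len(pred), len(gold))

    # Token-based measure: credit for each token inside some gold/predicted VMWE.
    gold_toks = set().union(*gold) if gold else set()
    pred_toks = set().union(*pred) if pred else set()
    tok_f1 = f1(len(gold_toks & pred_toks), len(pred_toks), len(gold_toks))
    return mwe_f1, tok_f1

# Gold: one discontinuous VMWE over tokens 1 and 3; prediction finds token 1 only.
print(mwe_and_token_f1({frozenset({1, 3})}, {frozenset({1})}))
# -> (0.0, 0.666...)  the full expression is missed, the token overlap is partial
```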
<Conclusions>
● word2vec: does not find exact MWE boundaries well, but finds MWE parts; excels at single-token identification
● fastText: better metric scores; better at recovering the expressions themselves
● The more morphologically rich the language, the more lemmas help; for morphologically rich languages the best setup is forms + lemmas
● Overall: subword representations help MWE identification; for morphologically rich languages, combining lemmas and forms works best