The Impact of Word Representations on Sequential Neural MWE Identification (Nicolas Zampieri, Carlos Ramisch, Geraldine Damnati, 2019)
Nicolas Zampieri, Carlos Ramisch, Geraldine Damnati. The Impact of Word Representations on Sequential Neural MWE Identification. Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), Aug 2019, Florence, Italy. pp. 169-175. doi:10.18653/v1/W19-5121
<Prior work>
1. Finding MWEs in running text (Constant et al., 2017)
2. PARSEME shared task 1.1 (Ramisch et al., 2018)
3. fastText: character n-gram subword embeddings (Bojanowski et al., 2017)
4. 'Character-based embeddings have been shown useful to predict MWE compositionality out of text' (Hakimi Parizi and Cook, 2018)
<Research focus>
● Verbal MWE (VMWE) identification
- lemmas vs. surface forms
- traditional word embeddings vs. subword representations
● Target languages: French, Polish, Basque (most morphologically rich: Basque)
<Experimental setup>
1. Corpora
● PARSEME shared task 1.1 VMWE-annotated corpora
- Basque: 117,000 tokens, highest morphological richness (2.32)
- French: 420,000 tokens, high proportion of discontinuous VMWEs (42.12%)
- Polish: 220,000 tokens
2. Architecture
● Veyn: sequence tagging with recurrent neural networks (a minimal sketch is given after this section)
- Input: concatenated embeddings of word features (surface form or lemma, POS, ...)
- Output: CRF layer
- Tagging scheme: BIOG + VMWE category
- Trained on the shared task training corpora
- Validation: dev corpora
● Embeddings: two input types, surface forms and lemmas
- Trained with word2vec and fastText (see the embedding sketch after this section)
● Contextual embeddings (ELMo, BERT) not used: the supported shared task track differed
● Evaluation metrics (a rough illustration follows after this section)
- MWE-based measure: F1 score over fully predicted VMWEs
- Token-based measure: F1 score over tokens belonging to a VMWE
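A minimal sketch of the kind of recurrent tagger described under "2. Architecture" (not the authors' Veyn code: class and parameter names are invented, and the paper's CRF output layer is replaced by a plain per-token softmax for brevity):

```python
import torch
import torch.nn as nn

class RecurrentVMWETagger(nn.Module):
    """Veyn-style sketch: concatenated feature embeddings -> BiGRU -> tag scores."""

    def __init__(self, n_forms, n_lemmas, n_pos, n_tags,
                 emb_dim=100, hidden_dim=128):
        super().__init__()
        self.form_emb = nn.Embedding(n_forms, emb_dim)
        self.lemma_emb = nn.Embedding(n_lemmas, emb_dim)
        self.pos_emb = nn.Embedding(n_pos, emb_dim)
        self.rnn = nn.GRU(3 * emb_dim, hidden_dim,
                          batch_first=True, bidirectional=True)
        # The real system decodes with a CRF layer; here a linear layer
        # scores each BIOG+category tag independently per token.
        self.out = nn.Linear(2 * hidden_dim, n_tags)

    def forward(self, forms, lemmas, pos):
        # Concatenate the embeddings of the word's features (form, lemma, POS).
        x = torch.cat([self.form_emb(forms),
                       self.lemma_emb(lemmas),
                       self.pos_emb(pos)], dim=-1)
        h, _ = self.rnn(x)          # (batch, seq_len, 2 * hidden_dim)
        return self.out(h)          # unnormalised tag scores per token

# Illustrative BIOG+category tag inventory:
# "O" (outside), "B-LVC.full", "I-LVC.full", "g" (gap inside a discontinuous VMWE), ...
```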
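For the two embedding types, a hedged sketch of training word2vec and fastText once on surface forms and once on lemmas; gensim is my assumption, the paper does not prescribe a toolkit, and all parameter values are illustrative:

```python
from gensim.models import Word2Vec, FastText

# Toy tokenised sentences; the paper trains on the shared task corpora,
# once using surface forms and once using lemmas.
form_sents = [["il", "prend", "la", "fuite"], ["elles", "prennent", "part"]]
lemma_sents = [["il", "prendre", "le", "fuite"], ["elle", "prendre", "part"]]

# word2vec: one vector per observed word, no subword information.
w2v_forms = Word2Vec(form_sents, vector_size=100, window=5, min_count=1)
w2v_lemmas = Word2Vec(lemma_sents, vector_size=100, window=5, min_count=1)

# fastText: vectors built from character n-grams (3 to 6 characters here),
# so unseen inflected forms still get a vector composed from their n-grams.
ft_forms = FastText(form_sents, vector_size=100, window=5,
                    min_count=1, min_n=3, max_n=6)

print(ft_forms.wv["prendrait"][:5])  # out-of-vocabulary form, still embeddable
```

(gensim 4.x argument names are assumed.)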
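The two evaluation measures, shown as one simplified self-contained function; this is a didactic approximation of my own, not the official PARSEME evaluation script, which handles partial and cross-sentence cases more carefully:

```python
def mwe_and_token_f1(gold, pred):
    """Simplified MWE-based and token-based F1 for one sentence.

    `gold` and `pred` are sets of VMWEs, each VMWE a frozenset of token
    indices. Illustration only, not the official PARSEME scorer.
    """
    def f1(tp, n_pred, n_gold):
        p = tp / n_pred if n_pred else 0.0
        r = tp / n_gold if n_gold else 0.0
        return 2 * p * r / (p + r) if p + r else 0.0

    # MWE-based measure: a VMWE counts only if all its tokens are predicted.
    mwe_f1 = f1(len(gold & pred), len(pred), len(gold))

    # Token-based measure: credit for each token inside some gold/predicted VMWE.
    gold_toks = set().union(*gold) if gold else set()
    pred_toks = set().union(*pred) if pred else set()
    tok_f1 = f1(len(gold_toks & pred_toks), len(pred_toks), len(gold_toks))
    return mwe_f1, tok_f1

# Gold: one discontinuous VMWE over tokens 1 and 3; prediction finds token 1 only.
print(mwe_and_token_f1({frozenset({1, 3})}, {frozenset({1})}))
# -> (0.0, 0.666...)  the full expression is missed, the token overlap is partial
```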
<Conclusions>
● word2vec: does not find exact MWE boundaries well, but finds MWE parts; excels at single-token identification
● fastText: better metric scores; better at recovering the expressions themselves
● The more morphologically rich the language, the more lemmas help; for morphologically rich languages the best setup is forms + lemmas
● Overall: subword representations help MWE identification; for morphologically rich languages, combining lemmas and forms works best