All Posts
-
Abstract) CLEAN-EVAL: Clean Evaluation on Contaminated Large Language Models | Generative AI/benchmarks 2023. 11. 20. 02:44
We are currently in an era of fierce competition among various large language models (LLMs) continuously pushing the boundaries of benchmark performance. However, genuinely assessing the capabilities of these LLMs has become a challenging and critical issue due to potential data contamination, and it wastes considerable time and effort for researchers and engineers to download and try those contamin..
-
Abstract) Human Still Wins over LLM: An Empirical Study of Active Learning on Domain-Specific Annotation Tasks | Generative AI/benchmarks 2023. 11. 20. 01:48
Large Language Models (LLMs) have demonstrated considerable advances, and several claims have been made about their exceeding human performance. However, in real-world tasks, domain knowledge is often required. Low-resource learning methods like Active Learning (AL) have been proposed to tackle the cost of domain expert annotation, raising this question: Can LLMs surpass compact models trained w..
-
Abstract) Do Localization Methods Actually Localize Memorized Data in LLMs? | Generative AI/benchmarks 2023. 11. 20. 01:06
Large language models (LLMs) can memorize many pretrained sequences verbatim. This paper studies if we can locate a small set of neurons in LLMs responsible for memorizing a given sequence. While the concept of localization is often mentioned in prior work, methods for localization have never been systematically and directly evaluated; we address this with two benchmarking approaches. In our INJ Be..
-
CoLA) Neural Network Acceptability Judgments | NLP Evaluation/Benchmarks 2023. 11. 12. 18:36
CoLA: Corpus of Linguistic Acceptability - 10,657 English sentences: excerpted from various linguistics publications (e.g., grammar textbooks) & labeled grammatical / ungrammatical 1. Introduction ● "Acceptability judgements": the most basic behavioral measure used by generative grammarians to observe a person's grammatical knowledge (Chomsky, 1957; Schuetze, 1996) ● Acceptability judgements for neural networks: assessing whether a neural network has acquired **grammatical concepts**, where those concepts are understood **in terms of human linguistic competence**) 2. Acceptability Judgements 2.1. In Linguistics ● Mainly generative g..
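The snippet above describes CoLA as binary-labeled acceptability data (each English sentence tagged grammatical or ungrammatical). As a minimal sketch of that setup, assuming an illustrative in-memory format rather than the official corpus files, acceptability examples can be represented and scored like this (the sentences and labels below are made up for illustration):

```python
# Minimal sketch of CoLA-style binary acceptability data.
# label 1 = grammatical (acceptable), label 0 = ungrammatical.
# These examples are illustrative, not taken from the actual corpus.
examples = [
    {"sentence": "The cat sat on the mat.", "label": 1},
    {"sentence": "Cat the mat on sat the.", "label": 0},
]

def accuracy(preds, golds):
    """Fraction of predicted acceptability labels matching the gold labels."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)

golds = [ex["label"] for ex in examples]
print(accuracy([1, 0], golds))  # → 1.0
```

Accuracy is used here only for simplicity; because the grammatical/ungrammatical classes are imbalanced, evaluations of this kind often prefer a correlation-based score over raw accuracy.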
-
Abstract) Probing LLMs for Joint Encoding of Linguistic Categories | Paper Reviews/Abstract Tasting 2023. 11. 12. 18:30
Large Language Models (LLMs) exhibit impressive performance on a range of NLP tasks, due to the general-purpose linguistic knowledge acquired during pretraining. Existing model interpretability research (Tenney et al., 2019) suggests that a linguistic hierarchy emerges in the LLM layers, with lower layers better suited to solving syntactic tasks and higher layers employed for semantic processing..
-
Psalm 23 | Patience*3 2023. 11. 4. 14:42
[A Psalm of David] 1 The LORD is my shepherd; I shall not want. 2 He makes me lie down in green pastures; he leads me beside still waters. 3 He restores my soul; he leads me in paths of righteousness for his name's sake. 4 Even though I walk through the dark valley of death, I will fear no evil, for you are with me; your rod and your staff, they comfort me. 5 You prepare a table before me in the presence of my enemies; you anoint my head with oil; my cup overflows. 6 Surely goodness and mercy shall follow me all the days of my life, and I shall dwell in the house of the LORD forever. * If I were the enemy, verse 5 would have driven me up the wall... :(
-
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models | Generative AI/benchmarks 2023. 11. 4. 14:07
Abstract From pre-trained language models (PLMs) to large language models (LLMs), the field of natural language processing (NLP) has witnessed steep performance gains and wide practical uses. The evaluation of a research field guides its direction of improvement. However, LLMs are extremely hard to thoroughly evaluate for two reasons. First of all, traditional NLP tasks become inadequate due to the e..