Generative AI/benchmarks
-
Leveraging Large Language Models for NLG Evaluation: A Survey (2024. 1. 22. 15:32)
Li, Z., Xu, X., Shen, T., Xu, C., Gu, J. C., & Tao, C. (2024). Leveraging Large Language Models for NLG Evaluation: A Survey. arXiv preprint arXiv:2401.07103. 1. Introduction ㅇ It is imperative to establish robust evaluation methodologies that can reliably gauge the quality of generated content ㅇ Shortcomings of traditional NLG evaluation metrics (BLEU, ROUGE, TER, ...): they only capture surface-level text overlap..
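The surface-level limitation noted above can be illustrated with a minimal sketch. The function below is a hypothetical, simplified stand-in for BLEU-style metrics (clipped unigram precision only, no brevity penalty or higher-order n-grams): a paraphrase with the same meaning scores low, while a word-for-word shuffle with a different meaning scores perfectly.

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision, the core overlap idea behind BLEU-style metrics."""
    cand_tokens = candidate.lower().split()
    ref_counts = Counter(reference.lower().split())
    # Each candidate word counts only up to its frequency in the reference (clipping).
    matched = sum(min(c, ref_counts[w]) for w, c in Counter(cand_tokens).items())
    return matched / len(cand_tokens) if cand_tokens else 0.0

ref = "the cat sat on the mat"
paraphrase = "a feline rested upon the rug"   # same meaning, different words
distractor = "the mat sat on the cat"         # different meaning, same words

print(unigram_precision(paraphrase, ref))  # low score despite equivalent meaning
print(unigram_precision(distractor, ref))  # 1.0 despite the reversed meaning
```

This is exactly the failure mode that motivates LLM-based evaluation: overlap metrics reward lexical matching, not semantic adequacy.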
-
Can Large Language Models Understand Real-World Complex Instructions? (2023. 11. 30. 02:54)
* He, Q., Zeng, J., Huang, W., Chen, L., Xiao, J., He, Q., ... & Xiao, Y. (2023). Can Large Language Models Understand Real-World Complex Instructions?. arXiv preprint arXiv:2309.09150. (accepted at AAAI) * Key takeaways - English models mostly understand instructions well - meanwhile, Chinese data improves the performance of Chinese models - apparently you need to benchmark around 34 models to get accepted... ㅇ LLMs have advanced remarkably but remain limited in understanding complex instructions - what is a complex instruction? 1)..
-
Abstract) CLEAN-EVAL: Clean Evaluation on Contaminated Large Language Models (2023. 11. 20. 02:44)
We are currently in an era of fierce competition among various large language models (LLMs), continuously pushing the boundaries of benchmark performance. However, genuinely assessing the capabilities of these LLMs has become a challenging and critical issue due to potential data contamination, and it wastes a great deal of time and effort for researchers and engineers to download and try those contamin..
-
Abstract) Human Still Wins over LLM: An Empirical Study of Active Learning on Domain-Specific Annotation Tasks (2023. 11. 20. 01:48)
Large Language Models (LLMs) have demonstrated considerable advances, and several claims have been made about their exceeding human performance. However, in real-world tasks, domain knowledge is often required. Low-resource learning methods like Active Learning (AL) have been proposed to tackle the cost of domain expert annotation, raising this question: Can LLMs surpass compact models trained w..
-
Abstract) Do Localization Methods Actually Localize Memorized Data in LLMs? (2023. 11. 20. 01:06)
Large language models (LLMs) can memorize many pretrained sequences verbatim. This paper studies whether we can locate a small set of neurons in LLMs responsible for memorizing a given sequence. While the concept of localization is often mentioned in prior work, methods for localization have never been systematically and directly evaluated; we address this with two benchmarking approaches. In our INJ Be..
-
Through the Lens of Core Competency: Survey on Evaluation of Large Language Models (2023. 11. 4. 14:07)
Abstract From pre-trained language models (PLMs) to large language models (LLMs), the field of natural language processing (NLP) has witnessed steep performance gains and wide practical use. The evaluation of a research field guides its direction of improvement. However, LLMs are extremely hard to evaluate thoroughly, for two reasons. First of all, traditional NLP tasks become inadequate due to the e..