Heaps' law

Heaps' law (also known as Herdan's law) is an empirically derived[note 1] law in linguistics. According to Heaps' law, the following formula holds[1]:
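- $V_R(n) = K n^{\beta}$, where
- $V_R$ is the number of distinct words in a document that is $n$ words long: "am am" is two occurrences of the same word, while "I am" is two different words,
- $K$ and $\beta$ are parameters whose values depend on the language. For English, $K$ is typically between 10 and 100, and $\beta$ is between 0.4 and 0.6.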
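As a rough illustration of these quantities, the following Python sketch counts distinct words in a list of tokens and evaluates the usual power-law form of Heaps' law, $V_R(n) = K n^{\beta}$. The function names, the whitespace tokenisation, and the default values $K = 50$ and $\beta = 0.5$ are assumptions chosen for the example (they fall within the English ranges quoted above) and are not part of the law itself.

```python
def vocabulary_growth(tokens):
    """Return the number of distinct words seen after each token."""
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def heaps_estimate(n, k=50.0, beta=0.5):
    """Evaluate K * n**beta; the default K and beta are illustrative only."""
    return k * n ** beta


if __name__ == "__main__":
    print(vocabulary_growth("am am".split()))  # the same word twice
    print(vocabulary_growth("I am".split()))   # two different words
    print(heaps_estimate(10_000))              # estimate for a 10,000-word text
```

Keeping the running count for every prefix of the text, rather than only the final total, makes it possible to compare observed vocabulary growth with the estimate at several document lengths.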
Plotting the formula of Heaps' law gives a curve like the one in the accompanying figure.
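Taking logarithms of the power-law form $V_R(n) = K n^{\beta}$ gives $\log V_R(n) = \log K + \beta \log n$, so on log-log axes the curve becomes a straight line with slope $\beta$. The sketch below uses this to estimate the two parameters by ordinary least squares. The function name `fit_heaps` is hypothetical, and the data points are synthetic values generated from assumed parameters, not measurements from a real corpus.

```python
import math


def fit_heaps(lengths, vocab_sizes):
    """Least-squares fit of log V = log K + beta * log n; returns (K, beta)."""
    xs = [math.log(n) for n in lengths]
    ys = [math.log(v) for v in vocab_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


if __name__ == "__main__":
    # Synthetic points generated with K = 40 and beta = 0.5 (assumed values).
    lengths = [100, 1_000, 10_000, 100_000]
    vocab_sizes = [40 * n ** 0.5 for n in lengths]
    print(fit_heaps(lengths, vocab_sizes))
```

A fit in log space weights relative errors rather than absolute ones, which is one common choice for power-law data; other fitting procedures are possible.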
In everyday terms, Heaps' law says the following:
"The longer a document is, the harder it is to find new words."
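One way to see this, sketched under the assumption that the power-law form $V_R(n) = K n^{\beta}$ can be treated as a smooth function of $n$, is to look at the rate at which new words appear:

```latex
\frac{dV_R}{dn} = K \beta \, n^{\beta - 1}, \qquad 0 < \beta < 1 .
```

Because the exponent $\beta - 1$ is negative, this rate falls as $n$ grows. With the illustrative values $K = 50$ and $\beta = 0.5$ (assumptions for the example, not measured figures), the rate is about 0.25 new words per additional word at $n = 10^4$ and about 0.025 at $n = 10^6$.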
Notes
See also
Further reading
- Egghe, L. (2007), "Untangling Herdan's law and Heaps' law: Mathematical and informetric arguments", Journal of the American Society for Information Science and Technology, 58 (5): 702-709.
References
- [1] Heaps, Harold Stanley (1978), Information Retrieval: Computational and Theoretical Aspects, Academic Press. Heaps' law is proposed in Section 7.5 (pp. 206-208).