Snail Sharing (蜗牛分享) http://blog.sciencenet.cn/u/babyann519

Deviation of Zipf’s and Heaps’ Laws in Human Languages

Posted 2013-2-25 14:22 | Personal category: Research work | System category: Paper exchange | Tags: Zipf’s law, Heaps’ law, language systems, selection dynamics

Deviation of Zipf’s and Heaps’ Laws in Human Languages with Limited Dictionary Sizes

Zipf’s law on word frequency and Heaps’ law on the growth of distinct words are observed in the Indo-European
language family, but they do not hold for languages like Chinese, Japanese and Korean. These languages
consist of characters and have very limited dictionary sizes. Extensive experiments show that: (i) The
character frequency distribution follows a power law with exponent close to one, at which the corresponding
Zipf’s exponent diverges. Indeed, the character frequency decays exponentially in the Zipf plot. (ii) The
number of distinct characters grows with the text length in three stages: it grows linearly in the beginning,
then turns to a logarithmic form, and eventually saturates. A theoretical model for the writing process is
proposed, which embodies the rich-get-richer mechanism and the effects of limited dictionary size.
Experiments, simulations and analytical solutions agree well with each other. This work refines the
understanding of Zipf’s and Heaps’ laws in human language systems.
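
To make the two ingredients named in the abstract concrete, here is a minimal simulation sketch. It assumes one simple selection rule: each new character is drawn from a fixed dictionary with probability proportional to `k_i + epsilon`, where `k_i` is how often character `i` has already appeared. The function name `simulate_text`, the parameter `epsilon`, and the parameter values are illustrative assumptions, not taken from the paper, and the rule may differ from the authors' actual model.

```python
import random


def simulate_text(dictionary_size, text_length, epsilon=1.0, seed=0):
    """Simulate writing with rich-get-richer selection from a finite dictionary.

    Assumed rule (illustrative only): character i is chosen with probability
    proportional to k_i + epsilon, where k_i is its count so far.
    Returns (counts, growth): counts[i] is the final count of character i,
    and growth[t] is the number of distinct characters after t + 1 draws.
    """
    rng = random.Random(seed)
    counts = [0] * dictionary_size
    text = []        # sequence of character indices written so far
    growth = []
    distinct = 0
    for t in range(text_length):
        # Total weight is t + epsilon * dictionary_size. With probability
        # t / total we copy a uniformly chosen earlier position (this is
        # proportional to k_i); otherwise we pick uniformly from the dictionary.
        total = t + epsilon * dictionary_size
        if rng.random() * total < t:
            ch = text[rng.randrange(t)]
        else:
            ch = rng.randrange(dictionary_size)
        if counts[ch] == 0:
            distinct += 1
        counts[ch] += 1
        text.append(ch)
        growth.append(distinct)
    return counts, growth


# Example run with hypothetical parameter values.
counts, growth = simulate_text(dictionary_size=2000, text_length=50_000, epsilon=0.1)
ranked = sorted(counts, reverse=True)   # rank-frequency data for a Zipf plot
print("top frequencies:", ranked[:5])
print("distinct characters at t = 100, 1000, 50000:",
      growth[99], growth[999], growth[-1])
```

Plotting `ranked` against rank on log-log axes gives the Zipf plot, and plotting `growth` against text length gives the Heaps curve. Because the dictionary is finite, `growth` can never exceed `dictionary_size`, which is the saturation effect the abstract describes.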

Citation: Linyuan Lu, Zi-Ke Zhang, Tao Zhou, Deviation of Zipf’s and Heaps’ Laws in Human Languages with Limited Dictionary Sizes, Scientific Reports 3, 1082 (2013).

Download: srep01082.pdf

Related post: http://blog.sciencenet.cn/home.php?mod=space&uid=3075&do=blog&id=659858

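A recent paper discusses the relationship between Heaps’ law and Zipf’s law in some detail; interested readers may find it a useful reference.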

Title: Power-Law Connections: From Zipf to Heaps and Beyond
Authors: Iddo I. Eliazar and Morrel H. Cohen

Abstract
In this paper we explore the asymptotic statistics of a general model of rank distribution in the large-ensemble limit; the construction of the general model is motivated by recent empirical studies of rank distributions. Applying Lorenzian, oligarchic, and Heapsian asymptotic analyses, we establish a comprehensive set of closed-form results linking together rank distributions, probability distributions, oligarchy sizes, and innovation rates. In particular, the general results reveal the fundamental underlying connections between Zipf’s law, Pareto’s law, and Heaps’ law, three elemental empirical power-laws that are ubiquitously observed in the sciences.

Keywords: rank distributions; power-laws; Zipf’s law; Pareto’s law; Heaps’ law; Lorenz curves; the distribution of wealth; oligarchy sizes; innovation rates; phase transitions; self-organized criticality.

PACS: 02.50.-r (Probability theory, stochastic processes, and statistics);
89.65.-s (Social and economic systems).
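
For orientation, the three power laws named in this abstract are commonly written in the forms below. The symbols $\alpha$, $\beta$, $\lambda$, $z(r)$, $p(k)$ and $N(t)$ are notation assumed here for illustration, and the relations between the exponents are the standard asymptotic ones for an idealized power-law rank distribution. They are not the general closed-form results of Eliazar and Cohen.

```latex
\begin{align}
  z(r) &\sim r^{-\alpha}
    && \text{(Zipf: frequency of the item of rank } r\text{)} \\
  p(k) &\sim k^{-\beta}, \qquad \beta = 1 + \frac{1}{\alpha}
    && \text{(Pareto: fraction of items with frequency } k\text{)} \\
  N(t) &\sim t^{\lambda}, \qquad
    \lambda =
    \begin{cases}
      1/\alpha, & \alpha > 1 \\
      1,        & \alpha < 1
    \end{cases}
    && \text{(Heaps: distinct items in a sample of size } t\text{)}
\end{align}
```

Under these assumed relations, $\alpha = 1/(\beta - 1)$, so a frequency distribution with $\beta$ close to one corresponds to a diverging Zipf exponent, consistent with the character-based result quoted in the first abstract above.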


https://blog.sciencenet.cn/blog-329471-664970.html
