
Discovering Shannon's Entropy: As an Infant

Yucong Duan

International Standardization Committee of Networked DIKW for Artificial Intelligence Evaluation (DIKWP-SC)

World Artificial Consciousness CIC (WAC)

World Conference on Artificial Consciousness (WCAC)

(Email: duanyucong@hotmail.com)

Introduction

From the earliest moments of my life, I was surrounded by a world rich in sounds, sights, and sensations. As I began to interact with my environment, I noticed patterns, redundancies, and surprises in the information I received. My curiosity drove me to understand how information is conveyed, how uncertainty plays a role, and how to quantify the amount of information in a message. Through observation, experimentation, and logical reasoning, I embarked on a journey that would lead me to discover Shannon's Entropy Formula, a foundational concept in information theory.

In this narrative, I will detail how, starting from basic sensory experiences as an infant, I independently observed, experimented, and logically deduced Shannon's entropy formula. Each concept evolved explicitly from my experiences, ensuring that my understanding is grounded in reality and free from subjective definitions.

Chapter 1: Observing Patterns in Communication

1.1 Early Interactions with Sounds and Gestures

Experiencing Repetition and Novelty

  • Observation: My caregivers often used certain words and gestures repeatedly, such as "milk," "sleep," and "play."

  • Reflection: Some sounds and actions are more predictable than others.

  • Semantics: Predictable events carry less surprise; unpredictable events carry more surprise.

Responding to Unexpected Stimuli

  • Observation: A sudden loud noise captured my attention more than familiar sounds.

  • Reflection: Unexpected or rare events convey more new information.

1.2 Recognizing Redundancy in Information

Repeated Messages

  • Observation: When my caregivers wanted to emphasize something, they repeated it multiple times.

  • Reflection: Repetition makes the message more certain but adds little new information.

Semantics:

  • Redundancy: Repeating information reduces uncertainty but doesn't necessarily add new content.

Chapter 2: Understanding Probability and Uncertainty

2.1 Learning from Outcomes

Predicting Events

  • Experiment: Noticing that after a bath, I was often given a warm towel.

  • Observation: Certain events have a high likelihood of occurring after specific actions.

  • Reflection: I can predict some events better than others based on past experiences.

Surprise and Uncertainty

  • Observation: When something unexpected happened, like a new toy appearing, it was more exciting.

  • Semantics: Events with lower probability are more surprising and convey more information.

2.2 Quantifying Uncertainty

Assigning Probabilities

  • Concept: I began to assign a sense of likelihood to events based on their frequency.

  • Example: If I received a cookie after dinner on 7 out of 10 evenings, I estimated the probability as 0.7 (a small code sketch of this frequency-based estimate follows at the end of this section).

Understanding Certainty and Uncertainty

  • Certainty: Events with probability close to 1 are almost certain.

  • Uncertainty: Events with probability close to 0 rarely occur, and they are highly surprising when they do.
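To illustrate this frequency-based way of assigning probabilities, here is a minimal Python sketch; the events and counts are invented for the example and are not data from the text.

```python
from collections import Counter

# Hypothetical log of what followed dinner on ten evenings.
outcomes = ["cookie", "cookie", "no cookie", "cookie", "cookie",
            "no cookie", "cookie", "cookie", "no cookie", "cookie"]

counts = Counter(outcomes)
total = len(outcomes)

# Relative frequency as a simple estimate of each event's probability.
for event, count in counts.items():
    print(f"P({event}) = {count}/{total} = {count / total:.1f}")
```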

Chapter 3: Measuring Information Content

3.1 Defining Information

Information as Reduction of Uncertainty

  • Concept: Information reduces uncertainty about an event.

  • Semantics: The more uncertain an event, the more information it provides when it occurs.

Information Content and Probability

  • Hypothesis: The information content of an event is related to its probability.

  • Observation: Rare events provide more information upon occurrence.

3.2 Mathematical Representation of Information

Desirable Properties

  • Monotonicity: Information content should decrease as probability increases.

  • Additivity: Independent events should have additive information content.

Defining Information Content

  • Mathematical Expression:

    I(p) = -\log_b p

    • I(p): Information content of an event with probability p

    • \log_b: Logarithm to base b

    • Semantics: The negative logarithm ensures that higher probabilities yield lower information content.

Choosing the Logarithm Base

  • Common Bases:

    • Base 2: Information measured in bits.

    • Base e: Natural logarithm, information measured in nats.

    • Base 10: Information measured in digits.
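As a quick numeric check of the formula above, here is a minimal Python sketch that evaluates I(p) = -log_b p for a few probabilities in two of the bases listed; the chosen probabilities are arbitrary.

```python
import math

def information_content(p: float, base: float = 2.0) -> float:
    """Self-information I(p) = -log_base(p) of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log(p, base)

# A certain event (p = 1) carries 0 bits; rarer events carry more.
for p in (0.5, 0.25, 0.1, 0.01):
    bits = information_content(p)              # base 2 -> bits
    nats = information_content(p, math.e)      # base e -> nats
    print(f"p = {p:<5} I = {bits:5.2f} bits = {nats:5.2f} nats")
```

Note how halving the probability adds exactly one bit, which reflects the additivity property: two independent fair-coin outcomes carry two bits together.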

Chapter 4: Introducing Entropy

4.1 Understanding Entropy as Average Information

Multiple Possible Events

  • Scenario: Considering a set of possible messages or events, each with its own probability.

  • Example: Different words used by my caregivers have different frequencies.

Defining Entropy

  • Concept: Entropy is the average information content per event, considering all possible events.

  • Mathematical Expression:

    H(X) = -\sum_{i=1}^{n} p_i \log_b p_i

    • H(X): Entropy of the random variable X

    • p_i: Probability of the i-th event

    • n: Total number of possible events
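The following short Python sketch implements this average directly; the distributions are made up to show how entropy shrinks as a distribution becomes more predictable.

```python
import math

def entropy(probs, base: float = 2.0) -> float:
    """Shannon entropy H(X) = sum(p_i * log_base(1/p_i)) = -sum(p_i * log_base(p_i))."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 contribute nothing (the limit of p*log p at 0 is 0).
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # fair coin: 1.0 bit of uncertainty per toss
print(entropy([0.9, 0.1]))    # biased coin: about 0.47 bits
print(entropy([1.0]))         # a certain outcome: 0.0 bits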

4.2 Properties of Entropy

Non-Negativity

  • Observation: Entropy is always greater than or equal to zero.

  • Explanation: Probabilities lie between 0 and 1, so their logarithms are negative or zero; the leading negative sign therefore makes every term, and hence the entropy, non-negative.

Maximum Entropy

  • Scenario: When all events are equally likely (p_i = 1/n), entropy is maximized.

  • Interpretation: Maximum uncertainty occurs when we have no preference or prediction for any event.
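A quick numeric check of this maximum-entropy claim, comparing the uniform distribution over four outcomes against a few randomly generated distributions (all values here are arbitrary):

```python
import math
import random

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

n = 4
uniform = [1.0 / n] * n
print(f"uniform : H = {entropy_bits(uniform):.4f} bits (log2({n}) = {math.log2(n):.4f})")

# Any non-uniform distribution over the same n outcomes has strictly lower entropy.
random.seed(0)
for _ in range(3):
    weights = [random.random() for _ in range(n)]
    probs = [w / sum(weights) for w in weights]
    print(f"random  : H = {entropy_bits(probs):.4f} bits")
```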

Chapter 5: Practical Applications of Entropy

5.1 Efficient Encoding

Communicating Messages

  • Observation: Frequently used words or signals can be represented with shorter codes.

  • Reflection: Assigning shorter codes to more probable events reduces the average message length.

Huffman Coding

  • Concept: A method to create optimal prefix codes based on probabilities.

  • Semantics: Aligns with the principle that entropy represents the minimum average code length.
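To make the link between probabilities and code lengths concrete, here is a small Python sketch of the standard Huffman merge procedure; the word frequencies are invented for illustration and are not from the text.

```python
import heapq

def huffman_codes(frequencies):
    """Build a binary prefix code from {symbol: frequency} via the standard Huffman merge."""
    # Heap entries: (total weight, tie-breaker, {symbol: code so far}).
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)   # the two least likely subtrees
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical relative frequencies of words heard from a caregiver.
freqs = {"milk": 0.45, "sleep": 0.30, "play": 0.15, "bath": 0.07, "toy": 0.03}
codes = huffman_codes(freqs)
avg_len = sum(freqs[w] * len(codes[w]) for w in freqs)
for word in sorted(freqs, key=freqs.get, reverse=True):
    print(f"{word:>5}  p = {freqs[word]:.2f}  code = {codes[word]}")
print(f"average code length = {avg_len:.2f} bits per word")
```

For these made-up frequencies the average code length comes out to 1.90 bits per word, just above the entropy of the distribution (about 1.87 bits), which is the sense in which entropy sets the floor on compression.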

5.2 Channel Capacity

Understanding Limitations

  • Observation: Noisy environments can distort messages.

  • Concept: There is a maximum rate at which information can be reliably transmitted over a channel.

Shannon's Channel Capacity Formula

  • Mathematical Expression:

    C = B \log_2 \left(1 + \frac{S}{N}\right)

    • C: Channel capacity in bits per second

    • B: Bandwidth of the channel in hertz

    • S: Signal power

    • N: Noise power
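A minimal numeric example of the capacity formula, using a hypothetical 3 kHz channel with a signal-to-noise power ratio of 1000 (about 30 dB):

```python
import math

def channel_capacity(bandwidth_hz: float, signal_power: float, noise_power: float) -> float:
    """Shannon-Hartley capacity C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)

# Hypothetical telephone-like channel: 3 kHz bandwidth, S/N = 1000.
print(f"C = {channel_capacity(3000.0, 1000.0, 1.0):,.0f} bits per second")  # roughly 29,900
```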

Chapter 6: Exploring Entropy in Different Contexts

6.1 Joint Entropy and Conditional Entropy

Joint Entropy

  • Concept: Measures the uncertainty of two random variables considered together.

  • Mathematical Expression:

    H(X, Y) = -\sum_{i,j} p_{i,j} \log_b p_{i,j}

    • p_{i,j}: Joint probability of the events X_i and Y_j

Conditional Entropy

  • Concept: Measures the remaining uncertainty of one variable given knowledge of another.

  • Mathematical Expression:

    H(Y|X) = H(X, Y) - H(X)
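Both quantities can be computed directly from a joint distribution. Below is a minimal Python sketch over an invented two-variable example (weather and activity); only the formulas come from the text, the numbers are made up.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical joint distribution over X = weather and Y = activity.
joint = {
    ("sunny", "play outside"): 0.40,
    ("sunny", "nap"):          0.10,
    ("rainy", "play outside"): 0.05,
    ("rainy", "nap"):          0.45,
}

h_xy = entropy_bits(joint.values())        # joint entropy H(X, Y)

# Marginal distribution of X, obtained by summing the joint probabilities over Y.
p_x = {}
for (x, _), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
h_x = entropy_bits(p_x.values())           # H(X)

h_y_given_x = h_xy - h_x                   # H(Y|X) = H(X, Y) - H(X)
print(f"H(X,Y) = {h_xy:.3f}  H(X) = {h_x:.3f}  H(Y|X) = {h_y_given_x:.3f} bits")
```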

6.2 Mutual Information

Defining Mutual Information

  • Concept: Measures the amount of information one variable contains about another.

  • Mathematical Expression:

    I(X; Y) = H(X) - H(X|Y)

  • Interpretation: Reduction in the uncertainty of X due to knowledge of Y.
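Continuing the same invented weather/activity example, mutual information falls out of the entropies already computed; this sketch uses the equivalent identity I(X; Y) = H(X) + H(Y) - H(X, Y).

```python
import math

def entropy_bits(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Same hypothetical joint distribution as in the previous sketch.
joint = {
    ("sunny", "play outside"): 0.40,
    ("sunny", "nap"):          0.10,
    ("rainy", "play outside"): 0.05,
    ("rainy", "nap"):          0.45,
}

p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# I(X; Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X, Y)
mutual_info = (entropy_bits(p_x.values()) + entropy_bits(p_y.values())
               - entropy_bits(joint.values()))
print(f"I(X;Y) = {mutual_info:.3f} bits")  # positive: the weather tells us something about the activity
```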

Chapter 7: The Kullback-Leibler Divergence

7.1 Measuring Difference Between Distributions

Conceptualizing Divergence

  • Observation: Comparing expected outcomes with actual outcomes reveals discrepancies.

  • Semantics: The divergence quantifies the difference between two probability distributions.

  • Mathematical Expression:

    D_{\text{KL}}(P || Q) = \sum_i p_i \log_b \left( \frac{p_i}{q_i} \right)

    • P: True probability distribution

    • Q: Estimated or approximate probability distribution
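A minimal Python sketch of this divergence, comparing an invented "true" distribution P against a uniform guess Q (the numbers are arbitrary):

```python
import math

def kl_divergence_bits(p, q):
    """D_KL(P || Q) = sum_i p_i * log2(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]           # hypothetical "true" distribution
q = [1/3, 1/3, 1/3]           # a uniform approximation of it
print(f"D_KL(P||Q) = {kl_divergence_bits(p, q):.3f} bits")   # about 0.43
print(f"D_KL(P||P) = {kl_divergence_bits(p, p):.3f} bits")   # 0: no divergence from itself
```

Note that the divergence is not symmetric: D_KL(P||Q) and D_KL(Q||P) generally differ.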

7.2 Applications of KL Divergence

Model Selection

  • Usage: Choosing the model that minimizes the divergence from the true distribution.

  • Interpretation: A model with lower KL divergence better represents the observed data.

Chapter 8: Reflecting on the Discovery

8.1 The Significance of Entropy

Fundamental Measure

  • Understanding: Entropy quantifies the average uncertainty or information content.

  • Implications: It is a cornerstone in fields like information theory, thermodynamics, and statistical mechanics.

8.2 The Power of Mathematical Formalism

Unified Description

  • Observation: Mathematical expressions provide a precise way to quantify abstract concepts.

  • Reflection: The entropy formula elegantly captures the essence of information content.

Chapter 9: Applications and Implications

9.1 Data Compression

Lossless Compression

  • Principle: Compress data without losing any information; the entropy of the source sets the limit on how short the encoding can be on average.

  • Example: ZIP files, PNG images.

Lossy Compression

  • Principle: Discard less important information to achieve higher compression ratios.

  • Example: JPEG images, MP3 audio.

9.2 Cryptography

Information Security

  • Observation: Entropy measures the unpredictability of keys in encryption algorithms.

  • Implication: Higher entropy keys are more secure against brute-force attacks.

9.3 Machine Learning

Decision Trees

  • Usage: Entropy is used to decide which feature to split on by measuring information gain.

  • Information Gain:

    \text{Gain}(X, Y) = H(Y) - H(Y|X)
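The gain computation can be sketched in a few lines of Python; the toy feature and labels below are invented purely to show the mechanics of scoring a candidate split.

```python
import math
from collections import Counter

def entropy_bits(labels):
    """Entropy of a list of class labels, computed from their relative frequencies."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Gain(X, Y) = H(Y) - H(Y|X) for a candidate split feature X."""
    total = len(labels)
    h_y_given_x = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        h_y_given_x += (len(subset) / total) * entropy_bits(subset)
    return entropy_bits(labels) - h_y_given_x

# Hypothetical toy data: does the infant nap, given the weather?
weather = ["sunny", "sunny", "rainy", "rainy", "sunny", "rainy"]
naps    = ["no",    "no",    "yes",   "yes",   "yes",   "yes"]
print(f"Gain(weather -> nap) = {information_gain(weather, naps):.3f} bits")  # about 0.46
```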

Conclusion

Through observation, experimentation, and logical reasoning, I was able to discover and formulate Shannon's Entropy Formula. Starting from basic experiences with communication, patterns, and probabilities, I developed the concepts of information content, uncertainty, and entropy. By grounding each concept in reality and evolving the semantics explicitly, I arrived at a fundamental principle that quantifies the average information in a message.

This exploration demonstrates that complex scientific concepts can emerge naturally from simple observations. By avoiding subjective definitions and relying on direct experiences, profound ideas become accessible and meaningful. Shannon's entropy formula not only provides insight into the nature of information but also underpins modern digital communication, data compression, and information theory.

Epilogue: Implications for Learning and AI

This narrative illustrates how foundational scientific principles can be understood through direct interaction with the environment and logical reasoning. In the context of artificial intelligence and cognitive development, it emphasizes the importance of experiential learning and the evolution of semantics from core experiences.

By enabling AI systems to observe patterns, quantify uncertainties, and derive laws from observations, we can foster the development of intuitive understanding similar to human learning. This approach avoids reliance on predefined definitions and promotes the natural discovery of scientific relationships.

Note: This detailed narrative presents the conceptualization of Shannon's entropy formula as if I, an infant, independently observed and reasoned it out. Each concept is derived from basic experiences, emphasizing the natural progression from simple observations of communication and uncertainty to the understanding of entropy and information theory. This approach demonstrates that with curiosity and logical thinking, foundational knowledge about complex concepts can be accessed and understood without relying on subjective definitions.



