
Discovering Shannon's Entropy: As an Infant

Yucong Duan

International Standardization Committee of Networked DIKW for Artificial Intelligence Evaluation (DIKWP-SC)

World Artificial Consciousness CIC (WAC)

World Conference on Artificial Consciousness (WCAC)

(Email: duanyucong@hotmail.com)

Introduction

From the earliest moments of my life, I was surrounded by a world rich in sounds, sights, and sensations. As I began to interact with my environment, I noticed patterns, redundancies, and surprises in the information I received. My curiosity drove me to understand how information is conveyed, how uncertainty plays a role, and how to quantify the amount of information in a message. Through observation, experimentation, and logical reasoning, I embarked on a journey that would lead me to discover Shannon's Entropy Formula, a foundational concept in information theory.

In this narrative, I will detail how, starting from basic sensory experiences as an infant, I independently observed, experimented, and logically deduced Shannon's entropy formula. Each concept evolved explicitly from my experiences, ensuring that my understanding is grounded in reality and free from subjective definitions.

Chapter 1: Observing Patterns in Communication

1.1 Early Interactions with Sounds and Gestures

Experiencing Repetition and Novelty

  • Observation: My caregivers often used certain words and gestures repeatedly, such as "milk," "sleep," and "play."

  • Reflection: Some sounds and actions are more predictable than others.

  • Semantics: Predictable events carry less surprise; unpredictable events carry more surprise.

Responding to Unexpected Stimuli

  • Observation: A sudden loud noise captured my attention more than familiar sounds.

  • Reflection: Unexpected or rare events convey more new information.

1.2 Recognizing Redundancy in Information

Repeated Messages

  • Observation: When my caregivers wanted to emphasize something, they repeated it multiple times.

  • Reflection: Repetition makes the message more certain but adds little new information.

Semantics:

  • Redundancy: Repeating information reduces uncertainty but doesn't necessarily add new content.

Chapter 2: Understanding Probability and Uncertainty

2.1 Learning from Outcomes

Predicting Events

  • Experiment: Noticing that after a bath, I was often given a warm towel.

  • Observation: Certain events have a high likelihood of occurring after specific actions.

  • Reflection: I can predict some events better than others based on past experiences.

Surprise and Uncertainty

  • Observation: When something unexpected happened, like a new toy appearing, it was more exciting.

  • Semantics: Events with lower probability are more surprising and convey more information.

2.2 Quantifying Uncertainty

Assigning Probabilities

  • Concept: I began to assign a sense of likelihood to events based on their frequency.

  • Example: If I received a cookie after dinner on 7 out of 10 evenings, I estimated the probability as 0.7 (a small code sketch of this frequency-based estimate follows at the end of this section).

Understanding Certainty and Uncertainty

  • Certainty: Events with probability close to 1 are almost certain.

  • Uncertainty: Events with probability close to 0 rarely occur, and they are highly surprising when they do.
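To illustrate this frequency-based way of assigning probabilities, here is a minimal Python sketch; the events and counts are invented for the example and are not data from the text.

```python
from collections import Counter

# Hypothetical log of what followed dinner on ten evenings.
outcomes = ["cookie", "cookie", "no cookie", "cookie", "cookie",
            "no cookie", "cookie", "cookie", "no cookie", "cookie"]

counts = Counter(outcomes)
total = len(outcomes)

# Relative frequency as a simple estimate of each event's probability.
for event, count in counts.items():
    print(f"P({event}) = {count}/{total} = {count / total:.1f}")
```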

Chapter 3: Measuring Information Content

3.1 Defining Information

Information as Reduction of Uncertainty

  • Concept: Information reduces uncertainty about an event.

  • Semantics: The more uncertain an event, the more information it provides when it occurs.

Information Content and Probability

  • Hypothesis: The information content of an event is related to its probability.

  • Observation: Rare events provide more information upon occurrence.

3.2 Mathematical Representation of Information

Desirable Properties

  • Monotonicity: Information content should decrease as probability increases.

  • Additivity: Independent events should have additive information content.

Defining Information Content

  • Mathematical Expression:

    I(p) = -\log_b p

    • I(p): Information content of an event with probability p

    • \log_b: Logarithm to base b

    • Semantics: The negative logarithm ensures that higher probabilities yield lower information content.

Choosing the Logarithm Base

  • Common Bases:

    • Base 2: Information measured in bits.

    • Base e: Natural logarithm, information measured in nats.

    • Base 10: Information measured in digits.
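As a quick numeric check of the formula above, here is a minimal Python sketch that evaluates I(p) = -log_b p for a few probabilities in two of the bases listed; the chosen probabilities are arbitrary.

```python
import math

def information_content(p: float, base: float = 2.0) -> float:
    """Self-information I(p) = -log_base(p) of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log(p, base)

# A certain event (p = 1) carries 0 bits; rarer events carry more.
for p in (0.5, 0.25, 0.1, 0.01):
    bits = information_content(p)              # base 2 -> bits
    nats = information_content(p, math.e)      # base e -> nats
    print(f"p = {p:<5} I = {bits:5.2f} bits = {nats:5.2f} nats")
```

Note how halving the probability adds exactly one bit, which reflects the additivity property: two independent fair-coin outcomes carry two bits together.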

Chapter 4: Introducing Entropy

4.1 Understanding Entropy as Average Information

Multiple Possible Events

  • Scenario: Considering a set of possible messages or events, each with its own probability.

  • Example: Different words used by my caregivers have different frequencies.

Defining Entropy

  • Concept: Entropy is the average information content per event, considering all possible events.

  • Mathematical Expression:

    H(X) = -\sum_{i=1}^{n} p_i \log_b p_i

    • H(X): Entropy of the random variable X

    • p_i: Probability of the i-th event

    • n: Total number of possible events
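The following short Python sketch implements this average directly; the distributions are made up to show how entropy shrinks as a distribution becomes more predictable.

```python
import math

def entropy(probs, base: float = 2.0) -> float:
    """Shannon entropy H(X) = sum(p_i * log_base(1/p_i)) = -sum(p_i * log_base(p_i))."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p = 0 contribute nothing (the limit of p*log p at 0 is 0).
    return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # fair coin: 1.0 bit of uncertainty per toss
print(entropy([0.9, 0.1]))    # biased coin: about 0.47 bits
print(entropy([1.0]))         # a certain outcome: 0.0 bits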

4.2 Properties of Entropy

Non-Negativity

  • Observation: Entropy is always greater than or equal to zero.

  • Explanation: Probabilities lie between 0 and 1, so their logarithms are negative or zero; the leading negative sign therefore makes every term, and hence the entropy, non-negative.

Maximum Entropy

  • Scenario: When all events are equally likely (p_i = 1/n), entropy is maximized.

  • Interpretation: Maximum uncertainty occurs when we have no preference or prediction for any event.
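A quick numeric check of this maximum-entropy claim, comparing the uniform distribution over four outcomes against a few randomly generated distributions (all values here are arbitrary):

```python
import math
import random

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

n = 4
uniform = [1.0 / n] * n
print(f"uniform : H = {entropy_bits(uniform):.4f} bits (log2({n}) = {math.log2(n):.4f})")

# Any non-uniform distribution over the same n outcomes has strictly lower entropy.
random.seed(0)
for _ in range(3):
    weights = [random.random() for _ in range(n)]
    probs = [w / sum(weights) for w in weights]
    print(f"random  : H = {entropy_bits(probs):.4f} bits")
```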

Chapter 5: Practical Applications of Entropy

5.1 Efficient Encoding

Communicating Messages

  • Observation: Frequently used words or signals can be represented with shorter codes.

  • Reflection: Assigning shorter codes to more probable events reduces the average message length.

Huffman Coding

  • Concept: A method to create optimal prefix codes based on probabilities.

  • Semantics: Aligns with the principle that entropy represents the minimum average code length.
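To make the link between probabilities and code lengths concrete, here is a small Python sketch of the standard Huffman merge procedure; the word frequencies are invented for illustration and are not from the text.

```python
import heapq

def huffman_codes(frequencies):
    """Build a binary prefix code from {symbol: frequency} via the standard Huffman merge."""
    # Heap entries: (total weight, tie-breaker, {symbol: code so far}).
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(frequencies.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, codes1 = heapq.heappop(heap)   # the two least likely subtrees
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical relative frequencies of words heard from a caregiver.
freqs = {"milk": 0.45, "sleep": 0.30, "play": 0.15, "bath": 0.07, "toy": 0.03}
codes = huffman_codes(freqs)
avg_len = sum(freqs[w] * len(codes[w]) for w in freqs)
for word in sorted(freqs, key=freqs.get, reverse=True):
    print(f"{word:>5}  p = {freqs[word]:.2f}  code = {codes[word]}")
print(f"average code length = {avg_len:.2f} bits per word")
```

For these made-up frequencies the average code length comes out to 1.90 bits per word, just above the entropy of the distribution (about 1.87 bits), which is the sense in which entropy sets the floor on compression.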

5.2 Channel Capacity

Understanding Limitations

  • Observation: Noisy environments can distort messages.

  • Concept: There is a maximum rate at which information can be reliably transmitted over a channel.

Shannon's Channel Capacity Formula

  • Mathematical Expression:

    C = B \log_2 \left(1 + \frac{S}{N}\right)

    • C: Channel capacity in bits per second

    • B: Bandwidth of the channel in hertz

    • S: Signal power

    • N: Noise power
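A minimal numeric example of the capacity formula, using a hypothetical 3 kHz channel with a signal-to-noise power ratio of 1000 (about 30 dB):

```python
import math

def channel_capacity(bandwidth_hz: float, signal_power: float, noise_power: float) -> float:
    """Shannon-Hartley capacity C = B * log2(1 + S/N), in bits per second."""
    return bandwidth_hz * math.log2(1.0 + signal_power / noise_power)

# Hypothetical telephone-like channel: 3 kHz bandwidth, S/N = 1000.
print(f"C = {channel_capacity(3000.0, 1000.0, 1.0):,.0f} bits per second")  # roughly 29,900
```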

Chapter 6: Exploring Entropy in Different Contexts

6.1 Joint Entropy and Conditional Entropy

Joint Entropy

  • Concept: Measures the uncertainty of two random variables considered together.

  • Mathematical Expression:

    H(X, Y) = -\sum_{i,j} p_{i,j} \log_b p_{i,j}

    • p_{i,j}: Joint probability of the events X_i and Y_j

Conditional Entropy

  • Concept: Measures the remaining uncertainty of one variable given knowledge of another.

  • Mathematical Expression:

    H(Y|X) = H(X, Y) - H(X)
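Both quantities can be computed directly from a joint distribution. Below is a minimal Python sketch over an invented two-variable example (weather and activity); only the formulas come from the text, the numbers are made up.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Hypothetical joint distribution over X = weather and Y = activity.
joint = {
    ("sunny", "play outside"): 0.40,
    ("sunny", "nap"):          0.10,
    ("rainy", "play outside"): 0.05,
    ("rainy", "nap"):          0.45,
}

h_xy = entropy_bits(joint.values())        # joint entropy H(X, Y)

# Marginal distribution of X, obtained by summing the joint probabilities over Y.
p_x = {}
for (x, _), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
h_x = entropy_bits(p_x.values())           # H(X)

h_y_given_x = h_xy - h_x                   # H(Y|X) = H(X, Y) - H(X)
print(f"H(X,Y) = {h_xy:.3f}  H(X) = {h_x:.3f}  H(Y|X) = {h_y_given_x:.3f} bits")
```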

6.2 Mutual Information

Defining Mutual Information

  • Concept: Measures the amount of information one variable contains about another.

  • Mathematical Expression:

    I(X; Y) = H(X) - H(X|Y)

  • Interpretation: Reduction in the uncertainty of X due to knowledge of Y.
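Continuing the same invented weather/activity example, mutual information falls out of the entropies already computed; this sketch uses the equivalent identity I(X; Y) = H(X) + H(Y) - H(X, Y).

```python
import math

def entropy_bits(probs):
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Same hypothetical joint distribution as in the previous sketch.
joint = {
    ("sunny", "play outside"): 0.40,
    ("sunny", "nap"):          0.10,
    ("rainy", "play outside"): 0.05,
    ("rainy", "nap"):          0.45,
}

p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# I(X; Y) = H(X) - H(X|Y) = H(X) + H(Y) - H(X, Y)
mutual_info = (entropy_bits(p_x.values()) + entropy_bits(p_y.values())
               - entropy_bits(joint.values()))
print(f"I(X;Y) = {mutual_info:.3f} bits")  # positive: the weather tells us something about the activity
```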

Chapter 7: The Kullback-Leibler Divergence

7.1 Measuring Difference Between Distributions

Conceptualizing Divergence

  • Observation: Comparing expected outcomes with actual outcomes reveals discrepancies.

  • Semantics: The divergence quantifies the difference between two probability distributions.

  • Mathematical Expression:

    D_{\text{KL}}(P || Q) = \sum_i p_i \log_b \left( \frac{p_i}{q_i} \right)

    • P: True probability distribution

    • Q: Estimated or approximate probability distribution
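A minimal Python sketch of this divergence, comparing an invented "true" distribution P against a uniform guess Q (the numbers are arbitrary):

```python
import math

def kl_divergence_bits(p, q):
    """D_KL(P || Q) = sum_i p_i * log2(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.7, 0.2, 0.1]           # hypothetical "true" distribution
q = [1/3, 1/3, 1/3]           # a uniform approximation of it
print(f"D_KL(P||Q) = {kl_divergence_bits(p, q):.3f} bits")   # about 0.43
print(f"D_KL(P||P) = {kl_divergence_bits(p, p):.3f} bits")   # 0: no divergence from itself
```

Note that the divergence is not symmetric: D_KL(P||Q) and D_KL(Q||P) generally differ.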

7.2 Applications of KL Divergence

Model Selection

  • Usage: Choosing the model that minimizes the divergence from the true distribution.

  • Interpretation: A model with lower KL divergence better represents the observed data.

Chapter 8: Reflecting on the Discovery

8.1 The Significance of Entropy

Fundamental Measure

  • Understanding: Entropy quantifies the average uncertainty or information content.

  • Implications: It is a cornerstone in fields like information theory, thermodynamics, and statistical mechanics.

8.2 The Power of Mathematical Formalism

Unified Description

  • Observation: Mathematical expressions provide a precise way to quantify abstract concepts.

  • Reflection: The entropy formula elegantly captures the essence of information content.

Chapter 9: Applications and Implications

9.1 Data Compression

Lossless Compression

  • Principle: Compress data without losing any information; the entropy of the source sets the limit on how short the encoding can be on average.

  • Example: ZIP files, PNG images.

Lossy Compression

  • Principle: Discard less important information to achieve higher compression ratios.

  • Example: JPEG images, MP3 audio.

9.2 Cryptography

Information Security

  • Observation: Entropy measures the unpredictability of keys in encryption algorithms.

  • Implication: Higher entropy keys are more secure against brute-force attacks.

9.3 Machine Learning

Decision Trees

  • Usage: Entropy is used to decide which feature to split on by measuring information gain.

  • Information Gain:

    \text{Gain}(X, Y) = H(Y) - H(Y|X)
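The gain computation can be sketched in a few lines of Python; the toy feature and labels below are invented purely to show the mechanics of scoring a candidate split.

```python
import math
from collections import Counter

def entropy_bits(labels):
    """Entropy of a list of class labels, computed from their relative frequencies."""
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Gain(X, Y) = H(Y) - H(Y|X) for a candidate split feature X."""
    total = len(labels)
    h_y_given_x = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        h_y_given_x += (len(subset) / total) * entropy_bits(subset)
    return entropy_bits(labels) - h_y_given_x

# Hypothetical toy data: does the infant nap, given the weather?
weather = ["sunny", "sunny", "rainy", "rainy", "sunny", "rainy"]
naps    = ["no",    "no",    "yes",   "yes",   "yes",   "yes"]
print(f"Gain(weather -> nap) = {information_gain(weather, naps):.3f} bits")  # about 0.46
```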

Conclusion

Through observation, experimentation, and logical reasoning, I was able to discover and formulate Shannon's Entropy Formula. Starting from basic experiences with communication, patterns, and probabilities, I developed the concepts of information content, uncertainty, and entropy. By grounding each concept in reality and evolving the semantics explicitly, I arrived at a fundamental principle that quantifies the average information in a message.

This exploration demonstrates that complex scientific concepts can emerge naturally from simple observations. By avoiding subjective definitions and relying on direct experiences, profound ideas become accessible and meaningful. Shannon's entropy formula not only provides insight into the nature of information but also underpins modern digital communication, data compression, and information theory.

Epilogue: Implications for Learning and AI

This narrative illustrates how foundational scientific principles can be understood through direct interaction with the environment and logical reasoning. In the context of artificial intelligence and cognitive development, it emphasizes the importance of experiential learning and the evolution of semantics from core experiences.

By enabling AI systems to observe patterns, quantify uncertainties, and derive laws from observations, we can foster the development of intuitive understanding similar to human learning. This approach avoids reliance on predefined definitions and promotes the natural discovery of scientific relationships.

Note: This detailed narrative presents the conceptualization of Shannon's entropy formula as if I, an infant, independently observed and reasoned it out. Each concept is derived from basic experiences, emphasizing the natural progression from simple observations of communication and uncertainty to the understanding of entropy and information theory. This approach demonstrates that with curiosity and logical thinking, foundational knowledge about complex concepts can be accessed and understood without relying on subjective definitions.



