# Probability and Stochastic Process Tutorial (2)

(For new readers and those sending friend requests, please read my announcement board first)

Instead of the cumbersome histogram of Probability and Stochastic Process Tutorial (1), http://blog.sciencenet.cn/home.php?mod=space&uid=1565&do=blog&id=664051 , henceforth labeled PSPT1, we replace the description by an analytical function called the probability density function. From our viewpoint this is simply a continuous-variable approximation of the histogram. The best known example is the Gaussian density function, given by

p(x) ~ exp[-(x-m)^2/(2s^2)]                                                                            (1)

Eq.(1) represents the familiar bell-shaped curve. It has many nice properties which we shall discuss later. First, however, we need to dispose of one issue. Once we take on continuous real variables, we face an immediate mathematical problem: is Eq.(1) a well-defined object when treated as a probability density function? A real variable can assume an infinite number of values, and there are two kinds of infinity. There is countable infinity, exemplified by the integers (1, 2, 3, . . .), and uncountable infinity, exemplified by the real numbers. This is best explained by a modern version of the millennia-old Zeno’s paradox: “You are 10 feet away from a beautiful nude woman whom you wish to embrace. But to do this you must first get to 5 feet from the woman. However, to get to within 5 feet, you must first cover half of this 5 feet, i.e., 2.5 feet (or 7.5 feet from the desired object), . . . Following this argument ad infinitum, you have just proved that you will not move at all, since distance is a real number which can be divided infinitely” (note 1). In other words, any subinterval of the real line seems to contain just as many points as the original interval.

This creates a real problem for mathematicians. In fact, reportedly such paradoxes of the infinite contributed to the nervous breakdown of the famed mathematician G. Cantor, who founded set theory in the 19th century. Consequently, one has a real problem when one wants to talk about a probability density function of a real variable: how do you assign probability to an uncountable number of points? Measure theory was devised to get around this difficulty. Roughly speaking, we throw away a subset of points (of measure zero) of a real interval, leaving a countable infinity of subsets of points on which we can define probabilities.
The beauty of this setup is that it is not only mathematically rigorous but also consistent with what we applied scientists/engineers learn and practice as probability in university courses without the benefit of measure theory. More importantly, this enables our familiar tools, such as calculus, to operate on probability objects such as the density function, because we now know it is a well-defined mathematical object. In short, everyone is happy.
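As a small illustration of calculus operating on a well-defined density function, numerically integrating the Gaussian density of Eq.(1) recovers total probability 1. This is a minimal sketch; the choice of m = 0, s = 1, the grid limits, and the step size are all illustrative.

```python
import math

def gaussian_pdf(x, m=0.0, s=1.0):
    """Gaussian density with mean m and standard deviation s,
    including the normalizing constant 1/(s*sqrt(2*pi))."""
    return math.exp(-(x - m) ** 2 / (2 * s ** 2)) / (s * math.sqrt(2 * math.pi))

# Riemann-sum integration over [-8, 8]; the tails beyond are negligible.
dx = 0.001
total = sum(gaussian_pdf(i * dx) for i in range(-8000, 8001)) * dx
print(total)  # close to 1
```

The same loop with `x * gaussian_pdf(x)` would recover the mean m, which is the sort of routine calculation measure theory licenses us to do without worry.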

Now when we talk about stochastic processes, we introduce another special real variable: TIME. Intuitively, a random variable is said to evolve in time. Thus, a continuous-time stochastic process involves an uncountable infinity of random variables, one for each instant of time (and there are uncountably many such instants). Measure-theoretic stochastic process development has to take care of that too. The gold standard of all this foundational work is the famous 1953 textbook by J. Doob. But this double-barreled dose of measure-theoretic terminology in rigorous stochastic process textbooks makes learning hard and confusing for uninitiated engineering students.

But we practitioners need not worry further about measure theory. All we need to know is that non-measure-theoretic probability rests on a rigorous and consistent foundation. Knowledge of the rationale of measure theory as sketched above, and some of its terminology, is often enough for us. All the tools such as calculus and algebra can be used without fear. A textbook on probability and stochastic processes can be written with no reference to measure theory; the famed 1957 textbook by Davenport and Root, on which I learned the subject, was written this way. We will develop the rest of the tutorial with the above in mind. (I apologize to pure mathematicians, who will find my non-rigorous description of measure theory above barbarous and unacceptable.)

Back to Gaussian random variables.

1.      Note it takes only two parameters, the mean m and the variance s^2, to completely define the Gaussian density function (for vector Gaussian r.v.s we have a mean vector and a covariance matrix, both involving a finite number of parameters). This simplifies calculation tremendously. But strictly speaking, a Gaussian random variable characterized by Eq.(1) is capable of taking on any value in the continuum (-∞, +∞).

2.      There are empirical as well as theoretical reasons why Gaussian random variables occur frequently in nature. The Central Limit Theorem says that any random phenomenon that is the result of many complex interacting elementary random variables tends to a Gaussian density.

3.      If you only have information on the mean and variance, then assuming the r.v. is Gaussian distributed adds the minimal amount of unwarranted assumptions (the Gaussian is the maximum-entropy density for a given mean and variance).

4.      One of the most useful, but extremely difficult and generally unsolved, problems is this. Given y = f(x), if we know the probability density of x, what is the density function of y as a r.v. when the function f is explicitly known but not invertible? In other words, knowing the input and the system, what is the output? System theorists know this is the $64,000 question of the profession. But this is another one of those dirty secrets of applied mathematics seldom emphasized in textbooks. However, if the function f is linear and x is Gaussian, then y will also be Gaussian, and its mean and variance can be easily calculated. The entire success of the Kalman Filter is built on this fact.
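The Central Limit Theorem of point 2 is easy to see numerically: the standardized sum of even modestly many independent uniform random variables already behaves like a standard Gaussian. A minimal sketch, with illustrative sample sizes:

```python
import random

random.seed(0)

# CLT sketch: the sum of n independent uniform(0,1) variables has
# mean n/2 and variance n/12; standardizing it should give something
# close to a zero-mean, unit-variance Gaussian for large n.
def standardized_uniform_sum(n):
    s = sum(random.random() for _ in range(n))
    mean, var = n * 0.5, n / 12.0
    return (s - mean) / var ** 0.5

samples = [standardized_uniform_sum(30) for _ in range(20000)]
m = sum(samples) / len(samples)
v = sum((x - m) ** 2 for x in samples) / len(samples)
print(m, v)  # near 0 and 1, as for a standard Gaussian
```

A histogram of `samples` would show the familiar bell shape of Eq.(1), even though each ingredient was flat-uniform.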
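The linear-Gaussian fact in point 4 can likewise be checked by simulation: if x is Gaussian with mean m and variance s^2, then y = a·x + b is Gaussian with mean a·m + b and variance a^2·s^2. A minimal sketch; the constants a, b, m, s are illustrative.

```python
import random

random.seed(1)

a, b = 2.0, 3.0   # a known linear system y = a*x + b
m, s = 1.0, 0.5   # mean and standard deviation of the Gaussian input x

xs = [random.gauss(m, s) for _ in range(100000)]
ys = [a * x + b for x in xs]

mean_y = sum(ys) / len(ys)
var_y = sum((y - mean_y) ** 2 for y in ys) / len(ys)
print(mean_y, var_y)  # near a*m + b = 5.0 and a**2 * s**2 = 1.0
```

This closed-form propagation of mean and variance through a linear map is exactly the computation the Kalman Filter repeats at every time step.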

For these reasons, in computations involving real variables we often assume all random variables are Gaussian unless we have specific information otherwise. Indeed, the famed Kalman Filter has been applied successfully to numerous situations even when we know the noises involved are non-Gaussian. Similarly, in general applications, I’d not hesitate to approximate, for example, the histogram in Fig.1 of PSPT1 by a Gaussian density function.

There are other probability distributions that have several nice properties and are useful in application to discrete event systems with discrete variables, namely the Poisson probability distribution and the exponential density function. However, I’ll postpone their introduction to elsewhere or a later article.

With this as background, we can now proceed to discuss stochastic processes in the next article.

(Note 1.)  A practical engineer’s answer to this paradox is “never mind the paradox; all I care about is getting close enough”.
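The engineer’s answer can even be checked in a few lines: the half-steps of the paradox form a geometric series whose partial sums come as close to the full 10 feet as one likes. A minimal sketch (the step counts are illustrative):

```python
# Zeno's paradox as a geometric series: 5 + 2.5 + 1.25 + ...
# Infinitely many steps, yet the total distance is finite (10 feet).
def distance_covered(n_steps):
    """Distance covered after the first n half-steps, starting 10 feet away."""
    total, remaining = 0.0, 10.0
    for _ in range(n_steps):
        step = remaining / 2.0
        total += step
        remaining -= step
    return total

for n in (1, 5, 20, 50):
    print(n, distance_covered(n))  # 5.0, then ever closer to 10.0
```

After 50 half-steps the remaining gap is about 10/2^50 feet, far smaller than any embrace requires.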

(Note 2. 2/27/2013)  One reader, who wishes to remain anonymous, pointed out a couple of typos and needed clarifications in this article, which have been incorporated.

http://blog.sciencenet.cn/blog-1565-665359.html

