Home

References

Penn State: https://newonlinecourses.science.psu.edu/stat414/

Intro

As with most pages on this site, this is intended as a reference page. It's assumed you've already taken a class or two in stats.

In [1]:
#The usual imports
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()  # for plot styling
%matplotlib inline

Probability

Definitions

Event: One of multiple possible occurrences. Ex: getting tails when you flip a coin.

Sample Space: The set of all possible outcomes of an experiment. It is often modeled with a Random Variable (RV) such as X. Using our coin example, the sample space is X := {H,T}.

Probability: A number between 0 and 1 that is the relative frequency of a possible event to the size of its sample space. $$P(X=x) = N(x)/N \text{ where N is the size of the sample space and N(x) is the number of occurrences of x}$$ So the probability of a heads is 1/2 = 0.5: there is one heads event and two possible outcomes of the coin toss.
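
As a quick sanity check we can estimate this relative frequency by simulation (a sketch using numpy from the imports above; 0 stands for tails, 1 for heads):

flips = np.random.randint(0, 2, size=10_000)
print(flips.mean())    # relative frequency of heads, should be close to 0.5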

Properties

Let Y be an event in the sample space S. Then:

  1. $P(Y) \geq 0$
  2. $\sum{P(S)} = 1$
  3. The union of exclusive events is the sum of the probabilities (for both finite and countably infinite)
    $P(Y_1 \cup Y_2 \cup ... \cup Y_n) = P(Y_1) + ... + P(Y_n) $

Theorems

  1. $P(Y) = 1-P(Y') \text{ where Y' is the complement event }$
  2. $P(\emptyset) = 0 $
  3. $ \text{If } X \subseteq Y \text{ then } P(X) \leqslant P(Y) $
  4. $P(Y) \leq 1 $
  5. $P(A \cup B ) = P(A) + P(B) - P(A \cap B) $
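
A quick numerical check of theorem 5 by enumeration (a sketch with itertools; the two events are arbitrary choices):

import itertools as itt

space = list(itt.product([1,2,3,4,5,6], repeat=2))    # all 36 ordered two-dice rolls
A = {s for s in space if s[0] % 2 == 0}               # first die is even
B = {s for s in space if sum(s) > 7}                  # sum exceeds 7
p = lambda E: len(E) / len(space)
print(p(A | B), p(A) + p(B) - p(A & B))               # both sides give 2/3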

Ex1 - Sum of 2 dice

Imagine we're going to roll 2 dice and observe the outcome as the sum.
What is the sample space? Our space is every combination of outcomes.
Die 1: X = {1,2,3,4,5,6}. Similarly, Die 2: Y = {1,2,3,4,5,6}.
So let S := {1+1, 1+2, 1+3, ... , 6+5, 6+6}. Pretty boring by hand, but I do recommend doing it once. Thankfully python has a package for such stuff.
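
Here's a sketch of that enumeration; itertools.product yields all 36 ordered rolls, which is what makes each outcome equally likely:

import itertools as itt
from collections import Counter

counts = Counter(a + b for a, b in itt.product([1,2,3,4,5,6], repeat=2))
for s in sorted(counts):
    print(s, counts[s] / 36)    # e.g. P(sum=7) = 6/36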

Counting

Combinations/Permutations

As our experiments get larger, so will our sample spaces, so we'll need some more advanced counting techniques.

The multiplication principle is the simplest of these techniques, but it can be difficult to apply when determining the probability of an event.

Permutation: Method of selection used when the order matters. Suppose n is the number of elements and we wish to select k. Then the number of permutations is given by:
$$ _nP_k = \frac{n!}{(n-k)!} = n*(n-1)*(n-2)* ... (n-k+1) $$

Combination: Method of selection used when the order doesn't matter $$ {n \choose k} = _nC_k = \frac{n!}{k!(n-k)!} = \frac{n(n-1)...(n-k+1)}{k!} $$

Handy tip: $$ _nC_k = \frac{_nP_k}{k!} $$

Ex: Suppose we have a set A of three letters (a,b,c). We'll choose all 3 while blindfolded.
Permutations(A) = {abc, acb, bac, bca, cba, cab} (= 3!)
Combinations(A) = {abc}
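
The standard library can verify these counts (a quick sketch; math.perm and math.comb need python 3.8+):

import math

n, k = 5, 3
print(math.perm(n, k))                       # 60 ordered selections
print(math.comb(n, k))                       # 10 unordered selections
print(math.perm(n, k) // math.factorial(k))  # the handy tip: nCk = nPk / k!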

In [2]:
# Let's see how to do this in python
import itertools as itt

die = [1,2,3,4,5,6]
# combinations_with_replacement allows repeated values but treats (1,2) and (2,1) as the same roll
t = itt.combinations_with_replacement(die,2)     # let itertools do the boring work
print(', '.join([str(i[0]+i[1]) for i in t]))    # Notice the repetition?
print()

A = 'abcde'
t2 = itt.permutations(A,3)
print('Permutations - Every ordering counts')
print(', '.join([str(i[0]+i[1]+i[2]) for i in t2])) 

t3 = itt.combinations(A,3)
print('Combinations - Uniqueness counts')
print(', '.join([str(i[0]+i[1]+i[2]) for i in t3])) 
2, 3, 4, 5, 6, 7, 4, 5, 6, 7, 8, 6, 7, 8, 9, 8, 9, 10, 10, 11, 12

Permutations - Every ordering counts
abc, abd, abe, acb, acd, ace, adb, adc, ade, aeb, aec, aed, bac, bad, bae, bca, bcd, bce, bda, bdc, bde, bea, bec, bed, cab, cad, cae, cba, cbd, cbe, cda, cdb, cde, cea, ceb, ced, dab, dac, dae, dba, dbc, dbe, dca, dcb, dce, dea, deb, dec, eab, eac, ead, eba, ebc, ebd, eca, ecb, ecd, eda, edb, edc
Combinations - Uniqueness counts
abc, abd, abe, acd, ace, ade, bcd, bce, bde, cde

Conditional Probabilities

Definition

Denoted P(E|A), this describes the probability of an event E given that a prior event A has already occurred.

$ P(E \mid A) = \frac{P(A \cap E) }{P(A)} $
It follows that
$ P(A \cap E) = P(E \mid A) P(A) = P(A \mid E) P(E) \text{, aka the Multiplication Rule} $
Pretty interesting eh?
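
We can verify the definition by enumeration on the two-dice space (a sketch; the events are arbitrary choices):

import itertools as itt

space = list(itt.product([1,2,3,4,5,6], repeat=2))
A = {s for s in space if s[0] == 3}        # first die shows 3
E = {s for s in space if sum(s) == 8}      # the sum is 8
p = lambda X: len(X) / len(space)
print(p(A & E) / p(A))                     # P(E|A) = (1/36)/(6/36) = 1/6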

Properties

  1. $P(E \mid A) \geq 0$
  2. $P(E \mid E) = 1$
  3. If $E_1,E_2,...E_n$ are mutually exclusive then
    $P(E_1 \cup E_2 \cup ... \cup E_n \mid A) = P(E_1 \mid A) + ... + P(E_n \mid A) $

Independence

Events A,B are independent IFF $P(A \cap B) = P(A) P(B)$. If A,B are independent then

  1. P(B|A) = P(B) and P(A|B) = P(A)
  2. A and B' are also independent as are A' and B (' is the complement)

Independence extends to more than two events, where one must distinguish pairwise and mutual independence. A,B,C being mutually independent implies $ P(A \cap B \cap C) = P(A)P(B)P(C) $, and additionally independence must hold for every pair of events.

Ex

Recall our sum of two dice example. If we looked at the roll of each die we would be considering independent events: the roll of each die is independent of the outcome on the other die.

On the other hand, suppose we have a bucket with 5 red and 5 blue balls and we select 2 balls without replacement. After the 1st selection the probability of getting the same color goes down, so the probabilities of the second event depend on the first.
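
Checking the product rule for the dice case (a sketch; each event involves one die only):

import itertools as itt

space = list(itt.product([1,2,3,4,5,6], repeat=2))
A = {s for s in space if s[0] == 6}    # die 1 shows 6
B = {s for s in space if s[1] == 6}    # die 2 shows 6
p = lambda X: len(X) / len(space)
print(p(A & B), p(A) * p(B))           # both 1/36, so A and B are independent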

Bayes Theorem

Use this when you know P(B|A) but need to find P(A|B). As it turns out,

$$ P(A \text{ | } B) = \frac{P(B \text{ | } A) P(A)}{P(B)} $$

Idea

The idea here is that you know some stuff about the sample space but you don't have the full picture.

Ex1:
10% of patients have liver disease, 5% of patients are alcoholics, and 7% of liver disease patients are alcoholics.
What's the probability that an alcoholic patient has liver disease?
We note the info P(L)=.1, P(A)=.05, and P(A | L)=.07.
Now we just plug into the above formula: $P(L \mid A) = \frac{P(A \mid L)P(L)}{P(A)} = \frac{.07 \times .1}{.05} = .14$

Ex2:
A light bulb company has 3 factories A, B, C that ship their products to quality control (QC). A makes 100 bulbs per day, B makes 50, and C makes 200. QC finds defective bulbs every day but doesn't know where a given bulb came from. We also know that the defective rate for each plant is .1, .05, and .75 respectively.

So what's the probability that a randomly selected defective bulb came from A ie P(A|D)?
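
A sketch of the computation using the law of total probability over the three plants, with the production volumes giving P(plant):

volumes = {'A': 100, 'B': 50, 'C': 200}
rates   = {'A': 0.10, 'B': 0.05, 'C': 0.75}            # P(D|plant)

total = sum(volumes.values())
p_plant = {k: v / total for k, v in volumes.items()}   # P(plant)
p_D = sum(rates[k] * p_plant[k] for k in volumes)      # P(D) by total probability
print(rates['A'] * p_plant['A'] / p_D)                 # Bayes: P(A|D) ~ 0.0615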

Discrete Distributions

Finally let's get past the counting to the good stuff.

We'll be using scipy so a word on how the objects work

  • rvs - Random Variates. This is what generates the numbers
  • pmf - Probability mass function
  • cdf - Cumulative distribution function
  • ppf - Percent point function (inverse of the cdf, i.e. percentiles)
In [3]:
import scipy.stats as stats
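
For example, using a bernoulli RV (introduced below) to exercise this interface, a quick sketch:

b = stats.bernoulli(0.25)
print(b.rvs(size=5))    # five random variates
print(b.pmf(1))         # P(X=1) = 0.25
print(b.cdf(0))         # P(X<=0) = 0.75
print(b.ppf(0.5))       # smallest x with cdf(x) >= 0.5, here 0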

Uniform

Describes an RV that can take one of k equally likely discrete values $$ f(x) = \frac{1}{k} $$

In [4]:
#ref https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.randint.html#scipy.stats.randint
low,high = 0,25                                       # define the sample space; high is exclusive, so values fall in 0..24
x1 = [stats.randint.rvs(low,high) for i in range(5)]  # Each call only returns 1 number so we call a few times
print(x1)
[0, 22, 3, 19, 6]

Bernoulli

Bernoulli RVs describe an event with only two outcomes: success (probability p) and failure (probability 1-p). They may seem trivial on their own; however, they're often used as a building block for other distributions.

$$ \begin{equation} f(k) = \left\{ \begin{array}{cc} p & \mathrm{if\ } k=1 \\ 1-p & \mathrm{if\ } k=0 \\ \end{array} \right. \end{equation} $$

In [5]:
x2 = [stats.bernoulli.rvs(0.25) for i in range(5)]  
print(x2)
print('Success=',(sum(x2)/len(x2)))
[1, 0, 0, 0, 0]
Success= 0.2

Binomial

Think of this as a collection of Bernoullis. We repeatedly perform an experiment and put the result of each trial into 2 categories: success and failure. The binomial counts the number of successes across all the trials.

If n = the number of trials, x = the number of successes, and p = P(success) in a single trial, then

$$ f(x,n) = {}_{n}C_{x} \, p^x (1-p)^{n-x} $$

In [6]:
# We're running 5 experiments, each having 5 trials with p=0.25
# Each number outputted is the count of successes in that experiment
x3 = [stats.binom.rvs(5,0.25) for i in range(5)]  
print(x3)
#numpy.random.binomial
[2, 1, 1, 2, 0]
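
We can check scipy's pmf against the formula above (a quick sketch using math.comb):

import math

n, p, x = 5, 0.25, 2
print(math.comb(n, x) * p**x * (1-p)**(n-x))   # by hand: ~0.2637
print(stats.binom.pmf(x, n, p))                # scipy agrees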

Neg Binomial

Similar to the binomial, this describes a series of Bernoulli trials. The caveat here is that it stops at a given number of successes.
n is the target number of successes, p is the probability of a single success, and k is the number of failures observed before the n-th success.

$$ f(k;n,p) = {k+n-1 \choose n-1} p^n (1-p)^k $$

In [7]:
x4 = [stats.nbinom.rvs(5,0.05) for i in range(5)]  
print(x4)
# Numbers are interpreted as the number of failures before the 5th success
# Numbers too high? Increase the probability of success
[37, 72, 158, 164, 131]

Hypergeometric

This equation looks more difficult than it actually is.

$$ h(x;n,N,M) = \frac{ {M \choose x }{ N-M \choose n-x } }{{N \choose n}} $$

  • M = the number of successes in the population
  • N = the total number of objects
  • n = the number of selections - the sample size

Example:
Imagine there's a bucket with 20 (N) balls, 7 (M) red and the rest (N-M) blue. We get to pick 12 (n) balls, keeping each one we choose (i.e. we sample without replacement). What is the probability of getting x red?

In [8]:
# scipy's hypergeom takes (M, n, N) = (population size, number of successes, sample size)
# in the notation above that's (N, M, n)
x5 = [stats.hypergeom.rvs(20, 7, 12) for i in range(5)]  
print(x5)
# These numbers represent the number of red obtained in the i'th trial
[3, 5, 5, 4, 4]
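
The pmf answers the example question directly; e.g. for exactly 4 red (a quick check, parameter order per scipy's (M, n, N) convention):

print(stats.hypergeom.pmf(4, 20, 7, 12))    # P(X=4) ~ 0.358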

Poisson

Often used in problems involving counts over time, space, or area, as well as in approximating the binomial distribution.

Let $\lambda$ represent the expected number of events in the given interval; then $$ P\left( x \right) = \frac{{e^{ - \lambda } \lambda ^x }}{{x!}} $$

Example: A light bulb factory makes 400 light bulbs a day. On avg 2% fail to pass quality control. What is the probability that 5 defective bulbs were made today?

In [9]:
# lambda =(0.02 * 400) = 8
# x = 5
stats.poisson.pmf(5,8)     # Exactly 5 bulbs made today are defective
#stats.poisson.cdf(5,8)    # 5 or fewer bulbs are defective
Out[9]:
0.09160366159257921

Multinomial

An extension of the binomial to the multivariate case

Let $X_1, X_2, ... X_k$ count the outcomes in each of k categories over n trials. Then
$ f(x_1,x_2,...,x_k;n,p_1,p_2,...,p_k) = {n \choose x_1,x_2,...,x_k} p_1^{x_1} p_2^{x_2} ... p_k^{x_k} $

In [10]:
#https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.multinomial.html#scipy.stats.multinomial
#                          The probabilities must sum to 1 
rv = stats.multinomial(8, [0.3, 0.2, 0.5])
print(rv.rvs(10))

#Each row represents 1 multinomial draw: counts across the 3 categories, summing to 8
# Why are there no 7s or 8s? See the pmf check below
[[0 1 7]
 [2 1 5]
 [1 1 6]
 [3 4 1]
 [3 2 3]
 [3 2 3]
 [2 2 4]
 [4 1 3]
 [3 2 3]
 [4 1 3]]
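
The pmf answers the question in the comment above: such extreme splits are possible but very unlikely.

rv = stats.multinomial(8, [0.3, 0.2, 0.5])
print(rv.pmf([8, 0, 0]))    # 0.3**8 ~ 6.6e-5
print(rv.pmf([0, 0, 8]))    # 0.5**8 ~ 0.0039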

MultiHypergeometric

Interestingly, this was long missing from scipy; newer releases (SciPy 1.4+) provide it as scipy.stats.multivariate_hypergeom.
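
A minimal sketch, assuming SciPy 1.4 or newer; the bucket numbers reuse the hypergeometric example above:

from scipy.stats import multivariate_hypergeom

# 7 red and 13 blue balls, draw 12 without replacement
print(multivariate_hypergeom.rvs([7, 13], 12))
print(multivariate_hypergeom.pmf([4, 8], [7, 13], 12))   # agrees with hypergeom.pmf(4, 20, 7, 12)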

Continuous Distributions

Uniform

$$ \begin{equation} f(x;\alpha,\beta) = \left\{ \begin{array}{cc} \frac{1}{\beta - \alpha} & \mathrm{for\ } \alpha < x < \beta \\ 0 & \mathrm{otherwise\ } \\ \end{array} \right. \end{equation} $$

In [11]:
#Numpy => numpy.random.uniform(low,high,size)
u = stats.uniform.rvs(size=5)    # defaults to loc=0, scale=1, i.e. uniform on [0,1)
print(u)
[0.41118982 0.97668865 0.72948279 0.54191801 0.24262892]

Normal or Gaussian

$$ N(\mu,\sigma) = \frac{1}{\sigma \sqrt{2 \pi}} e^{\frac{-1}{2} (\frac{x-\mu}{\sigma})^2 } $$

scipy.stats.norm()    #A standard normal rv:    mu,sigma = 0,1
scipy.stats.norm(1,2) #A nonstandard normal rv: mu,sigma = 1,2; scipy calls these loc and scale (scale is the std dev, not the variance)
In [12]:
#numpy.random.normal
fig, ax = plt.subplots(1, 1)
x = np.linspace(-5,5,100)
rv = stats.norm(1,1)
ax.plot(x, rv.pdf(x), 'k-')
plt.show()
#Generate random numbers
#r_nums = stats.norm.rvs(size=1000)

Exponential

$$ \begin{equation} f(x;\theta) = \left\{ \begin{array}{cc} \frac{1}{\theta} e^{\frac{-x}{\theta}} & \mathrm{for\ } x > 0 \\ 0 & \mathrm{otherwise\ } \\ \end{array} \right. \end{equation} $$

In [13]:
#numpy.random.exponential
x = np.linspace(0,15,100)
rv = stats.expon(scale=3)        #Decreases rapidly so we stretch it out for illustrative purposes
plt.plot(x, rv.pdf(x), 'k-')
plt.show()

Gamma

$$ \begin{equation} f(x;\alpha,\beta) = \left\{ \begin{array}{cc} \frac{1}{\beta^{\alpha} \Gamma(\alpha)} x^{\alpha-1}e^{\frac{-x}{\beta}} & \mathrm{for\ } x > 0 \\ 0 & \mathrm{otherwise\ } \\ \end{array} \right. \end{equation} $$ where $$ \Gamma \left( a \right) = \int\limits_0^\infty {s^{a - 1} } e^{ - s} ds $$

In [14]:
#numpy.random.gamma
x = np.linspace(0,15,100)
rv1 = stats.gamma(2)        #You must provide the shape parameter alpha
rv2 = stats.gamma(5)
rv3 = stats.gamma(10)
plt.plot(x, rv1.pdf(x), 'r-',label='alpha=2')
plt.plot(x, rv2.pdf(x), 'k-',label='alpha=5')
plt.plot(x, rv3.pdf(x), 'b-',label='alpha=10')
plt.legend(loc='right')
plt.show()

Chi-Square

If X is a standard normal RV then $X^2$ has a Chi-Square distribution with 1 degree of freedom.

If $X_1, X_2,...,X_n$ are independent standard normal RVs then $Y = \sum_{i=1}^{n} X_i^2$ is Chi-Square with n degrees of freedom.
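
A quick simulation check of that claim (a sketch; the degrees of freedom are an arbitrary choice):

n = 3
y = (np.random.normal(size=(100_000, n))**2).sum(axis=1)   # sums of n squared std normals
print(y.mean(), stats.chi2(n).mean())                      # both should be near n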

In [15]:
#numpy.random.chisquare
x = np.linspace(0, 5, 1000)
fig,ax = plt.subplots(1,1)

linestyles = [':', '--', '-.', '-']
deg_of_freedom = [1, 2, 3, 4]
for df, ls in zip(deg_of_freedom, linestyles):
  ax.plot(x, stats.chi2.pdf(x, df), linestyle=ls, label=r'$df=%i$' % df)

plt.xlim(0, 5)
plt.ylim(0, 0.5)

plt.title(r'$\chi^2\ \mathrm{Distribution}$')

plt.legend()
plt.show()

Beta

$$ \begin{equation} f(x;\alpha,\beta) = \left\{ \begin{array}{cc} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha) \Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1} & \mathrm{for\ } 0<x<1 \\ 0 & \mathrm{otherwise\ } \\ \end{array} \right. \end{equation} $$

In [16]:
#numpy.random.beta
fig, ax = plt.subplots(1,1)

alpha_values = [0.5, 1.5, 3.0, 0.5]
beta_values = [0.5, 1.5, 3.0, 1.5]
linestyles = ['-', '--', ':', '-.']
x = np.linspace(0, 1, 1002)[1:-1]

for a, b, ls in zip(alpha_values, beta_values, linestyles):
    dist = stats.beta(a, b)
    plt.plot(x, dist.pdf(x), ls=ls, c='black',
             label=r'$\alpha=%.1f,\ \beta=%.1f$' % (a, b))

plt.xlim(0, 1)
plt.ylim(0, 3)

plt.xlabel('$x$')
plt.ylabel(r'$p(x|\alpha,\beta)$')
plt.title('Beta Distribution')

plt.legend(loc=0)
plt.show()
Multivariate Normal

In [18]:
#numpy.random.multivariate_normal
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D


N = 60
X = np.linspace(-3, 3, N)
Y = np.linspace(-3, 4, N)
X, Y = np.meshgrid(X, Y)

# Mean vector and covariance matrix
mu = np.array([0., 1.])
Sigma = np.array([[ 1. , -0.5], [-0.5,  1.5]])

# Pack X and Y into a single 3-dimensional array
pos = np.empty(X.shape + (2,))
pos[:, :, 0] = X
pos[:, :, 1] = Y


def multivariate_gaussian(pos, mu, Sigma):
    """Return the multivariate Gaussian distribution on array pos.
        pos is an array constructed by packing the meshed arrays of variables
        x_1, x_2, x_3, ..., x_k into its _last_ dimension.

    """
    n = mu.shape[0]
    Sigma_det = np.linalg.det(Sigma)
    Sigma_inv = np.linalg.inv(Sigma)
    N = np.sqrt((2*np.pi)**n * Sigma_det)
    # This einsum call calculates (x-mu)T.Sigma-1.(x-mu) in a vectorized
    # way across all the input variables.
    fac = np.einsum('...k,kl,...l->...', pos-mu, Sigma_inv, pos-mu)

    return np.exp(-fac / 2) / N

# The distribution on the variables X, Y packed into pos.
Z = multivariate_gaussian(pos, mu, Sigma)

# Create a surface plot and projected filled contour plot under it.
fig = plt.figure()
plt.rcParams["figure.figsize"] = [20, 10]

ax = fig.add_subplot(projection='3d')    # fig.gca(projection='3d') is deprecated/removed in newer matplotlib
ax.plot_surface(X, Y, Z, rstride=3, cstride=3, linewidth=1, antialiased=True,
                cmap=cm.viridis)

cset = ax.contourf(X, Y, Z, zdir='z', offset=-0.15, cmap=cm.viridis)

# Adjust the limits, ticks and view angle
ax.set_zlim(-0.15,0.2)
ax.set_zticks(np.linspace(0,0.2,5))
ax.view_init(27, -21)

plt.show()

t - Student's

Describes a standardized sample mean when the standard deviation must be estimated from the sample; as the degrees of freedom k grow it approaches the standard normal.

In [19]:
#numpy.random.standard_t
# Define the distribution parameters to be plotted
mu = 0
k_values = [1E10, 2, 1, 0.5]
linestyles = ['-', '--', ':', '-.']
x = np.linspace(-10, 10, 1000)


# plot the distributions
fig, ax = plt.subplots(figsize=(5, 3.75))

for k, ls in zip(k_values, linestyles):
    dist = stats.t(k, 0)

    if k >= 1E10:
        label = r'$\mathrm{t}(k=\infty)$'
    else:
        label = r'$\mathrm{t}(k=%.1f)$' % k

    plt.plot(x, dist.pdf(x), ls=ls, c='black', label=label)

plt.xlim(-5, 5)
plt.ylim(0.0, 0.45)

plt.xlabel('$x$')
plt.ylabel(r'$p(x|k)$')
plt.title("Student's $t$ Distribution")

plt.legend()
plt.show()

F - Distribution

The ratio of two independent Chi-Square RVs, each divided by its degrees of freedom; it comes up when comparing variances (e.g. in ANOVA).

In [20]:
#numpy.random.f
mu = 0
d1_values = [1, 5, 2, 10]
d2_values = [1, 2, 5, 50]
linestyles = ['-', '--', ':', '-.']
x = np.linspace(0, 5, 1001)[1:]

fig, ax = plt.subplots(figsize=(5, 3.75))

for (d1, d2, ls) in zip(d1_values, d2_values, linestyles):
    dist = stats.f(d1, d2, mu)
    plt.plot(x, dist.pdf(x), ls=ls, c='black',
             label=r'$d_1=%i,\ d_2=%i$' % (d1, d2))

plt.xlim(0, 4)
plt.ylim(0.0, 1.0)

plt.xlabel('$x$')
plt.ylabel(r'$p(x|d_1, d_2)$')
plt.title("Fisher's Distribution")

plt.legend()
plt.show()