Syllabus http://lucylabs.gatech.edu/ml4t/
Youtube Videos https://www.youtube.com/watch?v=s5xKxliBMTo&list=PLAwxTw4SYaPnIRwl6rad_mYwEk4Gmj7Mx
The course is divided into 3 main areas:
Three textbooks are used
Without further ado, let's get started
Manual approach ( As of May2021 )
I've decided to download URTY, UDOW, and SPY for May-01-2020 to May-21-2021
Now we just load into pandas. Assuming you
import pandas as pd
# It's easier to format the dates when reading
df = pd.read_csv("CS7646_resources/URTY.csv", parse_dates=['Date'])
# show first 5 rows
#df.head()
# show last n rows
df.tail(4)
We leave it to the reader to try out the following commands
df.columns # get your columns
df.dtypes # get each columns datatype
df.iloc[5] # get row 6 index = 5
print (df[10:21]) # print rows between index 10 and 20 inclusive
df['Price'].max() # compute and return max
df['Volume'].mean() # Compute Mean Volume
# Plotting for the very first time can be tricky
# so we provide an example
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_csv("CS7646_resources/URTY.csv")
# tailor to suit your ideal chart size
# [width, height]
plt.rcParams['figure.figsize'] = [20, 5]
# Just one plot
#df['Price'].plot()
# plt.show() # must be called to show plots
# If running from a python script
# remove %matplotlib inline
# add plt.show() at bottom
# want to see multiple columns plotted?
df[['High','Low']].plot()
So far we've looked at only one stock. What if we want multiple stocks in one dataframe? Our data has the exact same dates, but this is NOT an accurate reflection of the real world.
Dataframes can be created in a multitude of ways
import numpy as np # needed for the array example below
# from a dictionary
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
# from an array
df2 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=['a', 'b', 'c'])
# from a list of lists
data = [['tom', 10], ['nick', 15], ['juli', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# from a pair of dates
start_date,end_date = '2021-01-01', '2021-01-26'
dates=pd.date_range(start_date,end_date)
df1=pd.DataFrame(index=dates)
Of course 1 dataframe is rarely enough and we will need to join them
df1=df1.join(df2) # Very basic join uses index as the key, and uses the default left join
df1=df1.join(df2,how='inner') # Very basic join uses index as the key, with a specified inner join
# WARNING - columns names must be unique, otherwise you'll get an overlap error
# when performing a left join you may need to drop the na's
# for example when there's no data
df1=df1.dropna(subset=["Price"])
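To make the difference between the default left join and an inner join concrete, here is a small sketch. The prices frame and its values are invented for illustration and are not taken from the CSV files.

```python
import pandas as pd

# Calendar index: includes a weekend (2021-01-02 and 2021-01-03)
dates = pd.date_range("2021-01-01", "2021-01-05")
calendar = pd.DataFrame(index=dates)

# Hypothetical prices that exist only on some of those dates
prices = pd.DataFrame(
    {"Price": [10.0, 10.5, 11.0]},
    index=pd.to_datetime(["2021-01-01", "2021-01-04", "2021-01-05"]),
)

left = calendar.join(prices)                # keeps all 5 dates, NaN where no price
inner = calendar.join(prices, how="inner")  # keeps only the 3 dates present in both
cleaned = left.dropna(subset=["Price"])     # left join + dropna gives the same rows

print(len(left), len(inner), len(cleaned))
```

Either route ends with the same three rows; the left join is handy when you want to see where the gaps are before deciding how to handle them.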
Now we get fancy
# Create 6 days
dates = pd.date_range("20130101", periods=6)
# dataframe indexed by dates, populated with random numbers
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
# Randomness with different datatypes
df2 = pd.DataFrame(
{ "A": 1.0,
"B": pd.Timestamp("20130102"),
"C": pd.Series(1, index=list(range(4)), dtype="float32"),
"D": np.array([3] * 4, dtype="int32"),
"E": pd.Categorical(["test", "train", "test", "train"]),
"F": "foo",
}
)
You can go so far as to convert to a numpy array as well.
df.to_numpy()
Note, however, that this can be computationally expensive. A DataFrame stores one data type per column, whereas a numpy array must be homogeneous, so when the columns have different types pandas has to find a single type that can hold all of them and copy the data.
import pandas as pd
start_date,end_date = '2021-01-01', '2021-01-26'
dates=pd.date_range(start_date,end_date)
df1=pd.DataFrame(index=dates) # define empty dataframe with these dates as index
dfURTY = pd.read_csv("CS7646_resources/URTY.csv",
index_col="Date", parse_dates=True,
usecols=['Date','Price'], na_values=['nan'])
dfSPY = pd.read_csv("CS7646_resources/SPY.csv",
index_col="Date", parse_dates=True,
usecols=['Date','Price'], na_values=['nan'])
dfUDOW = pd.read_csv("CS7646_resources/UDOW.csv",
index_col="Date", parse_dates=True,
usecols=['Date','Price'], na_values=['nan'])
df1=df1.join(dfURTY)
df1.rename(columns={'Price': 'urty'}, inplace=True)
# default join is 'left', but you can specify
# df1 = df1.join(dfSPY, how='inner')
df1=df1.join(dfSPY)
# Rename column
df1.rename(columns={'Price': 'spy'}, inplace=True)
df1=df1.join(dfUDOW)
df1.rename(columns={'Price': 'udow'}, inplace=True)
df1.sort_index(axis=0,ascending=False,inplace=True)
# or you can use
# df1.sort_values(by='COLUMN_NAME')
# but this doesn't apply to the index column which has no name
df1.head()
More Selecting/Slicing
# You must sort first if you want to select based on date values
df_urty.sort_index(inplace=True)
# simple positional slicing (a dataframe needs .iloc for row, column slices)
df_urty.iloc[:3, 1:3]
# Using the index value
print(df_urty.loc["20210501":"20210510"])
# Select a particular date
# Make sure the date you're looking for exists!!
dates = pd.date_range("20210401", periods=1)
df_urty.loc[dates[0]]
# Multiple Columns
df_urty.loc[dates[0],['Open','High','Low']]
# or use iloc with the indices
# Don't forget it includes the start index but not the ending
df_urty.iloc[1:3,2:4]
df_urty.iloc[1:3,[1,3]]
# Access a single value for a row/column pair by integer position.
df_urty.iat[3,3]
# Access a single value for a row/column label pair
df_urty.at["20200505",'Price']
Using Boolean Conditions
df_urty[df_urty['Price'] > 23.5]
# is in collections
df2 = df.copy() # assumes df is the 6-row random frame created earlier
df2["E"] = ["one", "one", "two", "three", "four", "three"]
df2[df2["E"].isin(["two", "four"])]
Mostly just a review of how to use numpy.
Here's a link Numpy Quick start
# Max & Min are similar to numpy
df_urty['Price'].max()
df_urty['Price'].min()
# computes for all numerical columns
df_urty.mean()
df_urty.std()
"""Computing Rolling Statistics"""
# Compute rolling mean using a 20-day window
rm_SPY = df['SPY'].rolling( window=20).mean()
"""Computing Daily Returns"""
# The numpy way
# daily_returns[1:] = (df[1:] / df[:-1].values) - 1 # compute daily returns for row 1 onwards
daily_returns = (df / df.shift(1)) - 1 # much easier with Pandas!
daily_returns.iloc[0, :] = 0 # Pandas leaves the 0th row full of Nans
"""Computing Cumulative Returns"""
# Left to the reader :)
# Hint -> CumReturn[t] = (Price[t]/Price[0]) - 1
"""Bollinger Bands = (rollingMean+2StdDev,rollingMean-2StdDev) """
def get_bollinger_bands(rm, rstd):
    """Return upper and lower Bollinger Bands.
    Input : rm = pandas series containing a rolling mean
            rstd = pandas series containing a rolling standard deviation
    returns 2 pandas series
        upper_band = rolling mean + 2*rolling standard deviation
        lower_band = rolling mean - 2*rolling standard deviation
    """
    # Quiz: Compute upper_band and lower_band
    upper_band = rm + rstd * 2
    lower_band = rm - rstd * 2
    return upper_band, lower_band
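The statistics above, including the cumulative return left to the reader, can be tried end to end on made-up data. This is only a sketch: the synthetic prices series is an assumption standing in for a real 'Price' column.

```python
import numpy as np
import pandas as pd

# Synthetic prices: an assumption standing in for a real 'Price' column
rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=60, freq="B")
prices = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)

window = 20
rm = prices.rolling(window=window).mean()    # rolling mean
rstd = prices.rolling(window=window).std()   # rolling standard deviation
upper_band = rm + 2 * rstd
lower_band = rm - 2 * rstd

daily_returns = prices / prices.shift(1) - 1
daily_returns.iloc[0] = 0                    # first row has no previous price

# CumReturn[t] = Price[t] / Price[0] - 1
cum_returns = prices / prices.iloc[0] - 1

print(cum_returns.iloc[-1])
```

Note that the first window-1 rows of the rolling statistics are NaN, and that compounding the daily returns reproduces the final cumulative return.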
Rarely will we get complete data that is ready to use as is. Sometimes stocks don't trade and missing data will appear.
# to Identify missing values
missing = pd.isna(df["Price"])
# OR
df2.isna()
# HANDLING
# One common technique is to fill empty entries
df2.fillna(0) # Replace NA with a scalar value
df2["one"].fillna("missing")
# You can fill forwards and backwards
# (the method= argument is deprecated in recent pandas releases; prefer the shortcuts below)
df.fillna(method="pad") # pad / ffill -- Fill Forward
df.fillna(method="bfill") # bfill/ backfill -- Fill Backwards
# There are also shortcut functions
df.ffill()
df.bfill()
# Note that if you need to do both ALWAYS perform forwards first
# The last method we demonstrate is interpolate
df.interpolate()
# There are several methods available for Interpolate
df.interpolate(method="time") # Index aware interpolation (Index is dates)
df.interpolate(method="values") # Index aware interpolation (Index is floating point)
import scipy as sp # the methods below require scipy to be installed
df.interpolate(method="quadratic") # For a growing time series
df.interpolate(method="pchip") # approximating values that are part of a cumulative distribution function
df.interpolate(method="akima") # For smooth plotting
df.interpolate(method="spline",order=2)
df.interpolate(method="polynomial",order=2)
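A tiny sketch of the "forwards first" rule on a made-up series with gaps at the start, middle and end. Forward fill carries the last known price forward, so no future information leaks backwards; back fill is then only needed for gaps before the first known value.

```python
import numpy as np
import pandas as pd

# Made-up price series with gaps at the start, middle and end
s = pd.Series([np.nan, 1.0, np.nan, np.nan, 4.0, np.nan])

filled = s.ffill().bfill()   # forward fill first, then back fill what is left

print(filled.tolist())       # [1.0, 1.0, 1.0, 1.0, 4.0, 4.0]
```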
# as usual - clean your data
# https://stackoverflow.com/questions/42135409/removing-a-character-from-entire-data-frame
df_urty.rename(columns={'Change %': 'Return'}, inplace=True)
df_urty['Return'] = df_urty['Return'].replace({'%':''}, regex=True).astype('float64')
returns = df_urty['Return']
# Plot a histogram
%matplotlib inline
returns.hist(bins=15,figsize=(20,5)) # bins sets the number of bins (15 here)
# If you get messy results check your input
In this next section we take a look at the correlation between two stocks, urty & spy. We define beta as the slope of the line measuring the relationship between a stock and the index spy. We define alpha as the intersection of the line with the y-axis.
WARNING Beta is the slope of dependence. It is NOT a measure of correlation
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df_urty = pd.read_csv("CS7646_resources/URTY.csv",
index_col="Date", parse_dates=True,
na_values=['nan'])
df_urty.rename(columns={'Change %': 'urtyrtn'}, inplace=True)
df_urty['urtyrtn'] = df_urty['urtyrtn'].replace({'%':''}, regex=True).astype('float64')
df_spy = pd.read_csv("CS7646_resources/SPY.csv",
index_col="Date", parse_dates=True,
na_values=['nan'])
df_spy.rename(columns={'Change %': 'spyrtn'}, inplace=True)
df_spy['spyrtn'] = df_spy['spyrtn'].replace({'%':''}, regex=True).astype('float64')
df = df_urty[['urtyrtn']].join(df_spy[['spyrtn']], how='inner')
#df.head()
df.plot(kind='scatter', x='spyrtn', y='urtyrtn', figsize=(20,8))
beta_urty, alpha_urty = np.polyfit(df['spyrtn'], df['urtyrtn'], deg=1)
# we choose deg=1 to imply a line of the form
# y = bx + a
print ("beta_urty= ", beta_urty)
print ("alpha_urty=", alpha_urty)
plt.plot(df['spyrtn'], beta_urty*df['spyrtn'] + alpha_urty, '-',color='r')
df_corr = df[['spyrtn','urtyrtn']]
print('\n Here we can see the correlation')
print(df_corr.corr(method='pearson'))
# https://stackoverflow.com/questions/42135409/removing-a-character-from-entire-data-frame
# df[cols_to_check] = df[cols_to_check].replace({';':''}, regex=True)
#df_urty.tail()
Pretty trivial formula:
import numpy as np
portfolio_start_val = 1000000
start_date = '2009-01-01' # quote the dates; a bare 2009-1-1 is integer subtraction
end_date = '2011-12-31'
symbols = ['SPY', 'XOM', 'GOOG', 'GLD']
allocations = [0.4, 0.4, 0.1, 0.1]
# prices: dataframe of prices, one column per symbol, indexed by date
normed = prices / prices.iloc[0]
allocated = normed * allocations
pos_vals = allocated * portfolio_start_val
port_val = pos_vals.sum(axis=1)
daily_rets = port_val / port_val.shift(1) - 1
daily_rets = daily_rets[1:]
cum_ret = (port_val.iloc[-1] / port_val.iloc[0] - 1)
avg_daily_ret = daily_rets.mean()
std_daily_ret = daily_rets.std()
k = 252 # of samples per year (NOT the number of samples)
# ie if using daily data then k = 252 = number of trading days in 1 year
# ie if using weekly data then k = 52 = number of weeks in 1 year
daily_rf = 0.0
SharpeRatio = np.sqrt(k) * (daily_rets - daily_rf).mean() / daily_rets.std()
NB: technically the denominator should be std(daily_rets - daily_rf). But daily_rf is usually treated as a constant, which does not change the standard deviation, so it is dropped.
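For readers who want to run the portfolio statistics without downloading data, here is a sketch on synthetic prices. The random-walk prices are an assumption for illustration only, so the printed numbers mean nothing financially.

```python
import numpy as np
import pandas as pd

# Synthetic prices for illustration only; real data would come from CSV files
rng = np.random.default_rng(1)
dates = pd.date_range("2009-01-01", "2011-12-31", freq="B")
symbols = ["SPY", "XOM", "GOOG", "GLD"]
steps = rng.normal(0.0005, 0.01, (len(dates), len(symbols)))
prices = pd.DataFrame(100 * np.exp(steps.cumsum(axis=0)), index=dates, columns=symbols)

allocations = np.array([0.4, 0.4, 0.1, 0.1])
portfolio_start_val = 1_000_000

normed = prices / prices.iloc[0]             # every column starts at 1.0
pos_vals = normed * allocations * portfolio_start_val
port_val = pos_vals.sum(axis=1)              # daily total portfolio value

daily_rets = (port_val / port_val.shift(1) - 1).iloc[1:]
cum_ret = port_val.iloc[-1] / port_val.iloc[0] - 1

k = 252                                      # samples per year for daily data
daily_rf = 0.0
sharpe_ratio = np.sqrt(k) * (daily_rets - daily_rf).mean() / daily_rets.std()

print(cum_ret, sharpe_ratio)
```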
risk free rate options/estimators:
Nota bene: this section requires the use of scipy.optimize
An optimizer is simply an algorithm to
Using an optimizer boils down to three parts:
Example:
Minimize a scalar function of one or more variables using Sequential Least Squares Programming (SLSQP).
import scipy.optimize as spo
def f(X):
    return (X - 1.5)**2 + 0.5
Xguess = 0 # random guess to use as a starting pt
min_result = spo.minimize(f, Xguess, method='SLSQP', options={'disp': True})
If you run the above you will get the below results
Notice the value of x on the final line. This is the value of X that minimizes f; the minimum itself is fun = 0.5
Optimization terminated successfully. (Exit mode 0)
Current function value: 0.5
Iterations: 2
Function evaluations: 7
Gradient evaluations: 2
fun: 0.5
jac: array([1.49011612e-08])
message: 'Optimization terminated successfully.'
nfev: 7
nit: 2
njev: 2
status: 0
success: True
x: array([1.5])
Optimizers will have difficulties when a function is not convex. Wikipedia-Convex function
Recall that a function f(x) is convex iff the line segment between any two points on its graph lies on or above the graph.
In other words: if we draw a line between any two points on the graph, then all points on the graph between them must be on or below the line. If the graph rises above the line anywhere between the endpoints, the function is not convex and may have more than one local minimum.
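The chord condition can be checked numerically on a grid. The sketch below is a rough test rather than a proof: it only checks midpoints of grid pairs, and double_well is a hypothetical function chosen because it has two dips.

```python
import numpy as np

def is_midpoint_convex(f, xs, tol=1e-12):
    """Check f((a+b)/2) <= (f(a)+f(b))/2 for every pair of grid points."""
    a, b = np.meshgrid(xs, xs)
    return bool(np.all(f((a + b) / 2) <= (f(a) + f(b)) / 2 + tol))

def bowl(x):          # the convex function minimized in the example above
    return (x - 1.5) ** 2 + 0.5

def double_well(x):   # hypothetical non-convex function with two dips
    return x ** 4 - 3 * x ** 2 + x

xs = np.linspace(-2, 2, 41)
print(is_midpoint_convex(bowl, xs))         # True
print(is_midpoint_convex(double_well, xs))  # False
```

For double_well the chord from x=-1 to x=1 sits at -2 at its midpoint, while the function value there is 0, so the graph pokes above the line.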
Convex
import scipy.optimize as spo
def f(X):
    return (X - 1.5)**2 + 0.5
Xguess = 0
# Minimize a scalar function of one or more variables using Sequential Least Squares Programming (SLSQP).
min_result = spo.minimize(f, Xguess, method='SLSQP', options={'disp': True})
print(min_result)
Building a Parametrized Model
This next example demonstrates fitting a line using an optimizer
import numpy as np
import scipy.optimize as spo
# random line : slope and intercept
l_orig = np.float32([4, 2])
# data construction based on our line
Xorig = np.linspace(0, 10, 100)
Yorig = l_orig[0] * Xorig + l_orig[1]
# Generate noisy data points => Data + noise
noise_sigma = 3.0
noise = np.random.normal(0, noise_sigma, Yorig.shape)
data = np.asarray([Xorig, Yorig + noise]).T
# Uncomment to see plot
# plt.plot(data[:,0], data[:, 1], 'go', label="Data points")
# Try to fit a line to this data
# we will use a sum of square error minimization approach
def error_func(line, data):
    # actual value - estimated value
    return np.sum((data[:,1] - (line[0] * data[:, 0] + line[1])) ** 2)
# Generate initial guess for line model
l = np.float32([0, np.mean(data[:, 1])]) # slope = 0, intercept = mean(y values)
# Call optimizer to minimize error function
result = spo.minimize(error_func, l, args=(data,), method = 'SLSQP', options={'disp': True})
print(result)
"""Minimize an objective function using SciPy: 3D"""
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize as spo
def error_poly(C, data): # error function
    """
    Compute error between given polynomial and observed data.
    Inputs
    C : numpy.poly1d object or equivalent array representing polynomial coefficients
    data: 2D array where each row is a point (x, y)
    Outputs
    err (float) = error as a single real value.
    """
    # Metric: Sum of squared Y-axis differences
    err = np.sum((data[:,1] - np.polyval(C, data[:,0])) ** 2)
    return err
def fit_poly(data, error_func, degree=3):
    """Fit a polynomial to given data, using a supplied error function.
    Parameters
    ----------
    data: 2D array where each row is a point (X0, Y)
    error_func: function that computes the error between a polynomial and observed data
    Returns polynomial that minimizes the error function.
    """
    # Generate initial guess for polynomial model (all coeffs = 1)
    Cguess = np.poly1d(np.ones(degree + 1, dtype=np.float32))
    # Plot initial guess (optional)
    x = np.linspace(-5, 5, 21)
    plt.plot(x, np.polyval(Cguess, x), 'm--', linewidth=2.0, label = "Initial guess")
    # Call optimizer to minimize error function
    result = spo.minimize(error_func, Cguess, args=(data,), method = 'SLSQP', options={'disp': True})
    return np.poly1d(result.x) # convert optimal result into a poly1d object and return
def test_run():
    # Define original curve: a polynomial of degree 4 (5 coefficients)
    l_orig = np.float32([1.5,-10, -5, 60, 50])
    t = (-10*(l_orig[0]**4)) + (-10*(l_orig[1]**3)) + (-10*l_orig[2]**2)+ (-10*l_orig[3]) + l_orig[4]
    print(t)
    # data construction based on our polynomial
    Xorig = np.linspace(-10, 10, 100)
    # bad bad bad
    # the exponents belong to the xorig values. l_orig are the co-efficients
    # Yorig = (Xorig*(l_orig[0]**4)) + (Xorig*(l_orig[1]**3)) + (Xorig*l_orig[2]**2)+ (Xorig*l_orig[3]) + l_orig[4]
    Yorig = np.polyval(l_orig, Xorig)
    print(Xorig[0],Yorig[0])
    # Generate noisy data points => Data + noise
    noise_sigma = 1.0
    noise = np.random.normal(0, noise_sigma, Yorig.shape)
    data = np.asarray([Xorig, Yorig + noise]).T
    # Try to fit a polynomial to this data
    # Generate initial guess for polynomial model (all coeffs = 1)
    degree = 4
    Cguess = np.poly1d(np.ones(degree + 1, dtype=np.float32))
    # Plot initial guess (optional)
    x = np.linspace(-10, 10, 50)
    plt.rcParams['figure.figsize'] = [20, 5]
    plt.plot(x, np.polyval(Cguess, x), 'm--', linewidth=1.0, label = "Initial guess")
    # Call optimizer to minimize error function
    result = spo.minimize(error_poly, Cguess, args=(data,), method = 'SLSQP', options={'disp': True})
    final = np.poly1d(result.x)
    print(final)
test_run()
The ML problem: given x(observations) how can we determine y
Ex: Given price momentum, Bollinger Values how can we determine the future price?
Supervised regression learning:
Consider the problem of robot navigation. X would be the inputs coming from the sensors and Y would be the resulting direction change. If learning is introduced, the robot will use its memory whenever it encounters a situation it has seen before.
Similarly, for a stock we may have the history of its features as well as the output price. The first step in building our model is constructing our data: determining which features are important, choosing the time period, and possibly identifying any related stocks.
Once you've built a model you need to test it; this is called backtesting. Backtesting is where you run your model on past data to determine its accuracy. While simple to explain, this is not as simple as it appears. Testing a model on the data used to build it is fraught with challenges. Regression models can be noisy and uncertain. It's also difficult to measure and estimate uncertainty.
This section is about supervised regression learning, where regression means the model produces a numerical output. Suppose your data consists of barometric and rainfall measurements. We want to be able to predict rainfall as a function of the barometric measurement. The classical approach might be to fit a line $rain = m*barometer+b$, but there are other approaches. Another approach might be to take an input barometer reading, then locate, or query, the k nearest neighbours in the data and take their mean to estimate the output (rainfall).
Our examples so far have been rather simplistic and limited. As a problem grows in size, scope and ambiguity, the approach to modelling is different. Parametric models are used for problems limited in scope. For example, a cannon ball's distance as a function of angle is straightforward. But consider honey bee production relative to the richness of their food. Linear regression falls into the parametric approach because the model is known; it simply needs to be fitted. But in the honey bee case we would use a nonparametric model such as k-nearest-neighbours (KNN) regression.
To build our model we begin by splitting our data into train and test sets. Since we are dealing with time sensitive data our train data should always be the oldest, making the test data the most recent.
We will implement several learners to assess their quality on the problem of stock modelling.
Our models will follow a similar pattern:
Example
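import numpy as np
class LinRegLearner:
    def __init__(self):
        pass
    # y = Mx+b
    def train(self, xtrain, ytrain):
        # any linear regression will do; np.polyfit assumes a single feature
        self.M, self.b = np.polyfit(xtrain, ytrain, 1)
    def query(self, X):
        y = self.M * X + self.b
        return y
class KNNLearner:
    # simply modify the above accordingly
    pass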
Recall our Knn Solution. We define our k for some constant, take some training data and create an algo from it. Suppose we take k=3 and we test our query. At different points the values used in the k algo will of course change. If we take k to an extreme, say equal to the number of points then we just end up with an average value over all points for any point we input to the model. Similarly if we take k as just 1 then for each input to the model only 1 value is returned, and that value is the corresponding value from the training data. In other words we have perfectly fit the model ... meaning we have overfit the model.
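The two extremes of k described above are easy to reproduce. The following is a minimal one-feature sketch, not the course's reference implementation; the data values are made up.

```python
import numpy as np

class KNNLearner:
    """Minimal 1-D k-nearest-neighbour regressor (a sketch, not course code)."""

    def __init__(self, k=3):
        self.k = k

    def train(self, xtrain, ytrain):
        # KNN simply stores the training data
        self.x = np.asarray(xtrain, dtype=float)
        self.y = np.asarray(ytrain, dtype=float)

    def query(self, X):
        X = np.atleast_1d(np.asarray(X, dtype=float))
        # distance from every query point to every training point
        dist = np.abs(X[:, None] - self.x[None, :])
        nearest = np.argsort(dist, axis=1)[:, : self.k]
        return self.y[nearest].mean(axis=1)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

for k in (1, 3, len(x)):
    learner = KNNLearner(k=k)
    learner.train(x, y)
    print(k, learner.query(x))
```

With k=1 the query on the training points returns y itself (a perfect, overfit match), and with k equal to the number of points every query returns the overall mean.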
A similar phenomenon occurs when fitting a parametric model of an arbitrary degree
Of course visuals are nice and all but the truth is in the details. We want to quantify the above. RMS to the rescue. RMS is the Root Mean Square error. And is defined as follows.
$$RMS = \sqrt{ \frac{\sum(y_{test}-y_{predict})^2}{N} }$$
where $y_{predict}$ is the result produced by our model, $y_{test}$ is the expected result as given by our test data, and N is the number of test samples.
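As a quick sketch, the formula translates directly into numpy. The function name rms_error and the sample values are made up for illustration.

```python
import numpy as np

def rms_error(y_test, y_predict):
    """Root mean square error between expected and predicted values."""
    y_test = np.asarray(y_test, dtype=float)
    y_predict = np.asarray(y_predict, dtype=float)
    return np.sqrt(np.mean((y_test - y_predict) ** 2))

# Made-up numbers for illustration
y_test = [1.0, 2.0, 3.0, 4.0]
y_predict = [1.0, 2.0, 3.0, 6.0]

print(rms_error(y_test, y_predict))          # 1.0
print(np.corrcoef(y_test, y_predict)[0, 1])  # correlation of predictions with truth
```

Only one prediction is off (by 2), so the squared errors sum to 4, the mean is 1, and the RMS error is 1.0.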
Warning: In most ML model-building lifecycles we would perform cross validation, taking random samples for our training and testing data. With time series data this doesn't make sense. If you include data more recent than the test data in the training phase, you've effectively used the future to build the model. This simply doesn't work.
What you can do is perform roll-forward cross validation. This is where your training data always comes from a time frame before your test data. After the first test you move the time frame forward and test against the next set of data. This allows you to perform repeated train-test steps without peeking into the future.
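One way to sketch roll-forward splitting is a generator of index ranges. The function name and the fixed window sizes are assumptions for illustration; other schemes (for example a growing training window) are equally valid.

```python
import numpy as np

def roll_forward_splits(n_samples, train_size, test_size):
    """Yield (train_idx, test_idx) pairs where training always precedes testing."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train_idx = np.arange(start, start + train_size)
        test_idx = np.arange(start + train_size, start + train_size + test_size)
        yield train_idx, test_idx
        start += test_size   # slide the whole window forward

for train_idx, test_idx in roll_forward_splits(n_samples=10, train_size=4, test_size=2):
    print(train_idx, test_idx)
```

With 10 samples this yields three splits, and in each one every training index is older than every test index.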
Evaluate the accuracy by measuring the difference (RMS error) between the predicted results and the actual values. A second quantitative measure is the correlation between the predicted and actual values. NB: correlation measures how well the line (aka model) fits the results. It is not the slope of the line. In general a large RMS error goes with a low correlation.
What is overfitting? Suppose we graph the degrees of freedom d against our RMS error e. To help visualize this, think of a polynomial of degree d. As d increases from 0 the error decreases. BUT there will be a point where the out-of-sample error (against the test data) begins to increase while the in-sample error (training data) continues to decrease. That region is where the model is overfitting.
What if we repeated the same experiment using a KNN model? For k=1 it is clear that the in-sample error will be small, but as k increases so does the error. Recall that for k=n, where n is the sample size, the model returns the average over all values.
Can weak learners be combined to create a single strong learner? It turns out YES! To learn how this is done, read on!
In the previous sections we looked at learners that produce a base model. In an ensemble model we combine these learners into a single model, how we combine can be multifaceted. In general ensemble learners have a lower error and are less likely to be overfitted. Ideally ensembles should combine models of different types, for example a knn and a polynomial.
Method 1 (Bootstrap aggregating, or bagging): In this method we train models on different subsets of the data. We might draw n samples, with replacement, from our training set and create a model from them, repeating this for each model. Then we query each model and take the mean of their outputs to produce our result. Thus our result is driven by an ensemble of models.
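A rough sketch of bagging with a very simple base learner (a fitted line). The function name, the number of bags and the noisy data are all assumptions made for illustration.

```python
import numpy as np

def bagged_line_predict(x, y, x_query, n_bags=10, seed=0):
    """Fit one line per bootstrap sample and average the predictions."""
    rng = np.random.default_rng(seed)
    n = len(x)
    predictions = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)          # sample n rows with replacement
        slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
        predictions.append(slope * x_query + intercept)
    return np.mean(predictions, axis=0)           # ensemble = mean of the models

# Noisy made-up data around y = 2x + 1
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 1.0, x.shape)

print(bagged_line_predict(x, y, np.array([0.0, 5.0, 10.0])))
```

Each bag sees a slightly different sample, so each fitted line differs a little; averaging them smooths out that variation.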
Boosting is very similar to bagging. The big difference is that a weighting factor is introduced. Models with a higher error get a lower weight and affect the results less. The most popular implementation of this is AdaBoost. It should be noted that boosting is prone to overfitting.
How do funds work? First let's get some basic assumptions down.
Every hedge fund, and fund manager, effectively manages a portfolio of stocks. This portfolio is a basket of stocks. The main types of funds are:
A key topic for all fund managers is the compensation which is tied to the fund AUM (Assets under Management)
Example : Two and Twenty
You manage a fund which started the year at 100M and grew to 115M by year end.
So you'll get 2% of 100M (2M) and 20% of the 15M profit (3M). You just made 5M for the year
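The arithmetic can be wrapped in a small helper. This is a sketch that follows the example's convention of charging the management fee on starting assets; real fund terms vary, and the function name is made up.

```python
def two_and_twenty(start_aum, end_aum, mgmt_rate=0.02, perf_rate=0.20):
    """Management fee on starting assets plus performance fee on profit."""
    profit = max(end_aum - start_aum, 0.0)   # no performance fee on a loss
    return mgmt_rate * start_aum + perf_rate * profit

fee = two_and_twenty(100e6, 115e6)
print(fee)
```

In a losing year the manager in this sketch still collects the 2% management fee but no performance fee.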
Expense ratio and the 2 and 20 approach both motivate the manager differently
Hedge Funds attract large institutions, funds of funds, and very wealthy individuals. In order to sell themselves there are a few approaches. A track record spanning 5 or more years, a good simulation and story, portfolio fit (do you fill an area they haven't already covered)
Sharpe Ratio Also known as the risk adjusted return (see section 1.07)
Fundamentally a hedge fund will calculate a "Target Portfolio" that it wants to achieve. Of course it would also have a "Live Portfolio" which it is trying to push as close as possible to the target. To do this it has an algorithm that is analyzing market data to determine the next best step.
All of these elements are highly dependent on the approach taken. An algo can analyze realtime tick (order data), or live price data, or both of course.
There are many online brokers that will execute your order. This section deals with how your order is fulfilled.
First What is an order?
Second The order book
This is where the exchange places your order. It is highly anonymous and publicly available, and it often reveals interest in a symbol. Each order falls into one of two main types: asks and bids. Asks are offers to sell, bids are offers to buy (recall that exchanges are essentially auctions). Price movement can be approximated using these orders. A high number of asks and a low number of bids will pressure the price down, towards the bid price.
Caveat: with so few exchanges and so many investors, orders may often be executed at a dark pool, a quasi exchange often managed by a few brokers, which can relieve pressure on an exchange.
How Hedge Funds exploit market mechanics: Imagine you are in Seattle and place an order that goes to a pool in Atlanta. The hedge fund (HF) sits in New York and has very low latency, especially considering your distance. The HF monitors the order book and observes that the price is going up, so it buys. You hit buy at the same time. BUT you are far away! So the HF order is received and filled by the exchange well ahead of yours. It may be only a fraction of a second ahead, but that can mean thousands of orders at an exchange. The hedge fund in New York then turns around and puts in a sell order at a price higher than it bought. Your order will now be filled at the new sell price. All of this takes place in a few milliseconds, which makes a world of difference.
A similar situation occurs whenever you're dealing with a large distance. Prices in London and on the NYSE can differ for a few milliseconds. The HF orders in fact help to equalize these prices very rapidly.
Exchanges Handle: Buy, Sell, Market, Limit
Brokers Handle: Stop loss, Stop gain, Trailing stop, Sell short.
Short Selling: used when the price is expected to go down.
You think xyz is going to decrease in price. So you borrow xyz from someone, then turn around and sell it.
Example: 100 shares of xyz are borrowed at a price of \$100.00 per share, for a total value of \$10,000.00, and you promptly sell them. After 1 month the price is \$90/share, so you buy 100 shares (total \$9,000) and promptly return them to your lender. You've just made 1000 dollars in 1 month.
Of course life isn't always so simple. The risk in short selling is when you're wrong, i.e. when the price goes up contrary to your expectations. There is no limit to how high a stock price can go, which means there is no limit to the potential loss in short selling. When you buy, your risk is limited to the purchase price, but the reward is unlimited.
Suppose you have a company that makes \$1.00 per year. What is this company worth? There are many ways of looking at value. A likely answer would be, say, 10-50 dollars depending on interest rates. The reason for this is based on some basic assumptions. Suppose you have 70 years left in your life; then you could expect to make \$70.00. But since a dollar today is worth more than a dollar in 1 year, we need to discount these future dollars.
A company has three types of value: a Book Value, an Intrinsic Value, and a Market Value. Market Value is the easiest and is determined by the market price of its stock multiplied by the shares outstanding. The Intrinsic Value is the present value of the future returns. Book Value reflects the value of the company's assets, in other words the assets on its balance sheet. Book value includes things like inventory, but not something like future returns.
Present Value of Money
A dollar now is worth more than a dollar tomorrow. How do we express this mathematically?
$$ PV = \frac{FV}{(1+i)^p}$$
Where FV is the future value, i is the interest/risk rate and p is the number of compounding periods.
The interest rate is called by many names depending on the context. The risk rate, interest rate and return rate are just a few. It also tells you a lot about the asset. For example, government bonds will have relatively low return rates when compared to, say, corporate bonds.
Intrinsic Value The present value of all future dividends
Here we alter the above equation and use the discount rate (dr), with FV now being the dividend paid each period.
$$ IntrinsicValue = PV = \frac{FV}{dr}$$
For example our company that pays \$1.00 per year would be worth \$20.00 at a discount rate of 5%.
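To see where dividing by the discount rate comes from, one can sum the present values of a long stream of \$1.00 dividends and compare with the closed form. This is a sketch; the function names are made up and 1000 years stands in for "forever".

```python
def present_value(fv, rate, periods):
    """PV = FV / (1 + rate) ** periods."""
    return fv / (1 + rate) ** periods

def intrinsic_value(dividend, discount_rate):
    """Value of a constant dividend paid forever: dividend / discount_rate."""
    return dividend / discount_rate

# Summing discounted $1.00 dividends over many years approaches the closed form
approx = sum(present_value(1.0, 0.05, p) for p in range(1, 1001))
print(approx, intrinsic_value(1.0, 0.05))
```

Both numbers come out at essentially 20, because each extra year's dividend is discounted more heavily and the series converges.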
Book Value Total assets minus intangible assets and liabilities.
Suppose a company has 4 factories worth 10M apiece. It also has 3 patents worth 15M. and a 10M loan
Then the Company book value is 40M - 10M = 30M. Patents are ignored since they're considered intangible.
Market Value Shares outstanding x Market Price
The power of information: Ever notice how news affects stock price? The reason for this is simple: it affects the value that is computed by the above equations. Suppose the factories in the last example are in an area where war broke out. Would their value remain 10M apiece? Probably not.
Definition
Consider 2 assets A & B, with weights 75% and -25%, and the returns for 1 day are +1% and -2% respectively.
What's the return for the portfolio?
75% of 1% is .75%; -25% implies that the stock was shorted, so (-25%)*-2% yields .50%
Our total 1 day return is thus 1.25%
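The same calculation is a dot product of weights and returns, shown here as a short sketch with the numbers from the example.

```python
import numpy as np

weights = np.array([0.75, -0.25])    # negative weight = short position
returns = np.array([0.01, -0.02])    # one-day returns of A and B

portfolio_return = np.dot(weights, returns)
print(f"{portfolio_return:.2%}")     # 1.25%
```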
The Market Portfolio
In each country there is an index that is widely considered representative of its market or economy. Some examples are
These indices are composed of many stocks, and are generally cap weighted, meaning each stock's weight is its share of the total market capitalization: $w_i = \frac{MarketCap_i}{\sum_j MarketCap_j}$. These indices can often be broken down even further into various sectors.
The CAPM Equation $ r_i(t) = \beta_i r_m(t) + \alpha_i(t) $ where $r_m$ is the market return
To fully understand what this says, notice that
Active v Passive Management
With respect to CAPM, both types of management treat beta and $r_m$ similarly, but $\alpha$ is a different story. According to CAPM, alpha should be random with an expected value of 0. Passive managers agree with this. Active managers do not; they take the position that alpha can be predicted in some form. For example, alpha is positive for a stock that will go up, and vice versa. They may not always be right but on average they believe they are.
CAPM for Portfolios Can be easily derived from the original
$r_p(t)=\sum_i w_i (\beta_i r_m(t) + \alpha_i(t))$
$r_p(t)=\beta_p r_m(t) + \alpha_p(t)$ ( where $\beta_p = \sum_i w_i \beta_i$ )
or
$r_p(t)=\beta_p r_m(t) + \sum_i w_i \alpha_i(t)$ under active management
Implications of CAPM : In upwards markets you want a larger beta, but in downwards markets you want a smaller beta.
APT : Arbitrage Pricing Theory. You can get a more accurate beta by breaking it out into its individual sector components.
Typical Hedge funds look for stocks that perform well relative to the market. ie stocks that rise faster than the market and fall slower than the market.
Consider:
Stock A's prediction is 1% over market (with a beta of 1.0), so they take a long position of \$50.00
Stock B's prediction is 1% below market (with a beta of 2.0), so they take a short position of -\$50.00
Scenario 1 : Time frame of 10days, Market is 0% after 10 days, our prediction is true
Then we get a return from A of 1% of 50 (0.50) + return from B of -1% of -50 (0.50). Our final result is \$1.00 = 1%
Scenario 2 : Time frame of 10 days, Market is +10\% after 10 days, then
$r_A = (1.0)*10\% + 1\% = 11\% \Rightarrow (50*11\%)=5.50$
$r_B =-1*( (2.0)*10\% - 1\%) = -19\% \Rightarrow (50*-19\%)=-9.50$ (Note that we multiply by -1 to represent our short position!)
Total return : -\$4.00 which is Return rate : -4%
Scenario 3 : Time frame of 10 days, Market is -10\% after 10 days, then
$r_A = (1.0)*-10\% + 1\% = -9\% \Rightarrow (50*-9\%)=-4.50$
$r_B =-1*( (2.0)*-10\% - 1\%) = 21\% \Rightarrow (50*21\%)=10.50$
Total return \$6.00 with a return rate of 6\%
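The three scenarios can be recomputed with a few lines of code. This sketch applies $r_i = \beta_i r_m + \alpha_i$ to each position; the helper name is made up, and the return rate is taken relative to the \$100 of total positions.

```python
def long_short_pnl(market_return, positions):
    """Dollar P&L using r_i = beta_i * r_m + alpha_i for each position.

    positions: list of (dollars, beta, alpha); negative dollars = short.
    """
    return sum(d * (beta * market_return + alpha) for d, beta, alpha in positions)

# Values from the example: long $50 of A, short $50 of B
positions = [(50.0, 1.0, 0.01), (-50.0, 2.0, -0.01)]

for market_return in (0.0, 0.10, -0.10):
    pnl = long_short_pnl(market_return, positions)
    print(f"market {market_return:+.0%}: P&L {pnl:+.2f}, return {pnl / 100:+.1%}")
```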
What have we learned? If you don't size your positions properly you can still lose, even when your predictions are right.
In reality though what often happens is that funds use the expected value of the CAPM over the portfolio under multiple scenarios
\begin{equation} \begin{split} r_p & = \sum_i w_i (\beta_i r_m + \alpha_i) \\ & = (w_A \beta_A + w_B \beta_B ) r_m + (w_A \alpha_A + w_B \alpha_B ) \\ & = (0.5*1 + -0.5*2.0) r_m + (0.5*1 + -0.5*-1.0) \\ & = -0.5*r_m + 1.0 \end{split} \end{equation}
Now we have an equation for the portfolio return (in percent) under any market return. A follow-up question might be: how do we eliminate the market risk? i.e. we want $(w_A \beta_A + w_B \beta_B ) = 0$
This is just a linear optimization problem.
We want weights $w_A$ and $w_B$ such that $\beta_p = 1.0*w_A + 2.0*w_B = 0$. We can express this as $w_A = -2*w_B$, where $w_B$ is negative since B is shorted. You may also recall that the absolute values of the weights of a portfolio sum to one, ie $abs(w_A) + abs(w_B) = 1$.
So
$w_A = -2*w_B$ and $abs(w_A)+abs(w_B)=1$
Now sub 1 into 2 to get $abs(-2*w_B)+abs(w_B)=1$
Which we can solve to get $abs(w_B)=\frac{1}{3}$
and simplify to get the final solution
$w_B=\frac{-1}{3}$ and $w_A=-2*w_B=\frac{2}{3}$
So what's the point of all this? Well we've shown how to eliminate the market risk. If we use the above to construct our portfolio then we will make a 1\% return in either a good or a bad market.
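The three scenarios and the market-neutral weights can be checked numerically. This is a minimal sketch; the function name `portfolio_return` is hypothetical, and returns are expressed in percent as in the worked example.

```python
import numpy as np

def portfolio_return(weights, betas, alphas, market_return):
    """CAPM portfolio return in percent: sum_i w_i * (beta_i * r_m + alpha_i)."""
    weights, betas, alphas = map(np.asarray, (weights, betas, alphas))
    return float(np.sum(weights * (betas * market_return + alphas)))

betas = [1.0, 2.0]     # stock A, stock B
alphas = [1.0, -1.0]   # predicted return relative to the market, in percent

naive = [0.5, -0.5]        # equal-dollar long/short from the scenarios
neutral = [2 / 3, -1 / 3]  # weights that zero out the portfolio beta

for r_m in (0.0, 10.0, -10.0):
    print(r_m,
          portfolio_return(naive, betas, alphas, r_m),
          portfolio_return(neutral, betas, alphas, r_m))
```

With the equal-dollar weights the result swings with the market, while the beta-neutral weights return the same 1% in each case.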
All that is left is to find stocks with a good alpha! Alpha is also often thought of as information. News about a company is also information.
There are two broad approaches to determining value: Fundamental vs Technical Analysis. Fundamental analysis uses metrics that are reflected on the company's balance sheet. Technical analysis focuses on the price and volume only. Technical analysts use these two features to build indicators or heuristics that they believe are indicative of future returns. While controversial in many circles, these indicators borrow heavily from statistical analysis.
Technical analysis is most effective when:
NB: The longer the trading horizon the more you should lean towards fundamental analysis.
A few popular indicators are:
Indicators computed on raw prices often have very different scales, so one indicator can overwhelm the others. To give each a fair treatment we normalize them so that each has a mean of 0 and a standard deviation of 1, which puts most values roughly between -1 and 1
Normed = ( Values - Mean ) / Standard deviation
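In pandas the normalization is a one-liner. A minimal sketch follows; the function name `normalize` and the sample values are hypothetical.

```python
import pandas as pd

def normalize(indicator: pd.Series) -> pd.Series:
    """Z-score: subtract the mean, divide by the standard deviation."""
    return (indicator - indicator.mean()) / indicator.std()

# Made-up indicator values, for illustration only
raw = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0], name="hypothetical_indicator")
normed = normalize(raw)
print(normed)
```

When backtesting, the mean and standard deviation should come from the training period only, so that no future information leaks into the indicator.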
Data Aggregation: There are many exchanges, each with its own order book. At the most granular level data is represented as ticks. A tick is a successful buy/sell match, ie a transaction. Ticks happen independent of time, although they do have a time stamp, and of course different exchanges may have different ticks in the same time span. Ticks are consolidated into periodic chunks (minute/hour/day etc). The open is the first transaction in the chunk, the high and low are its extreme prices, and the close is the last transaction in the periodic chunk.
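As an illustration, ticks can be consolidated into one-minute bars with pandas. This is a sketch on made-up ticks; the column names `price` and `size` are assumptions.

```python
import pandas as pd

# Hypothetical ticks: each row is one matched transaction
ticks = pd.DataFrame(
    {"price": [100.0, 100.5, 99.8, 100.2, 101.0, 100.7],
     "size": [200, 100, 300, 100, 500, 200]},
    index=pd.to_datetime([
        "2021-05-03 09:30:05", "2021-05-03 09:30:40", "2021-05-03 09:30:59",
        "2021-05-03 09:31:10", "2021-05-03 09:31:30", "2021-05-03 09:31:55",
    ]),
)

# Consolidate into 1-minute chunks: open/high/low/close plus total volume
bars = ticks["price"].resample("1min").ohlc()
bars["volume"] = ticks["size"].resample("1min").sum()
print(bars)
```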
Handling Stock Splits
You may notice that in some days the price of a stock drops or jumps significantly. Significant meaning far outside the normal range of deviation. This is usually the result of a stock split.
Here's an example:
On Monday Stock A is trading at \$300.00, and a 4:1 stock split is executed at the end of the day.
On Tuesday Stock A will open at \$75.00, which is 300/4. The number of outstanding shares will also increase by a factor of 4.
Clearly this will cause issues in our data. Left unhandled an algo might think that there was a drop.
We handle this by computing the "adjusted close". This is done by dividing the price by 4 for the data before the split. Pretty simple really. Notice that the price in the most recent data will always be the same as the adjusted close, since the most recent data comes after every split.
Also notice that a reverse split is the opposite. A company might perform a 1:4 reverse split. In this case your 4 shares become 1. Consequently the price is now 4 x the pre-split price.
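A minimal sketch of the split adjustment on made-up closes. The dates and prices are hypothetical, with `split_date` taken to be the first trading day at the post-split price.

```python
import pandas as pd

# Hypothetical closes around a 4:1 split (Fri, Mon, Tue, Wed)
close = pd.Series(
    [296.0, 300.0, 75.0, 76.5],
    index=pd.to_datetime(["2021-05-07", "2021-05-10", "2021-05-11", "2021-05-12"]),
)
split_ratio = 4
split_date = pd.Timestamp("2021-05-11")  # first day trading at the post-split price

# Divide every pre-split price by the ratio; post-split prices are left alone
adj_close = close.copy()
adj_close[close.index < split_date] /= split_ratio
print(adj_close)
```

For a reverse split the same code applies with a fractional ratio (for example 0.25 for a 1:4 reverse split).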
Handling Dividends
Many companies pay a dividend, and it is reflected in the price of the stock. Upon payment the stock will almost always drop by roughly the amount of the dividend.
Here's an example:
On Monday Stock A is trading at \$100.00, and will pay \$1.00 in dividend per share at the end of day.
On Tuesday what do you expect the share price to be? \$99.00 of course. There is another consideration when dealing with dividends. They're announced well before they are paid, and the announcement will trigger a gradual price rise so that by the payment date the announced dividend is priced into the market price.
Handling dividends follows the same pattern as a split. Compute the dividend as a percent, then decrease, or discount, the actual historical prices by that percentage to get the adjusted close.
Similar to splits, the adjusted close remains the same as the market price for dates after the dividend payment, but will diverge more and more the further back in history you go.
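A minimal sketch of the dividend adjustment on made-up closes. It assumes the discount factor is computed from the last close before the drop, which is one common convention; data vendors may differ in the details.

```python
import pandas as pd

# Hypothetical closes around a $1.00 dividend (Fri, Mon, Tue, Wed)
close = pd.Series(
    [98.0, 100.0, 99.0, 99.5],
    index=pd.to_datetime(["2021-05-07", "2021-05-10", "2021-05-11", "2021-05-12"]),
)
dividend = 1.00
drop_date = pd.Timestamp("2021-05-11")  # first day the price reflects the payout

# Dividend as a fraction of the last close before the drop (assumed convention)
last_close_before = close[close.index < drop_date].iloc[-1]
factor = 1 - dividend / last_close_before

# Discount all earlier prices by that percentage
adj_close = close.copy()
adj_close[close.index < drop_date] *= factor
print(adj_close)
```

After the adjustment Monday and Tuesday both show 99.00, so the payout no longer looks like a loss.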
Survivor Bias
When developing a strategy you should aim to use survivor bias free data. This means that companies that have disappeared from the exchange are still present in the historical data. A good case in point is the S&P 500. If you were to look at it today and pull the stocks you would not see the 50+ companies that went bankrupt due to the 2008 financial crisis. If you built a strategy on today's S&P constituents it would be totally ignorant of the potential for financial distress. As you can imagine this is certainly a large gap in the development of a strategy.
Although we haven't explicitly stated it we have been using, and operating, under multiple assumptions
Where does info come from?
Three forms, or versions, of the EMH
The importance of diversification!
Developed by Richard Grinold and Ronald Kahn, the Fundamental Law of Active Management states that an active manager's productivity depends on the quality of his/her skills and on how frequently those skills are applied. The law can also be expressed as an equation in terms of the Information Ratio (IR), which is the value added per unit of risk taken.
Grinold's Fundamental law: the relation between Skill, Performance, and breadth.
$Performance = skill * \sqrt{breadth}$
$IR = IC * \sqrt{BR}$
where
Example:
Coin flipping experiment : We bet on heads, or tails, of a flip instead of a stock. The coin is biased, with a 0.51 probability of heads; this edge plays the role of $\alpha$. The uncertainty of each flip plays the role of $\beta$.
Betting: We bet N coins. Win: we get 2N coins back. Lose: we now have 0.
Environment: 1000 tables, 1000 tokens, games run in parallel
3 options
Method 1 : Expected Return
From the above it might appear that they are equivalent. What isn't apparent here is the risk or variance.
We might ask what is the probability we lose it all? Under A1 there is a 0.49 chance we lose it all. Under A2 the probability is $0.49^{1000}$, which is incredibly small
Method 2 : Standard deviation
Method 3 : Reward/Risk
It is interesting to note as well that $20.0 \approx 0.63 \times \sqrt{1000}$
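The numbers for the two extreme options (all 1000 tokens on one table versus one token on each table) can be reproduced as follows. This is a sketch under an assumption: risk is measured as the standard deviation of the per-table outcomes, and the many-table case uses the expected 510/490 split of wins and losses.

```python
import numpy as np

p_win, n_tables, n_tokens = 0.51, 1000, 1000
wins = round(p_win * n_tables)  # expected number of winning tables

# Single bet: all tokens on one table, the other 999 tables stay empty
expected_single = p_win * n_tokens + (1 - p_win) * -n_tokens
outcomes_single = np.zeros(n_tables)
outcomes_single[0] = n_tokens           # stdev is the same for +1000 or -1000
risk_single = outcomes_single.std()

# Many bets: one token on every table
expected_multi = n_tables * (p_win * 1 + (1 - p_win) * -1)
outcomes_multi = np.array([1.0] * wins + [-1.0] * (n_tables - wins))
risk_multi = outcomes_multi.std()

ratio_single = expected_single / risk_single
ratio_multi = expected_multi / risk_multi
print(expected_single, expected_multi)          # same expected return
print(round(ratio_single, 2), round(ratio_multi, 2))
print(round(ratio_multi / ratio_single, 2), round(np.sqrt(n_tables), 2))
```

Both options have the same expected return, but the reward/risk ratio of the many-bet option is larger by roughly $\sqrt{1000}$.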
What does this all tell us?
IR: Information Ratio
Recall that Portfolio return under CAPM $r_p(t)=\beta_p r_m(t)+\alpha_p(t)$
Suppose we wanted to compute the Sharpe ratio of the skill component $\alpha_p(t)$
Then $IR = \frac{mean(\alpha_p(t))}{stdev(\alpha_p(t))} = \frac{Reward}{Risk}$
IR is often thought of as the Sharpe ratio of excess returns
IC: Information Coefficient: Is the correlation of forecasts to returns
BR: Breadth: Is the number of trading opportunities per year
$IR = IC \times \sqrt{BR}$, ie Performance = Skill x square_root(Breadth). What this means is that opportunities are needed. You can improve your performance by improving your skill or by increasing your breadth. In general increasing your opportunities is much easier!!
Consider a real world case:
Warren Buffett holds 120 stocks, while Jim Simons' hedge fund, RenTech (Renaissance Technologies), holds thousands of stocks. Suppose both have the same IR, but Simons' algo has an IC of 1/1000 of Buffett's
How many trades must Simons execute? (ANS: 120,000,000)
Why?
$IC_b \sqrt{120} = IC_s \sqrt{x}$
$IC_b \sqrt{120} = \frac{IC_b}{1000} \sqrt{x}$
$1000 \sqrt{120} = \sqrt{x}$
$1000^2 \times 120 = x$
$x = 120,000,000$
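The same algebra as a small helper. The function names are hypothetical, and the value chosen for Buffett's IC is arbitrary since only the ratio of the two ICs matters.

```python
import math

def information_ratio(ic, breadth):
    """Fundamental law: IR = IC * sqrt(BR)."""
    return ic * math.sqrt(breadth)

def required_breadth(ic_ref, breadth_ref, ic_new):
    """Breadth needed for ic_new to match the IR of (ic_ref, breadth_ref)."""
    return breadth_ref * (ic_ref / ic_new) ** 2

ic_buffett = 0.1                 # arbitrary placeholder value
ic_simons = ic_buffett / 1000    # 1/1000 of Buffett's IC
x = required_breadth(ic_buffett, 120, ic_simons)
print(round(x))
```

Because breadth sits under a square root, a 1000x drop in skill has to be offset by a 1,000,000x increase in breadth.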
Suppose you have a set of stocks you think are promising. How much of your portfolio should you invest in each? This is the question that portfolio optimization, aka Mean variance optimization, seeks to answer.
ie Given a set of equities and a target return, find an allocation to each equity that minimizes risk.
Let's pause to consider: what is risk? For us it is the standard deviation of historical daily returns. We can visualize this relation by graphing multiple stocks on a 2 dimensional plane. Each green dot represents the return and the risk of a given equity. The orange/brown dot represents a possible scenario for some arbitrary weights. This would represent the risk and return of the portfolio with the respective weights.
Can we do better than this? In fact we can! As per Harry Markowitz we can minimize risk while increasing the returns.
Consider 3 stocks: A, B, and C, each with a return of 10%. Furthermore suppose A and B are positively correlated with a correlation of 0.9, meaning they move in a similar fashion. Suppose A and C are negatively correlated with a correlation of -0.9.
Sc1: 50% A and 50% B will yield 10% but will barely reduce volatility.
Sc2: 25% A, 25% B, and 50% C will yield 10% but the volatility will be less than that of any one of the equities.
This tells us that blending negatively correlated stocks is a great way to reduce volatility.
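The two scenarios can be checked with the portfolio variance formula $\sigma_p^2 = w^T \Sigma w$. This is a sketch under stated assumptions: the 0.9 and -0.9 figures are treated as correlation coefficients, every stock has the same hypothetical volatility `sigma`, and B and C are assumed to have a correlation of -0.9 as well (the text does not give it).

```python
import numpy as np

sigma = 0.2  # hypothetical volatility shared by A, B and C
corr = np.array([
    [1.0,  0.9, -0.9],   # A
    [0.9,  1.0, -0.9],   # B
    [-0.9, -0.9, 1.0],   # C (B-C correlation is an assumption)
])
cov = corr * sigma ** 2  # covariance matrix when all volatilities are equal

def portfolio_volatility(weights, cov):
    """Standard deviation of the portfolio: sqrt(w^T * cov * w)."""
    w = np.asarray(weights)
    return float(np.sqrt(w @ cov @ w))

vol_sc1 = portfolio_volatility([0.5, 0.5, 0.0], cov)
vol_sc2 = portfolio_volatility([0.25, 0.25, 0.5], cov)
print(sigma, vol_sc1, vol_sc2)
```

The expected return is 10% in both scenarios because the weights sum to one and every stock returns 10%; only the volatility changes.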
Mean Variance Optimization (MVO):
Inputs: Expected Return, Volatility, Covariance, Target Return ( must be between max and min equity returns )
Outputs: Asset weights for portfolio that minimize risk
The Efficient Frontier:
For any return level there is an optimal portfolio.
Until now we've focused on learners that provide a forecast. We've ignored the probability of the forecast.
Think of a robot in some environment that it observes through sensors. What it senses is what we call the state. It must then process this state to formulate a policy, ie what it should do in such a state, which is generally denoted by $\pi(s)$. Once it has processed the environment and determined the state it must take an action. This results in a new state, since its sensors will now yield new information. The cycle now repeats.
How is this policy determined? Well in an RL world each action results in a reward. For example if its goal is to reach a point then each action that brings it closer to the goal earns a positive reward.
How do we map the world of trading to fit an RL Problem?
Daily return can be thought of as either a reward or even a state.
Markov Decision Problem
RL problems solve an MDP defined in terms of
The Goal: Is to find the policy $\pi(s)$ that will maximize the reward. We denote this optimal policy as $\pi^*(s)$
We often don't know T, nor do we know R. But what we can do is build and record our experiences as a series of tuples.
for example
$\langle s_1,a_1,s'_1, r_1 \rangle,$
$\langle s_2,a_2,s'_2, r_2 \rangle,$ here $s_2=s'_1$ from previous row
...
$\langle s_n,a_n,s'_n, r_n \rangle$
We do this over and over to gain experience. Now we can build T by accumulating the first three elements of each tuple into a matrix, or some other tabular representation. Similarly we can build R by accumulating the rewards. Then we use a Value/Policy iteration approach to build and optimize our policy.
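A minimal sketch of building T and R from recorded experience. The tuples, the table sizes, and the choice to estimate R as the average observed reward are all assumptions made for illustration.

```python
import numpy as np

n_states, n_actions = 3, 2

# Hypothetical experience tuples <s, a, s', r>
experience = [(0, 1, 1, 0.0), (1, 0, 2, 1.0), (0, 1, 1, 0.0),
              (0, 1, 2, 1.0), (1, 0, 2, 1.0)]

t_count = np.zeros((n_states, n_actions, n_states))  # times (s, a) led to s'
r_sum = np.zeros((n_states, n_actions))              # total reward seen for (s, a)
for s, a, s_prime, r in experience:
    t_count[s, a, s_prime] += 1
    r_sum[s, a] += r

visits = t_count.sum(axis=2)  # how often each (s, a) pair was tried

# T[s, a, s'] = fraction of trials of (s, a) that ended in s' (0 if never tried)
T = np.divide(t_count, visits[:, :, None],
              out=np.zeros_like(t_count), where=visits[:, :, None] > 0)
# R[s, a] = average reward observed for (s, a) (0 if never tried)
R = np.divide(r_sum, visits, out=np.zeros_like(r_sum), where=visits > 0)

print(T[0, 1], R[0, 1])
```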
Consider the following Problem.
There are 2 possible rewards: 1 and 1 million. The 1 can be collected multiple times, but the 1 million is a one time only reward. If we placed a constraint like giving the robot gas for only 3 moves then the best option is to go for the 1. If we gave the robot up to 8 moves it should aim for the 1 million. Now recall our future dollar value from 2.01: a dollar now is worth more than one in the future.
We can summarize our three reward scenarios as follows
Q-Learning is a model free approach, meaning it neither knows about nor uses transition and reward tables. Instead it builds a table of utility (Q) values as it interacts with the world. Interestingly, it can be shown that Q-Learning converges to an optimal policy, provided every state-action pair keeps being visited and the learning rate is chosen suitably.
Q-Learning is named after the Q function. Q can be thought of either as a function or a table; for our purposes we take the table approach. It has 2 dimensions (s,a) representing the state and action. Q[s,a] represents the value of taking action a in state s. This is the immediate reward + the discounted future reward.
Suppose we have a Q-Table, how do we use it? Well we simply pick the action with the highest Q value in the current state
ie $\pi(s) = argmax_a(Q[s,a])$
As we gain experience and keep updating this table, we converge to the optimal policy
ie $\pi^*(s) = argmax_a(Q^*[s,a])$
This doesn't tell us how we build our Q table. Here's a general algo
High Level
The key is the iteration in point #2 above, which requires its own algo
Given <s,a,s',r> how do we improve the Q table?
Part 1: $Q'[s,a] = (1-\alpha)Q[s,a] + \alpha \cdot ImprovedEstimate$ where alpha, the learning rate, is between 0 and 1
Part 2: $Q'[s,a] = (1-\alpha)Q[s,a] + \alpha ( r + \lambda \cdot LaterRewards )$ where lambda, the discount rate, is between 0 and 1
which is $Q'[s,a] = (1-\alpha)Q[s,a] + \alpha ( r + \lambda Q[s',argmax_{a'}(Q[s',a'])] )$
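The update rule translates almost directly into code. This is a minimal sketch; the function names, the table size, and the default values for the learning rate and the discount rate are assumptions.

```python
import numpy as np

def update_q(Q, s, a, s_prime, r, alpha=0.2, discount=0.9):
    """One Q-learning update from a single experience tuple <s, a, s', r>."""
    best_next = np.argmax(Q[s_prime])                # best action in the new state
    improved = r + discount * Q[s_prime, best_next]  # immediate + discounted later reward
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * improved
    return Q

def policy(Q, s):
    """Greedy policy: the action with the highest Q value in state s."""
    return int(np.argmax(Q[s]))

Q = np.zeros((3, 2))   # 3 states, 2 actions
Q[1] = [0.5, 2.0]      # pretend state 1 has already been explored
update_q(Q, s=0, a=1, s_prime=1, r=1.0)
print(Q[0, 1], policy(Q, 0))
```

Here the new value is 0.8 x 0 + 0.2 x (1.0 + 0.9 x 2.0) = 0.56, so action 1 becomes the greedy choice in state 0.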
Some Finer points:
How do we translate this to Trading?
We need to represent our state as a single number. We will stick with integers to keep things simple. To do this we Discretize and Combine. Discretization is simply a mapping from a continuous value to an integer.
One simple method for discretizing into "steps"
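stepsize = len(data) // steps  # number of samples per bucket
data = sorted(data)
threshold = [0] * steps
for i in range(steps):
    # upper edge of bucket i; the -1 keeps the last index in range
    threshold[i] = data[(i + 1) * stepsize - 1]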
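A runnable sketch of Discretize and Combine. The function names are hypothetical, and the combine step assumes at most 10 steps per indicator so that each bucket fits in a single decimal digit.

```python
import numpy as np

def make_thresholds(data, steps):
    """Upper edge of each of `steps` equally populated buckets."""
    data = np.sort(np.asarray(data))
    stepsize = len(data) // steps
    return np.array([data[(i + 1) * stepsize - 1] for i in range(steps)])

def discretize(value, thresholds):
    """Index of the first bucket whose upper edge is >= value (capped at the last)."""
    return int(min(np.searchsorted(thresholds, value), len(thresholds) - 1))

def combine(digits):
    """Stack single-digit bucket indices into one integer state, e.g. [2, 5, 0] -> 250."""
    state = 0
    for d in digits:
        state = state * 10 + d
    return state

thresholds = make_thresholds(list(range(100)), steps=10)
print(thresholds)
print(discretize(5, thresholds), discretize(10, thresholds), discretize(150, thresholds))
print(combine([2, 5, 0]))
```

In practice the thresholds would be computed once from training data and then reused, so that the same indicator value always maps to the same bucket.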
Summary: Building the model
Test the model:
A significant shortcoming of Q-Learning is that it takes many experience tuples to achieve convergence. Dyna-Q enhances Q-Learning by building a transition matrix T and a reward matrix R. After each real interaction it will hallucinate hundreds or thousands of additional experiences, which it then uses to update the Q table.
Dyna-Q was developed by Richard Sutton. You'll recall that Q-Learning is model free, meaning it doesn't know T or R.
Q-Learning algo is
Dyna-Q enhances it with the following steps
How do we learn T? By observation of course. How many times do s and a lead to s', as a fraction of the number of times we have taken a in s? To compute this, count the number of trials that lead to s' and divide by the total number of trials.
For example:
then we can evaluate T in terms of T_count: ie T[s,a,s'] = T_count[s,a,s'] / sum(T_count[s,a,:])
Learning R : $R'[s,a] = (1-\alpha)R[s,a] + \alpha * r $
Where
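Verbally, R[s,a] is a running weighted average of the rewards observed for taking a in s, with $\alpha$ controlling how much weight the most recent immediate reward r receives relative to the previous estimate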
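Putting the pieces together, one Dyna-Q step might look like the sketch below. It assumes the usual hallucination loop (pick a random s and a, sample s' from T, read r from R, then apply the Q update); the table sizes, the parameter values, and the tiny initial count used to avoid dividing by zero are also assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
alpha, discount = 0.2, 0.9

Q = np.zeros((n_states, n_actions))
t_count = np.full((n_states, n_actions, n_states), 1e-5)  # tiny prior avoids 0/0
R = np.zeros((n_states, n_actions))

def q_update(s, a, s_prime, r):
    """Standard Q-learning update."""
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * (r + discount * Q[s_prime].max())

def dyna_q_step(s, a, s_prime, r, n_hallucinations=100):
    # 1. Learn from the real experience
    q_update(s, a, s_prime, r)
    # 2. Update the model: transition counts and expected reward
    t_count[s, a, s_prime] += 1
    R[s, a] = (1 - alpha) * R[s, a] + alpha * r
    # 3. Hallucinate experiences from the model and learn from them too
    for _ in range(n_hallucinations):
        hs = rng.integers(n_states)
        ha = rng.integers(n_actions)
        t_row = t_count[hs, ha] / t_count[hs, ha].sum()  # T[hs, ha, :]
        hs_prime = rng.choice(n_states, p=t_row)
        q_update(hs, ha, hs_prime, R[hs, ha])

dyna_q_step(s=0, a=1, s_prime=2, r=1.0)
print(R[0, 1], Q[0, 1])
```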
In this section we take a quick look at a particular type of derivative security called an option. Options are contracts that confer a right on the buyer. Whether to exercise this right is up to the buyer, meaning they have the right, but not the obligation. The two main types of options are Calls and Puts.
Defined:
These options are part of a broader class of instruments and securities called derivatives. In the market there will be buyers and sellers of each contract. An option contract generally covers 100 shares of an underlying stock, and its fixed price is called the exercise, or strike, price.
There are many reasons why options exist:
Of course just as a stock is unpredictable, so are option prices difficult to pinpoint.
Example:
1 GM call option with a strike of \$50.00 is the right to buy 100 shares of GM for a price of \$50.00 per share.
To complicate matters options also have an expiry date. Like most contracts an option is only valid for a limited time. This makes sense, as otherwise a buyer could hold an option in perpetuity.
Example:
Consider the following snapshot from Yahoo Finance taken July 31-2021, While AAPL trades for 145.86:
Suppose we buy AAPL calls on the 3rd row. What does this entail?
Let's now take a look at the profitability of these options.
Here we can see the profit loss curve for a Long call option.
We paid 200 for the option (a premium of \$2.00 per share on 100 shares), starting us in the red. This premium that we paid is our max loss. If the underlying stock goes to \$52.00 we will break even, meaning at 52 we gain 200 on exercise, which is offset by the premium/expense of 200. If the price of the underlying stock rises above 52, we will start to show a profit. In this case the profit is unlimited since it is a call on an underlying stock, which does not have an upper limit on price.
For comparison here is a profit loss curve for a Long put option ( put with a strike at 50, and a premium of \$2.00 )
Again we are limited in the amount of loss, but here the profit is limited as well. Missing from this graph is the left projection. Our profit reaches a max when the stock hits 0 dollars
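Both profit/loss curves can be written as one-line functions. This is a sketch using the strike of 50, premium of \$2.00 per share, and 100 shares per contract from the examples; the function names are hypothetical and the profit is measured at expiry.

```python
def long_call_profit(price, strike=50.0, premium=2.0, shares=100):
    """Profit at expiry of one long call contract."""
    return (max(price - strike, 0.0) - premium) * shares

def long_put_profit(price, strike=50.0, premium=2.0, shares=100):
    """Profit at expiry of one long put contract."""
    return (max(strike - price, 0.0) - premium) * shares

for price in (0, 40, 48, 50, 52, 60):
    print(price, long_call_profit(price), long_put_profit(price))
```

The call breaks even at 52 and the put at 48; the put's profit tops out at 4,800 when the stock reaches 0.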
Things get really interesting when you start building a portfolio with various combinations of options
Stockcharts Technical Indicators
$SMA(k) = \frac{1}{k} \sum_{i=n-k+1}^n p(i)$
import pandas as pd
import numpy as np
product = {'month' : [1,2,3,4,5,6,7,8,9,10,11,12],'demand':[290,260,288,300,310,303,329,340,316,330,308,310]}
df = pd.DataFrame(product)
# 10-period simple moving average of demand (NaN until 10 observations exist)
df['LAG10SMA'] = df['demand'].rolling(window=10).mean()
Also known as EWMA, the Exponentially Weighted Moving Average
$EMA(t) = (1-\alpha)EMA(t-1)+\alpha p(t)$
# 20 day EMA using Pandas
# data is assumed to be a pandas Series (or DataFrame) of prices
# adjust=False specifies that we are interested in the recursive calculation mode.
ema_short = data.ewm(span=20, adjust=False).mean()
MACD is a trend-following momentum indicator that shows the relationship between two moving averages of a security's price. It is calculated by subtracting the EMA(26day) from the EMA(12day). ie ($EMA(12)-EMA(26)$)
Composed of 3 lines to create 2 bands
Notes
https://school.stockcharts.com/doku.php?id=technical_indicators:relative_strength_index_rsi
$RSI = 100 - \left[ \frac{100}{1+RS} \right]$ where $RS=\frac{AvgGain}{AvgLoss}$
This is done in a 2 Part calculation
Part 1 : For the first 14 days
Avg Gain = sum of gains over the last 14 days / 14
Avg Loss = sum of losses over the last 14 days / 14
Part 2 : For subsequent days
Avg Gain = (PrevAvgGain x 13 + Current Gain ) / 14
Avg Loss = (PrevAvgLoss x 13 + Current Loss ) / 14
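# https://stackoverflow.com/questions/20526414/relative-strength-index-in-python-pandas
# price is assumed to be a DataFrame with a 'Close' column
n = 14  # lookback window in days
delta = price['Close'].diff()
dUp, dDown = delta.copy(), delta.copy()
dUp[dUp < 0] = 0
dDown[dDown > 0] = 0
# pd.rolling_mean was removed from pandas; use Series.rolling(...).mean()
RolUp = dUp.rolling(n).mean()
RolDown = dDown.rolling(n).mean().abs()
RS = RolUp / RolDown
rsi = 100.0 - (100.0 / (1.0 + RS))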
# Alternatively using the EMA
## Make the positive gains (up) and negative gains (down) Series
#up, down = delta.clip(lower=0), delta.clip(upper=0)
## Calculate the EWMA
#roll_up = up.ewm(span=window_length).mean()
#roll_dn = down.abs().ewm(span=window_length).mean()
## Calculate the RSI based on EWMA
#RS = roll_up / roll_dn
#RSI = 100.0 - (100.0 / (1.0 + RS))
MoneyFlowIDX = 100 - ( 100 / (1 + MoneyFlowRatio))
Where
MoneyFlowRatio = 14Day_Positive_MoneyFlow / 14Day_Negative_MoneyFlow
RawMoneyFlow = Typical_Price x Volume
Typical_Price = (High + Low + Close) / 3
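A sketch of the Money Flow Index in pandas. It assumes the usual classification rule, which the formulas above do not spell out: a day's raw money flow counts as positive when the typical price rose from the previous day and as negative when it fell. The column names and the function name are also assumptions.

```python
import pandas as pd

def money_flow_index(df, window=14):
    """MFI from a DataFrame with High, Low, Close and Volume columns."""
    typical = (df["High"] + df["Low"] + df["Close"]) / 3
    raw_flow = typical * df["Volume"]
    direction = typical.diff()
    # Assumed rule: flow is positive on up days, negative on down days
    positive = raw_flow.where(direction > 0, 0.0)
    negative = raw_flow.where(direction < 0, 0.0)
    ratio = positive.rolling(window).sum() / negative.rolling(window).sum()
    return 100 - 100 / (1 + ratio)

# Made-up data with a short window so the arithmetic is easy to follow
prices = pd.DataFrame({
    "High": [10, 11, 12, 11, 12],
    "Low": [8, 9, 10, 9, 10],
    "Close": [9, 10, 11, 10, 11],
    "Volume": [100, 100, 100, 100, 100],
})
mfi = money_flow_index(prices, window=3)
print(mfi)
```

For the last row the 3 day positive flow is 2200 and the negative flow is 1000, giving 100 - 100/3.2 = 68.75.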
Algo
Inputs: A short and long moving average (can be of type EMA or SMA)
BUY => when short MA moves above the long MA
SELL => when short MA moves below the long MA
Example: try 2 simple moving average with short=50day and long=200day
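A sketch of the moving average crossover rule using simple moving averages. The function name and the +1/-1 encoding for BUY/SELL are assumptions, and the toy example uses short windows so the crossings are easy to verify by hand.

```python
import pandas as pd

def crossover_signals(prices, short=50, long=200):
    """+1 where the short SMA crosses above the long SMA, -1 where it crosses below."""
    short_ma = prices.rolling(short).mean()
    long_ma = prices.rolling(long).mean()
    above = (short_ma > long_ma).astype(int)
    signal = above.diff().fillna(0)
    # Ignore bars where the previous bar had no long moving average yet
    signal[long_ma.shift(1).isna()] = 0
    return signal.astype(int)

# Toy series: falls, rises, then falls again
prices = pd.Series([10, 9, 8, 7, 6, 7, 8, 9, 10, 9, 8, 7, 6], dtype=float)
signal = crossover_signals(prices, short=2, long=4)
print(signal.tolist())
```

For daily data the defaults give the 50 day / 200 day version from the example.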
Inputs: Signal Line = EMA(9day) of the MACD Line, MACD Line = (EMA12-EMA26)
BUY => When MACD crosses the Signal from below, ie moves above the signal
SELL => When MACD crosses the Signal from above, ie moves below the signal
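A sketch of the MACD crossover rule in pandas. It assumes the signal line is the 9 day EMA of the MACD line itself and uses `adjust=False` for the recursive EMA; the function name and the +1/-1 encoding for BUY/SELL are hypothetical.

```python
import pandas as pd

def macd_signals(prices, fast=12, slow=26, signal_span=9):
    """Return the MACD line, the signal line, and +1 (BUY) / -1 (SELL) crossings."""
    ema_fast = prices.ewm(span=fast, adjust=False).mean()
    ema_slow = prices.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal_span, adjust=False).mean()
    above = (macd_line > signal_line).astype(int)
    trades = above.diff().fillna(0).astype(int)
    return macd_line, signal_line, trades

# Toy series: 60 days of steady rise followed by 60 days of steady decline
prices = pd.Series([100.0 + i for i in range(60)] + [159.0 - j for j in range(1, 61)])
macd_line, signal_line, trades = macd_signals(prices)
print(trades[trades != 0])
```

On this toy series the rule buys at the start of the rise and sells shortly after the peak.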