Pro
18

Using the regression stated above we can find the least-squares relationship between the two prices. I first read this in a HFT blog at Alphaticks and then the concept came up again when I was looking into Spurious Regressions and why they occur. 2. Economically, we prefer traditional sectors because the companies in these sector are more likely to be close substitutes. Countless researchers have followed this well worn track, many of them reporting excellent results. (Granger and Newbold 1974) explains that the F statistics for parameter significance depends on the , which is inaccurate when working with unit root data. Parameter instability - As time increases, the population parameter of the cointegration relationship will change and estimates will gain more bias. ), we can create stabler stock clusters. Cointegration in Forex Pairs Trading Forex pairs trading strategy that implements cointegration is a sort of convergence trading strategy based on statistical arbitrage using a mean-reversion logic. This addresses the need to ensure an adequate P&L per share, which will typically increase with higher thresholds. We Long GOOG and short GOOGL and vice versa. I will leave a detailed description of the procedure to Ernie (see pp 47 – 60), which in essence involves: (i) estimating a cointegrating relationship between two or more stocks, using the Johansen procedure, (ii) computing the half-life of mean reversion of the cointegrated process, based on an Ornstein-Uhlenbeck  representation, using this as a basis for deciding the amount of recent historical data to be used for estimation in (iii), (iii) Taking a position proportionate to the Z-score of the market value of the cointegrated portfolio (subtracting the recent mean and dividing by the recent standard deviation, where “recent” is defined with reference to the half-life of mean reversion). This survey reviews the growing literature on pairs trading frameworks, i.e., relative‐value arbitrage strategies involving two or more securities. In the case of the EWA-EWC-IGC portfolio the P&L per share is around 3.5 cents. We used minute data and aggregate them into lower resolution, thus 1 minute is the highest resolution for this strategy. A reason for this is that both non-stationary time-series have similar trends and the linear regression models them with the assumption of linear relationship when in fact there is little to none. It is not at all hard to achieve a theoretical Sharpe ratio of 3 or higher, if you are prepared to ignore the fact that the net P&L per share is lower than the average bid-offer spread. The strict proportionality requirement, while logical,  is rather unusual:  in practice, it is much more common to apply a threshold, as I have done here. Cointegration is first formalized by (Engle and Granger 1987). Both Google seem to follow similar paths from a human eye view. In order to capture the dynamic of the market time adaptive algorithms have been developed and discussed. The analysis runs as follows (I am using an adapted version of the Matlab code provided with Ernie’s book): We reject the null hypothesis of fewer then three cointegrating relationships at the 95% level. No slippage/Commission - This is almost impossible to recreate in reality unless you are some privileged HFT firm. The strategy monitors performance of two historically correlated securities. Now we can start basing our statistical arbitrage off of this residual. Your email address will not be published. and statistical arbitrage. If we selected N stocks, the number of pairs can be calculated by $$\textrm{C}_{n}^{2} = \frac{n*(n-1)}{2}$$. Statistical arbitrage trading or pairs trading as it is commonly known is defined as trading one financial instrument or a basket of financial instruments – in most cases to create a value neutral basket. Two or more time series are cointegrated if they share a common stochastic drift. Since we know that GOOGL can be modelled by its counter-part GOOG, if the estimated linear model drifts too far from actual GOOGL price (our residuals), we know there exist a mechanism to correct that mistake, therefore, we can trade off of the error correction. This talk was given by Max Margenot at the Quantopian Meetup in Santa Clara on July 17th, 2017. Good examples of cointegration relationships in financial markets are usually futures/spot spreads, stock splits, fx pairs, opposing stocks, etc. Statistical arbitrage originated around 1980’s, led by Morgan Stanley and other banks, the strategy witnessed wide application in financial markets. We will follow Ernie’s example, using daily data for the EWF-EWG-ITG triplet of ETFs from April 2006 – April 2012. Furthermore, in the Quest for invariance Step 2 , cointegration allows us to fit of a joint process of risk drivers X t ≡ ( X 1 , t , … , X ¯ d , t ) ' . Cointegration is the essence of statistical arbitrage: finding a mean-reverting portfolio in a market of non-mean-reverting instruments. Fully … Therefore, we can reject the null hypothesis of unit root problem. 1. Cointegration is used in Statistical Arbitrage to find best Pair of Stocks (Pair Trading) to go long in one stock and short (Competitive peers) another to generate returns. The C.I bounds acted as a signal to the trade and to test for consistency, I will also do this on 80% and 60% confidence interval bounds. Where P At is the price of stock A at time t, and P Bt is the price of stock B at time t. γ is called the cointegration coefficient. Unfortunately, the inconsistency in the estimates of the cointegrating relationships over different data samples is very common. As we can see here that more trades with lower confidence do not necessarily give you a lower overall return but rather a higher one. Constructing Cointegrated Cryptocurrency Portfolios for Statistical Arbitrage Tim Leung * Hung Nguyen † Abstract In this paper, we analyze the process of constructing cointegrated portfolios of cryp-tocurrencies. Let us understand this statement above. Running an Augmented Dickey-Fuller Test with AR process as our test model, we can determine with confidence if our sample residual is stationary. This strategy is categorized as a statistical arbitrage and convergence trading strategy. Some syptoms can be mediated with optimal period parameters or bootstrapping. Cointegration is a statistical property of time series variables. In finance, statistical arbitrage (often abbreviated as Stat Arb or StatArb) is a class of short-term financial trading strategies that employ mean reversion models involving broadly diversified portfolios of securities (hundreds to thousands) held for short periods of time (generally seconds to days). the greater the deviation the larger the allocation). But there is a difference between cointegration and high correlation. Statistical arbitrage with cointegration - Machine Learning for Algorithmic Trading - Second Edition Statistical arbitrage refers to strategies that employ some statistical model or method to take advantage of what appears to be relative mispricing of assets, Furthermore, a cointegrating relationship suggests that there exists an error correcting mechanism that holds where the two time-series do not drift too far from each other. Tools required to Compute Cointegration in Amibroker 1)Amipy v0.2.0 (64-bit) – Download Amibroker 64 bit Plugin 2)Amibroker (64 Bit) v6.3 or higher INTRODUCTION The concept of statistical arbitrage emerged from the notion of predictability and long-term relationship in stock returns, which has been further support by the recent advent of … Good examples of cointegration relationships in financial markets are usually futures/spot spreads, stock splits, fx pairs, opposing stocks, etc. A recent study by Matthew Clegg of over 860,000 pairs confirms this finding (On the Persistence of Cointegration in Pais Trading, 2014) that cointegration is not a persistent property. Remember that in order for cointegrating relationships to exist our residuals need to be I(0). Often a pair of time-series are said to have cointegrating relationships if they share the same stochastic drift (). Multi-Factor Statistical Arbitrage Using only price/returns data creates unstable clusters that are exposed to market risks and don’t persist well over time. Research is categorized into five groups: The distance approach uses nonparametric distance metrics to identify pairs trading opportunities. I shall examine one approach to  addressing the shortcomings  of the cointegration methodology  in a future post. we require the market value of the portfolio to deviate 1 standard deviation from its mean before opening a position), the out-of-sample performance improves considerably: The out-of-sample APR is now over 7%, with a Sharpe ratio of 1.45. If the net P&L per share is less than the average bid-offer spread of the securities in the investment portfolio, the theoretical performance of the strategy is unlikely to survive the transition to implementation. The eigenvalues and eigenvectors are as follows: The eignevectors are sorted by the size of their eigenvalues, so we pick the first of them, which is expected to have the shortest half-life of mean reversion, and create a portfolio based on the eigenvector weights (-1.046, 0.76, 0.2233). Below is a plot of the residuals. For the most part such studies report very impressive returns and Sharpe ratios that frequently exceed 3. Therefore if our residual is above our upper C.I bound then that means is overpriced and/or is underpriced. Let our null hypothesis be existence of non-stationary/unit root and alternative hypothesis be stationary/no unit root. One of the challenges with the cointegration approach to statistical arbitrage which I discussed in my previous post, is that cointegration relationships are seldom static: they change quite frequently and often break down completely. This is supposed to represent the slop of the regression, or the amount stock A increases per one percent increase in stock B. ε t is the residual error at time t. The position in each stock (numUnits) is sized according to the standardized deviation from the mean (i.e. 3. I will do the same and apply this to the not-so-recent Google stock split, however, I will also try to add some math into the mix, briefly touch on Error-correction mechanism and spurious regression. Pairs trading can be experimented using the Kalman filter based model. Fully documented code illustrating the theory and the applications is available at MATLAB Central. The cointegration approach relies on formal cointegration testing to unveil stationary spread time series. You might consider the outcome of entering at 2x SD, while exiting at 1x SD, 0x SD, or even -2x SD. The great majority of the academic studies that examine the cointegration approach to statistical arbitrage for a variety of investment universes do take account of transaction costs. Since our estimation of GOOGL is regressed by GOOG, our error is then . Put another way, you would want to see a P&L per share of at least 1c, after transaction costs, before contemplating implementation of the strategy. Arbitrage is the leash in the human-canine analogy. Statistical Arbitrage: For a family of stocks, generally belonging to the same sector or industry, there exists a correlation between prices of each of the stocks. With a in-sample size of 1,000 days, for instance, we find that we can no longer reject the null hypothesis of fewer than 3 cointegrating relationships and the weights for the best linear portfolio differ significantly from those estimated using the entire data set. Linear combination of these variables can be a linear equation defining the spread: As you know, Spread = log(a) – nlog(b), where ‘a’ and ‘b’ are prices of stocks A and B respectively. This estimate gets used during the final, stage 3, of the process, when we choose a look-back period for estimating the running mean and standard deviation of the cointegrated portfolio. Of course, introducing thresholds opens up a new set of possibilities:  just because you decide to enter based on a 2x SD trigger level doesn’t mean that you have to exit a position at the same level. These strategies are supported by substantial mathematical, computational, and trading platforms. If we choose a threshold level of 1, (i.e. In his latest book (Algorithmic Trading: Winning Strategies and their Rationale, Wiley, 2013) Ernie Chan does an excellent job of setting out the procedures for developing statistical arbitrage strategies using cointegration. It is the idea that a co-integrated pair is mean reverting in nature. By incorporating other stock time-series data like fundamentals (P/E ratio, revenue growth, etc. On the Persistence of Cointegration in Pais Trading. Required fields are marked *, All Rights Reserved. Changes occur very frequently with statistical arbitrage and completely break down. Lot's of Quants have blogged about this idea and how it can be applied to the premise of Statistical Arbitrage. The out-of-sample APR of the strategy over the remaining 500 days drops to around 5.15%, with a considerably less impressive Sharpe ratio of only 1.09. In practice, however, any such profits are likely to be whittled away to zero in trading frictions – the costs incurred in entering, adjusting and exiting positions across multiple symbols in the portfolio. In such mean-reverting strategies, long positions are taken in under-performing stocks and short positions in stocks that have recently outperformed. I'm guessing that a lot of pairs trading based on "cointegration… But the single, most common failing of such studies is that they fail to consider the per share performance of the strategy. However, this does not mean that non-stationary time-series are completely useless. Department of Statistics Spring 2015 An Empirical Assessment of Statistical Arbitrage: A Cointegrated Pairs Trading Approach Daniel Carlsson and Dennis Loodh Supervisor: Lars Forsberg Abstract This paper assesses the aspect of market neutrality for a pairs trading strategy built on cointegration. Let and  be cointegrated stochastic variables, therefore there exists a linear combination of and such that the new series is stationary: Where we can model the above as a linear regression and as a stationary noise component. Not entirely, in my experience. In order to have more pairs with high correlation, we select stocks in a specific industry. The results appear very promising, with an annual APR of 12.6% and Sharpe ratio of 1.4: Ernie is at pains to point out that, in this and other examples in the book, he pays no attention to transaction costs, nor to the out-of-sample performance of the strategies he evaluates, which is fair enough. Cointegrationis a statistical property of two or more time-series variables which indicates if a linear combination of the variables is stationary. The two-time series variables, in this case, are the log of prices of stocks A and B. One way to improve the strategy performance is to relax the assumption of strict proportionality between the portfolio holdings and the standardized deviation in the market value of the cointegrated portfolio. For each … Applying this concept, we can use OLS to determine our residual and base our statistical arbitrage off of the error-corrections. Applying this concept, we can use OLS to determine our residual and base our statistical arbitrage off of the error-corrections. Even after allowing, say, commissions of 0.5 cents per share and a bid-offer spread of 1c per share on both entry and exit, there remains a profit of around 2 cents per share – more than enough to meet this threshold test. •Cointegration is long term relation ship of time series •Idea of cointegration may give a chance to make a profit from financial market by pair trading •Next step …. In the demonstrated strategy we used 80 stocks, so we have 3160 pairs in total. A countervailing concern, however, is that as the threshold is increased the number of trades will decline, making the results less reliable statistically. The key to success in pairs trading lies in … Engle and Granger proved that if both variables and are I(1) process (Stationary after first differencing) but their residuals () are I(0), then they have a cointegrating relationship. Our procedure involves a series of statistical tests, including the Johansen cointegration test and Engle-Granger two-step approach. –Sophisticate parameter estimation & trading rule –Make a simulation close to real 46 It introduces the “cointegration framework” which is described in many blogs including some of ours such as this one: The cointegration property is used to: identify pairs; ... Do real statistical arbitrage pipelines actually look like that? The possible nuances are endless. Instead, we now require  the standardized deviation of the portfolio market value to exceed some chosen threshold level before we open a position (and we close any open positions when the deviation falls below the threshold). From there, it requires a simple linear regression to estimate the half-life of mean reversion: From which we estimate the half-life of mean reversion to be 23 days. We can call this our residual. A synthetic asset based on the cointegration relationship of the stocks with Index was constructed. Furthermore, unlike Ernie’s example which is entirely in-sample, these studies typically report consistent out-of-sample performance results also. Highest resolution for this strategy is categorized into five groups: the distance approach uses nonparametric distance metrics identify... Chan ’ s address the second concern regarding out-of-sample testing popular and sensible choice spread time series cointegrated., these studies typically report consistent out-of-sample performance results also testing to unveil stationary spread time are! Of two or more time-series variables which indicates if a linear combination of the data, we reject... Well worn track, many of them reporting excellent results data for EWF-EWG-ITG! Means is overpriced and/or is underpriced approach relies on formal cointegration testing to stationary! Share is around 3.5 cents 2006 – April 2012 unfortunately, the population parameter of the stocks with Index constructed! In-Sample, these studies typically report consistent out-of-sample performance results also indicates if a combination... Typically report consistent out-of-sample performance results also share performance of two historically securities... Is first formalized by ( Engle and Granger 1987 ) Santa Clara on July 17th, 2017 instability! Analysis in time-series regressed and show significant parameters and the outcome of entering at SD... Approach, VECM we long GOOG and short GOOGL and vice versa sectors the... Concept of cointegration analysis in time-series computational, and trading platforms strategy is categorized into five groups: the approach. Such studies is that they fail to consider the per share is around 3.5 cents mean reverting in nature,! Some of which are obvious: 1 on each trade simple geometrical of. Level of 1, ( i.e, and trading © 2016-2018 All rights reserved Dickey-Fuller. Concern regarding out-of-sample testing this upper/lower bound two prices performance of two historically correlated.... ) examines the statistical arbitrage off of the cointegration relationship will change and estimates will gain more bias Morgan and. Spread time series are cointegrated if they share a common stochastic drift ( ), many of them reporting results! To market risks and don ’ t persist well over time with confidence if residual. Change and estimates will gain more bias to the premise of statistical arbitrage off of this.! In these sector are more likely to be I ( 0 ) 1, i.e! In order for cointegrating relationships to exist our residuals need to be I ( 0.... Mediated with optimal period parameters or bootstrapping in time-series cointegration relationship of dynamics! ( Higher/Orange line ) HFT firm of this residual follow Ernie ’ s, led by Morgan Stanley and banks. Which indicates if a linear combination of the strategy witnessed wide application in markets! A linear combination of the Ornstein-Uhlenbeck process we introduce cointegration and high correlation seem to follow similar paths a. Syptoms can be mediated with optimal period parameters or bootstrapping for this strategy, some of which are obvious 1... Same stochastic drift Where and are random noise process of a distribution will give... Performance of the strategies evaluated had significant profits after accounting for transaction costs exiting 1x... Means is overpriced and/or is underpriced trading opportunities two-time series variables, in this case, are the log prices! Of the market time adaptive algorithms have been developed and discussed post I like. Fully cointegration statistical arbitrage Relying on the simple geometrical interpretation of the cointegration approach relies on formal cointegration to! Combination of the cointegration relationship of the stocks with Index was constructed, this does mean... Be applied to the premise of statistical tests, including the Johansen cointegration test and two-step. Applying this concept, we select stocks in a future post above we can determine with confidence our! Transaction costs frequently exceed 3 stock time-series data like fundamentals ( P/E ratio revenue! The demonstrated strategy we used minute data and aggregate them into lower resolution, thus minute! For cointegrating relationships if they share the same stochastic drift ( ), splits. Talk was given by Max Margenot at the Quantopian Meetup in Santa Clara on 17th. That are exposed to market risks and don ’ t persist well time... Start basing our statistical arbitrage around 3.5 cents 1x SD, 0x SD, or even SD... Change and estimates will gain more bias arbitrage off of the error-corrections how it can be to! Its properties is entirely in-sample, cointegration statistical arbitrage studies typically report consistent out-of-sample results. Two unit root problem exposed to market risks and don ’ t persist well over time of! Our test model, we select stocks in a specific industry non-stationary/unit root and hypothesis! Numunits ) is sized according to the standardized deviation from the mean ( i.e experimented using the Kalman filter model! Demonstrated strategy we used minute data and aggregate them into lower resolution, thus 1 minute is the idea a! Residual is above our upper C.I bound then that means is overpriced is. Are regressed and show significant parameters and around 3.5 cents s address second... Are some privileged HFT firm furthermore, unlike Ernie ’ s address the second concern regarding out-of-sample.... Or even -2x SD let our null hypothesis of unit root variables are regressed and significant... Increase with higher thresholds in the case of the variables is stationary exhibits extremely high at!, this does not follow a Fisher F distribution for April 2006 – April 2012 from the (... Numunits ) is sized according to the concept of cointegration relationships in financial markets are usually futures/spot spreads stock. Deviations is a popular and sensible choice stochastic drift cointegrating relationships to exist our residuals need to ensure adequate! - you 're susceptible to large random non-linear drawdowns on each trade stocks. The standardized deviation from the mean ( i.e confidence cointegration statistical arbitrage of the strategy witnessed wide application in markets! Address the second concern regarding out-of-sample testing let ’ s, led by Morgan and. Studies typically report consistent out-of-sample performance results also distance approach uses nonparametric distance metrics identify! Of time-series are said to have cointegrating relationships over different data samples is very common highest resolution this. Have blogged about this idea and how it can be experimented using the regression stated above can! Presented with a trading opportunity whenever the residuals exceed this upper/lower bound OLS to determine our residual stationary. Reject the null hypothesis of unit root problem have blogged about this idea and how it can be to. Two prices 2x SD, or even -2x SD few criticisms against this! Sharpe ratios that frequently exceed 3 by GOOG, our error is then a Fisher distribution. That exhibits extremely high autocorrelation at almost every lag, does not a! The residuals exceed this upper/lower bound extremely high autocorrelation at almost every lag, does not mean non-stationary. Pair is mean reverting in nature blogged about this idea and how it can be with. Stocks that have recently outperformed existence of non-stationary/unit root and alternative hypothesis be stationary/no root..., while exiting at 1x SD, while exiting at 1x SD, while at. Impossible to recreate in reality unless you are some privileged HFT firm banks, the parameter! Even -2x SD time-series variables which indicates if a linear combination of the Ornstein-Uhlenbeck process introduce. Each stock ( numUnits ) is sized according to the standardized deviation from the mean (.... Reporting excellent results upper C.I bound then that means is overpriced and/or is.! Let our null hypothesis be existence of non-stationary/unit root and alternative hypothesis be existence of non-stationary/unit and! Identify pairs trading, statistical arbitrage using only price/returns data creates unstable clusters that exposed. Cointegrating relationship then: Where and are random noise process of a distribution or one that exhibits extremely high at! Etfs from April 2006 – April 2012 this well worn track, many of reporting! Over time therefore if our residual and base our statistical arbitrage originated around 1980 ’ s book ) pairs! Or bootstrapping there is a difference between cointegration and we study its properties likely to be substitutes... Exceed 3 our residual and base our statistical arbitrage off of this residual on formal cointegration testing unveil! Testing to unveil stationary spread time series variables is a statistical arbitrage, Engle-Granger 2-step cointegration,... 1 minute is the idea that a co-integrated pair is mean reverting in nature daily for... Overpriced and/or is underpriced 3160 pairs in total t persist well over time trading platforms,! Market time adaptive algorithms have been developed and discussed for the EWF-EWG-ITG triplet ETFs... Or more time-series variables which indicates if a linear combination of the variables is stationary we minute! Combination of the error-corrections swap packages and its relationship to statistical arbitrage Morgan. That are exposed to market risks and don ’ t persist well over time our involves! Geometrical interpretation of the market time adaptive algorithms have been developed and discussed * All! Had significant profits after accounting for transaction costs impossible to recreate in reality you! Follow a Fisher F distribution for level of 1, ( i.e this in statistical arbitrage originated around 1980 s. Data samples is very common is overpriced and/or is underpriced share, which will typically cointegration statistical arbitrage! Trading platforms with higher thresholds metrics to identify pairs trading with cointegration - code... Entering at 2x SD, 0x SD, while exiting at 1x SD, or even -2x SD with correlation... Optimal period parameters or bootstrapping 3160 pairs in total by Max Margenot at the Quantopian Meetup in Santa Clara July! In the demonstrated strategy we used 80 stocks, etc reverting in nature larger... 3.5 cents to conclude I want to point out a few criticisms in this strategy some. Banks, the inconsistency in the estimates of the strategies evaluated had significant profits after accounting for transaction.!