1. For starters

In my doctoral thesis I analyze the cross-border spillovers from nationally implemented macroprudential policy via international lending with the help of a gravity model applied to international banking asset holdings.

These notes contain information about the gravity equation, it’s theoretical foundations, applications and estimation.

The gravity model has been a workhorse model of international trade analysis for decades. My interest is however in applying the gravity model to international banking. This should be kept in mind when reading the notes.

My GitHub repository for gravity equations can be found at github.com/anorring/Gravity-equations.

2. From A Practical Guide to Trade Policy Analysis

Chapter 3: Analyzing bilateral trade using the gravity equation

The WTO and UNCTAD have co-published an on-line book with a chapter focused on the practical use of gravity models. You can find the chapter here.

2.1 Analytical tools

2.1.1 The gravity equation: theoretical equation

From empirics to theory:

the stylized fact of trade decreasing with distance first noted by Tinbergen (1962)
empirical analysis predated theory: initially gravity equation was thought to be a mere representation of trade, and that the Ricardian and the Heckscher-Ohlin models were unable to provide a theoretical foundation
it turned out that a gravity equation arises from a range of different trade theories: Anderson (1979) provided a first attempt, Bergstrand (1985, 1989) derived it from Krugman’s model of trade based on monopolistic competition, Deardorff (1998) from a traditional factor-proportions explanation of trade, Eaton and Kortum (2002) from a Ricardian type of model, and Helpman et al. (2008) and Chaney (2008) from a model of international trade in defferentiated goods with firm heterogeneity

The gravity equation in the general, multiplicative formulation:

\(X_{ij} = G S_i M_j \phi_{ij}\)

where \(X_{ij}\) is the monetary value of exports from coutry i to country j, \(S_i\) and \(M_j\) denote all the exporter- and importer-specific factors (such as the GDP) that make the total of exporter country’s willingness to supply and the importer country’s willingness to demand. \(G\) is a variable independent of the countries in the bilateral pair (such as the level of world liberalization). \(\phi_{ij}\) represent the ease of exporter access to the importer’s market, i.e. the inverse of bilateral trade costs.

The importance of the “multilateral trade-resistance” term - Anderson and van Wincoop (2003):

the importance of deriving specifications and variables from economic theory for obtaining the proper inferences from estimations
controlling for relative trade costs crucial: the propensity of trade from i to j is determined by trade costs of i w.r.t. j relative to the average trade resistance to imports of j and the resistance to exporters in i
that is, two bordering countries bordering also larger trading economies will trade less between each other than they otherwise would (i.e. Belgium and Netherlands trade less with each other than Australia and New Zealand)

Thus the gravity equation takes the following form:

\(X_{ij} = \frac{Y_i Y_j}{Y} ( \frac{t_{ij}}{\Pi_i P_j} )^{1 - \sigma}\)

where \(Y\) denotes world GDP, \(Y_i\) and \(Y_j\) the GDP of countries i and j respectively, \(t_{ij}\) (one plus the tariff equivalent of overall trade costs) is the cost in j of importing a good from i, \(\sigma\) > 1 is the elasticity of substitution and \(\Pi_i\) and \(P_j\) represent exporter and importer ease of market access or country i’s outward and country j’s inward multilateral resistance terms. The multilateral resistance terms are low if a country is remote from world markets (remoteness can mean physical distance or other trade costs).

Note that all variables in the above formulation can vary across time.

2.1.2 Estimation methods

2.1.2.1 The simplistic approach: log-linearization and OLS-estimation

The standard procedure has long been simply taking natural logs of the equation and estimate the obtained log-linear equations using OLS. Consider the version of the model that includes MRT:

\(ln X_{ijt} = a_0 + a_1 ln Y_{it} + a_2 ln Y_{jt} + a_3 ln t_{ijt} + a_4 ln \Pi_{it} + a_5 ln P_{jt} + \varepsilon_{ijt}\)

that is, the logged gravity equation relates the log of bilateral trade to the logs of GDP’s, a composite term measuring barriers and incentives to trade between the pair, and terms measuring the barriers to trade between i and j and the rest of the world
interpretation easy: the parameters of a log-linear equation are elasticities, so the effects indicate the percentage variation in trade following a 1 per cent increase in any of the controls

Variables that capture trade costs:

bilateral distance
dummies for transport costs: islands, landlocked countries, common borders
dummies for information costs: common language, common border, common colonial history
dummies for tariff barriers: regional trade agreements

Variables that capture MTR-terms:

not observable
the most common method: use country fixed effects for both importers and exporters
other methods: estimating price-raising effects of barriers to trade, constructing a remoteness variable, using regional dummies

2.1.2.2 Controlling for the multilateral trade resistance

Case A: Interest of the research focuses on the coefficient of a bilateral variable

Use the time-varying country dummies for importer and exporter countries to capture all country-specific characteristics. This will control for a country’s overall level of imports/exports:

\(ln X_{ijt} = a_0 + a_1 I_{it} + a_2 I_{jt} + a_3 ln t_{ijt} + a_4 I_{t} + u_{ijt}\)

where \(I_t\) is a time dummy independent of the country pair and \(I_{nt}\) are exporter/importer time-varying individual effects (thus they can take into account the fact that MRT can change over time).

Trade costs are usually assumed to take the form:

\(t_{ij} = d_{ij}^{\delta_1}*exp(\delta_2 cont_{ij} + \delta_3 lang_{ij} + \delta_4 ccol_{ij} + \delta_5 col_{ij} + \delta_6 landlock_{ij} + \delta_7 RTA_{ij})\)

where \(d_{ij}\) is bilateral distance and the other controls are dummies for a common border, common language, common colonizer, colonial relation, either country being landlocked or members of a regional trade agreement. All these variables have been found to have a statistically significant effect on bilateral trade.

#In STATA:

#tab (year), gen (year_)
#gen impyear = group(importer year)
#gen expyear = group(exporter year)
#tab (impyear), gen (impyear__)
#tab (expyear), gen (expyear__)
#xtreg lnexports lndist cont lang ccol col landlock RTA impyear_* expyear_* year_*, robust

#or with random effects - check suitability with the Hausman test:

#xtreg lnexports lndist cont lang ccol col landlock RTA impyear_* expyear_* year_*, re robust

NB: The observations usually fitted to the gravity equation exhibit heterogeneity arising from different sources. The error term is highly unlikely to be homoskedastic, and thus heteroskedasticity robust standard errors should be used systematically (option [robust] in [reg]). One should often also control for clustering by using option [cluster]. (Outliers can pose problems: you can find them with the Hadi test [hadimvo])

Controlling for country-pair effects: you can use fixed effects if the independent country-pair variables are time-varying (such as membership of a common RTA). Using fixed effects with time-invariant controls results in perfect collinearity:

#In STATA:

#tab (year), gen (year_)
#gen impyear = group(importer year)
#gen expyear = group(exporter year)
#tab (impyear), gen (impyear__)
#tab (expyear), gen (expyear__)

#xtreg lnexports RTA impyear_* expyear_* year_*, fe robust

Case B: Interest of the research focuses on the country-specific variable (relevant for MY THESIS)

Using the country fixed effects makes it impossible to estimate the partial effects of country-specific explanatory variables, because they are perfectly collinear with time-varying country fixed effects. This problem arrises e.g. when we want to know what the effect of the size of the economies is on bilateral trade, or of the differencies in policies. Then there are two options: use time-invariant country fixed effects or control for MTR using a different method.

Consider first the use of time-invariant country fixed effects and additional country-specific controls, when we are interested in the effect of the economic mass on trade:

\(ln X_{ijt} = a_0 + a_1 ln GDP_{it} + a_2 ln GDP_{jt} + a_3 ln t_{ijt} + a_4 I_{i} + a_5 I_{j} + a_6 I_{t} + u_{ijt}\)

This approach can be used, when the sample period is relatively short: the idea is that the MTR’s do not vary very much over a short sample period (see Baldwin and Taglioni, 2006). It is not optimal, because we do know that the MTR’s are not time-invariant as e.g. the geographical composition of a country’s trade varies. The upside is that one can add many control variables and other variables of interest to the basic gravity equation, such as the quality on insitutions or infrastructure.

#In STATA:

#gen lnGDPexp= ln (GDPexp)
#gen lnGDPimp= ln (GDPimp)
#xtreg lnexports lnGDPexp lnGDPimp lndist importer_* exporter_* year_*, robust

Another option is to estimate the MTR by measuring remoteness. There are multiple ways of doing this:

a simplistic measure of a country’s average weighted distance from its trading partners (Head, 2003) - problems: not correct theoretically, requirement for including the auto-distance of the country
a non-linear procedure (Anderson and van Wincoop, 2003)
estimating a linear approximation (first order Taylor approximation) (Baier and Bergstrand, 2009)

#The simple remoteness measure in STATA:

#Compute the share of GDP in world GDP
#bys exporter year: egen gdptotal = sum(gdp)
#gen gdpshare = gdp / gdptotal

#Compute the spatially weighted GDP share
#bys exporter year: egen remoteness = total(dist*gdpshare)

#Compute the spatially weighted GDP share according to Head (2003)
#bys exporter year: egen Remoteness_head=total(dist/gdpshare)

Take note of the three typical mistakes in the traditional gravity set-ups:

omitting multilateral resistance term or “remoteness” -> omitted variable bias due to omitted terms correlated with trade costs
using net trade - trade flows should be treated separately each way
inappropriate deflation of trade flows - taken care by time dummies or country effects

2.1.3 Advanced gravity modelling issues

2.1.3.1 The issue of zero-trade flows

Relevant for MY THESIS

The three traditional approaches to handling zero trade:

1. Truncate the sample by dropping zero observations - the usual strategy of log-linearising the model does this

This is correct if zeros are randomly distributed, i.e. not informative. This however hardly ever the case: zero trade might reflect prohibitively high transportation costs.

2. Add a small constant, say 1 dollar, to the value of trade before taking logs

Incorrect: an ad hoc procedure, with no guarantee of it reflecting the underlying expected values -> inconsistent estimates

3. Estimate the model in levels, i.e. in the multiplicative form

Cannot be estimated with OLS.

What estimator should be used with zero trade flows?

Depends on the assumed nature of the zeros: are they missing observations recorded as zeros or the result of firms’ decision not to export?

Reporting errors

Should be omitted or interpolated using the values before and after the observation, when there is a stable trend
What is an error? Plot the whole time series: a zero in an otherwise positive series is likely to be an error
NB: the number of reporting errors usually inversely correlated with income per capita

Tobit

Sometimes used, but likely to be inapproapriate

PPML - Pseudo Poisson maximum likelihood method

Applicable to the levels of trade, thus estimating directly the non-linear form of the gravity model and avoiding omitting zero observations
Santos Silva and Tenreyro (2006) highlight that this method is also robust in the presence of heteroskedasticity, as is common in trade data

#In STATA (though why not the ppml command):

#option fe controls for country-pair fixed effects

#xtpoisson exports lndist cont lang ccol col landlock RTA exporter_* importer_*, fe, robust

Selection model à la Heckman

Likely: the probability of finding a positive trade flow between two countries is correlated with unobserved characteristics of that country pair. Then zero trade arrises from the firms’ decisions not to export to a certain market.
Appropriate procedure: model these decision and correct the estimation on the volume of trade for this selection bias
Crucial: the results are correct only if we can identify a variable that explains the likelihood that firms export to a certain market, but (given that the trade is positive) not the amount of trade

Notes on gravity equations

Anni Norring

2018

1. For starters

2. From A Practical Guide to Trade Policy Analysis

2.1 Analytical tools

2.1.1 The gravity equation: theoretical equation

2.1.2 Estimation methods

2.1.2.1 The simplistic approach: log-linearization and OLS-estimation

2.1.2.2 Controlling for the multilateral trade resistance

2.1.3 Advanced gravity modelling issues

2.1.3.1 The issue of zero-trade flows

2.1.3.2 Zero trade and heterogeneity

2.1.4 Data sources

2.2 Applications

2.2.1 Building a database and estimating a gravity model

2.2.2 Measuring the effect of NTBs

2.3 Exercises

2.3.1 Estimating the impact of a regional trade agreement

2.3.2 Calculating tariff equivalent