John Beieler

PhD Student in Political Science

Terrorism Studies and Prediction

This semester I’m taking a class on terrorism. Overall I’ve found the class very enjoyable; the topic of political violence is one that is always fascinating. With that said, one issue would repeatedly pop up in the readings for the seminar. Almost every paper would lay out some type of theory, discuss the data used, and run some statistical tests, which is pretty standard social science research. The issue arose in the “Discussion” or “Conclusion” sections. Almost invariably the authors would discuss the practical implications of their research, which is fine until the dreaded prediction word appears. Then claims about the predictive accuracy of the models used in the paper would rear their ugly heads, even though these models were explicitly not predictive models. This became the soapbox that I would drag out repeatedly throughout the semester. Finally, I decided to put my own assertions to the test and see how some of these models performed on out-of-sample predictive tests.

The Paper

The week I chose to perform this analysis, we discussed papers focused on counterterrorism policies. One paper was “Foreign Aid Versus Military Intervention in the War on Terror” (behind a paywall) by Jean-Paul Azam and Veronique Thelen. A full discussion of their theory is beyond the scope of this post, but the authors’ basic argument is that foreign investment can reduce the number of transnational terrorist events for a number of reasons. They test this general assertion using a variety of models. Luckily, they provide replication data for their paper on the Journal of Conflict Resolution website linked above. I set out to replicate exactly the model presented in Table 2, Model 4, for those following along in the paper.

The Replication

The first step in replicating the paper was to run their Stata do-file on the data they provided. This ensures that I have exactly the same variables in my models that they do in theirs, specifically the residualized variables they present. As a note, I will avoid commenting on the general modeling choices made in the paper; this post is only about the predictive accuracy of models on out-of-sample data, so I work with what the authors worked with. After running the do-file, I check that the results generated by Stata match those presented in the paper, which is roughly true for the coefficient estimates. I then take the data, including the newly generated variables, and read it into Python using statsmodels. Once the data is read in, I replicate the formula used in the paper, generate a 75%-25% train-test split, and fit the negative binomial model. A random 75%-25% split is appropriate here since the data is cross-sectional rather than time series in nature.

from sklearn import cross_validation
from sklearn.ensemble import RandomForestRegressor
import statsmodels.api as sm
import numpy as np
import patsy

data = sm.iolib.genfromdta('rep_data.dta', pandas = True)
#Make an R-style formula
form = 'arqade ~ gdppc + logpop + odapc + secenroll + oecd + cad + ssh + ethnic + law + rodapc + rsecenroll'
y, X = patsy.dmatrices(form, data, return_type = 'dataframe')

#Generate 75%-25% train-test split
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X, y, test_size=0.25, random_state=0)

#Fit the model
mod = sm.GLM(y_train, X_train, family = sm.families.NegativeBinomial()).fit()

print mod.summary()

Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                   99
Model:                            GLM   Df Residuals:                       87
Model Family:        NegativeBinomial   Df Model:                           11
Link Function:                    log   Scale:                   1.54679011354
Method:                          IRLS   Log-Likelihood:                -190.88
Date:                Sun, 21 Apr 2013   Deviance:                       132.43
Time:                        16:32:56   Pearson chi2:                     135.
No. Iterations:                    17
==============================================================================
coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const          7.5576      4.717      1.602      0.113        -1.687    16.802
x1          5.861e-05    4.2e-05      1.397      0.166     -2.36e-05     0.000
x2             0.1561      0.203      0.768      0.444        -0.242     0.554
x3            -0.0339      0.014     -2.380      0.019        -0.062    -0.006
x4            -0.0633      0.019     -3.315      0.001        -0.101    -0.026
x5             0.9221      0.878      1.050      0.297        -0.800     2.644
x6             5.5344      2.225      2.487      0.015         1.173     9.896
x7            -3.3479      0.888     -3.771      0.000        -5.088    -1.608
x8             0.7016      0.200      3.510      0.001         0.310     1.093
x9            -0.2140      0.261     -0.820      0.415        -0.726     0.298
x10            0.0441      0.018      2.477      0.015         0.009     0.079
x11            0.0724      0.022      3.304      0.001         0.029     0.115
==============================================================================

The results of this analysis roughly match the results presented in the paper. Some differences are to be expected, of course, given the reduced number of observations in the training set. (The coefficients are labeled const and x1 through x11 rather than by variable name because the train-test split converts the data frames to plain arrays.) Since these results seem to match those in the paper, the out-of-sample forecasts can be run. This is done using the predict method of the mod object, and the true and predicted values are then compared using the root mean squared error (RMSE). I’m not completely confident in the use of the RMSE, but it is a quick and easy way to perform a comparison in this case.
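For reference, the comparison below uses the standard definition of the RMSE, where $y_i$ are the observed event counts in the test set and $\hat{y}_i$ are the model’s predictions:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$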

#Generate out-of-sample predictions for the held-out 25%
y_pred = mod.predict(X_test)

#Calculate RMSE, flattening y_test to one dimension so the subtraction is element-wise
rmse = np.sqrt(np.mean(np.square(np.ravel(y_test) - y_pred)))

print 'The RMSE is {}'.format(rmse)
The RMSE is 22.7156218086

The RMSE shows that the predicted values tend to be off by about 23 events, which indicates that the model does not predict the number of transnational terrorist events well. Admittedly, this analysis may not be the most favorable to Azam and Thelen. In order to give them the benefit of the doubt, I also fit a random forest to the same variables used in the negative binomial model and examined its RMSE, to see whether this inaccuracy persists.

#Create the random forest regressor with 1,000 trees, using all available cores
reg = RandomForestRegressor(n_estimators=1000, n_jobs=-1)
#Fit the model on the training data, flattening y_train to one dimension
reg.fit(X_train, np.ravel(y_train))

#Predict using the X_test data
rf_pred = reg.predict(X_test)

#Calculate RMSE, again on the element-wise differences
rmse = np.sqrt(np.mean(np.square(np.ravel(y_test) - rf_pred)))

print 'The RMSE is {}'.format(rmse)
The RMSE is 21.7068829972

Even with a method that is possibly more robust, the predictions still do not seem very satisfactory.

Models in the Social Sciences

In case it was not explicitly clear at the beginning of this post, I want to make it clear now: this analysis is not a direct critique of Azam and Thelen. I think their theory is sound, and it makes sense that foreign investment and the other variables they analyze can make a difference in terrorism. I also do not think that the type of inferential statistics used by Azam and Thelen, and commonly used elsewhere in political science, is wrong in all cases; I do believe we can gain knowledge from models like those they present. What I do take issue with, however, is the call to arms frequently sounded in the conclusions of papers, often directed toward policy makers, that the results of the paper predict phenomenon X and thus policy Y should be enacted, especially when policy Y is something like military intervention. Before making these types of claims, some type of out-of-sample test should be performed to determine how well the variables under examination actually predict the outcome of interest.
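As a minimal sketch of what such a check might look like for the model above, the single train-test split could be extended to a simple 5-fold cross-validation of the same negative binomial specification. This reuses the form and data objects created earlier; the fold count and the random seed are arbitrary choices of mine, not anything from the paper.

#A rough 5-fold cross-validation of the negative binomial model above.
#The number of folds and the seed are arbitrary choices, not from the paper.
import numpy as np
import patsy
import statsmodels.api as sm

y, X = patsy.dmatrices(form, data, return_type='dataframe')

#Shuffle the row indices and split them into five roughly equal folds
np.random.seed(0)
indices = np.random.permutation(len(y))
folds = np.array_split(indices, 5)

rmses = []
for k, test_idx in enumerate(folds):
    train_idx = np.setdiff1d(indices, test_idx)
    #Fit on everything except the held-out fold
    fold_mod = sm.GLM(y.iloc[train_idx], X.iloc[train_idx],
                      family=sm.families.NegativeBinomial()).fit()
    #Predict the held-out fold and score it
    fold_pred = fold_mod.predict(X.iloc[test_idx])
    err = np.ravel(y.iloc[test_idx]) - fold_pred
    rmses.append(np.sqrt(np.mean(np.square(err))))
    print('Fold {}: RMSE = {}'.format(k + 1, rmses[-1]))

print('Mean RMSE across folds: {}'.format(np.mean(rmses)))

Averaging the fold RMSEs gives a somewhat less noisy picture of out-of-sample performance than a single random split, which is really all that is being asked for here.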