The code below performs a nonparametric two-sample t-test using bootstrapping. I present the code first, then follow with a line-by-line description of what’s going on.
1-4: Just some imports. We need floating-point division from the `__future__` module, plus numpy, pandas, and the random module.
6-8: Defining a function `sample` that samples with replacement from the dataset, creating a new dataset of the same length as the original data. This makes use of the `random.choice` function, which samples one item randomly from the data. This function is called the same number of times as there are observations in the data.
10-17: Defining a function to perform the t-test, with two data inputs, the number of repetitions to be performed, and the direction of the alternative hypothesis. First a 2 x n matrix is defined, with row 1 being all ones and row 2 being the data. This is then transposed to create an n x 2 matrix. The same procedure is then repeated for the control data, except with 0s instead of 1s. These two matrices are then stacked on top of each other.
`tstat` is the observed difference in means between the two groups, and `tboot` is a vector of zeros with length equal to the number of repetitions for the bootstrap.
18-21: This for-loop actually performs the bootstrap for the number of times indicated by `nboot`. First, a sample of the data (`Z`) is taken using the `sample` function defined above. This is then transformed into a pandas DataFrame and given appropriate column names. Finally, the difference in means of the two groups is taken for each iteration of the loop and stored in the appropriate location in `tboot`.
22-28: This simply calculates the proportion of bootstrap samples that were greater or less than the test statistic, based on the direction of the alternative hypothesis. The final line (29) then prints the p-value as a float.
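The steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the original listing: it uses only the standard library, so plain lists of (label, value) pairs stand in for the numpy matrix and the pandas DataFrame, and the names `group_diff` and `bootstrap_test` are hypothetical. It also assumes that each bootstrap difference is centered on the observed difference so that `tboot` approximates the distribution under the null hypothesis; that centering step is not stated in the description.

```python
import random
from statistics import mean


def sample(data):
    """Draw len(data) items from data, with replacement."""
    return [random.choice(data) for _ in range(len(data))]


def group_diff(rows):
    """Mean of the label-1 values minus mean of the label-0 values."""
    treated = [value for label, value in rows if label == 1]
    controls = [value for label, value in rows if label == 0]
    return mean(treated) - mean(controls)


def bootstrap_test(treatment, control, nboot=1000, direction="less"):
    # Label treatment rows with 1 and control rows with 0, then stack them.
    Z = [(1, x) for x in treatment] + [(0, x) for x in control]
    tstat = group_diff(Z)  # observed difference in means

    tboot = []
    while len(tboot) < nboot:
        resample = sample(Z)
        # Redraw in the (very rare) case that one group is missing entirely.
        if {label for label, _ in resample} != {0, 1}:
            continue
        # Assumption (not in the description): center on tstat so that
        # tboot approximates the null distribution of the difference.
        tboot.append(group_diff(resample) - tstat)

    # Proportion of bootstrap differences at least as extreme as tstat.
    if direction == "greater":
        hits = sum(t >= tstat for t in tboot)
    elif direction == "less":
        hits = sum(t <= tstat for t in tboot)
    else:
        raise ValueError("direction must be 'greater' or 'less'")
    return hits / nboot
```

A call such as `bootstrap_test(treatment, control, nboot=1000, direction="greater")` returns the p-value as a float. Calling `random.seed(...)` beforehand makes the resampling reproducible.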