
Time series data

The example of cross-sectional data discussed earlier is from the year 2010 only. If we instead consider only one country, for example the United States, and look at its military expenses and central government debt over a span of 10 years from 2001 to 2010, we get two time series: one for US federal military expenditure and the other for the debt of the US federal government. In essence, then, a time series is made up of quantitative observations on one or more measurable characteristics of an individual entity, taken at multiple points in time. In this case, the data represents yearly military expenditure and government debt for the United States. Time series data is typically characterized by several interesting internal structures, such as trend, seasonality, stationarity, and autocorrelation. These are discussed conceptually in the coming sections of this chapter.
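As a minimal sketch of this idea, assuming pandas is available, the snippet below stores yearly observations of one characteristic of a single entity as a `pandas.Series` indexed by year, and computes its lag-1 autocorrelation with `Series.autocorr`. The values and the name `hypothetical_indicator` are made up for illustration; they are not the World Bank figures used in this chapter.

```python
import pandas as pd

# Hypothetical yearly observations for one entity (made-up values, not real data)
years = pd.Index(range(2001, 2011), name="year")
values = [3.0, 3.2, 3.6, 3.9, 4.0, 4.0, 4.1, 4.4, 4.8, 4.8]
series = pd.Series(values, index=years, name="hypothetical_indicator")

# The observations are ordered in time by their index
print(series.index.is_monotonic_increasing)

# Lag-1 autocorrelation: Pearson correlation between each value and the previous one
lag1 = series.autocorr(lag=1)
print(round(lag1, 3))
```

A value near 1 suggests that consecutive observations move together, as is common for a series with a trend.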

The internal structures of time series data require special formulations and techniques for their analysis. These techniques are covered in the following chapters, with case studies and working code in Python.

The following figure plots the two time series we have been discussing:

Figure 1.3: Examples of time series data

To generate the preceding plots, we extend the code that was developed to graph the cross-sectional data. We start by creating two new single-column DataFrames that represent the time series of military expenses and central government debt of the United States from 1960 to 2010:

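# Select the USA row of each table and transpose it into a single column indexed by
# the original column labels (.loc replaces the .ix indexer, which was removed in pandas 1.0)
central_govt_debt_us = central_govt_debt.loc[central_govt_debt['Country Code']=='USA', :].T
military_exp_us = military_exp.loc[military_exp['Country Code']=='USA', :].T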
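To see what the filter-and-transpose step produces, here is a minimal sketch on a small hypothetical table laid out the way the World Bank data is assumed to be: one row per country, metadata columns, and one column per year. The column names and values are invented. The result is a one-column DataFrame rather than a Series; `usa_col.iloc[:, 0]` would extract it as a Series if needed.

```python
import numpy as np
import pandas as pd

# Hypothetical wide-format table: one row per country, one column per year (made-up values)
wide = pd.DataFrame({
    "Country Name": ["United States", "Canada"],
    "Country Code": ["USA", "CAN"],
    "1960": [np.nan, 4.1],
    "1961": [9.0, 3.9],
    "1962": [9.2, np.nan],
})

# Boolean mask selects the USA row; .loc performs the selection
usa_row = wide.loc[wide["Country Code"] == "USA", :]

# Transposing turns the single row into a single column whose index holds the
# original column labels: metadata labels first, then the year labels
usa_col = usa_row.T
print(type(usa_col).__name__, usa_col.shape)
print(list(usa_col.index))
```

Because the index still contains the metadata labels, the year range has to be sliced out afterwards, and the mix of text and numbers in the original row leaves the column with object dtype.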

The objects military_exp_us and central_govt_debt_us are concatenated side by side into a single DataFrame, which is then sliced to hold data for the years 1960 through 2010:

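import numpy as np
import pandas as pd

# Place the two columns side by side, aligned on the shared index labels
data_us = pd.concat((military_exp_us, central_govt_debt_us), axis=1)
# Positions of the first and last years to keep
index0 = np.where(data_us.index=='1960')[0][0]
index1 = np.where(data_us.index=='2010')[0][0]
data_us = data_us.iloc[index0:index1+1,:]
# The transpose left object dtype (the source rows mixed text and numbers); convert to float
data_us = data_us.astype(float)
data_us.columns = ['Federal Military Expenditure', 'Debt of Federal Government']
data_us.head(10)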
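The `np.where` lookup finds the positions of the '1960' and '2010' labels and then slices by position. As a sketch of an equivalent, assuming the year labels are unique strings in the index, label-based slicing with `.loc` includes both endpoints and returns the same rows. The toy frame below uses made-up labels and values to check this.

```python
import numpy as np
import pandas as pd

# Hypothetical transposed data: the index holds string labels, including
# non-year metadata labels, as after the filter-and-transpose step
labels = ["Country Name", "Country Code", "1958", "1959", "1960", "1961", "1962"]
df = pd.DataFrame({"a": range(7), "b": range(10, 17)}, index=labels)

# Positional approach: locate the boundary labels, then slice with iloc (end + 1)
i0 = np.where(df.index == "1959")[0][0]
i1 = np.where(df.index == "1961")[0][0]
by_position = df.iloc[i0:i1 + 1, :]

# Label-based alternative: .loc slicing includes both endpoints
by_label = df.loc["1959":"1961"]

print(list(by_label.index))
print(by_position.equals(by_label))
```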

Displaying the first ten rows of data_us gives the following table:

The preceding table shows that data on federal military expenses and federal debt is not available for several years starting from 1960. Hence, we drop the rows with missing values from the DataFrame data_us before plotting the time series:

data_us.dropna(inplace=True)
print('Shape of data_us:', data_us.shape)

As the output of the print function shows, the DataFrame has twenty-three rows after dropping the missing values:

Shape of data_us: (23, 2)
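The `dropna(inplace=True)` call relies on the pandas default `how='any'`, so a year is kept only when both indicators are present. The sketch below, on a small frame with made-up values, contrasts this with `how='all'`, which drops a year only when every value in that row is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with gaps (made-up values)
df = pd.DataFrame(
    {"expenditure": [np.nan, np.nan, 5.1, 4.9],
     "debt": [np.nan, 40.2, 41.0, np.nan]},
    index=["1960", "1961", "1962", "1963"],
)

# Default how="any": drop a row if any column is missing
print(df.dropna().shape)           # (1, 2)
# how="all": drop a row only if every column is missing
print(df.dropna(how="all").shape)  # (3, 2)
```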

After dropping the rows with missing values, the first ten rows of data_us are as follows:

Finally, the time series plots are generated by executing the following code:

import matplotlib.pyplot as plt

# Two subplots stacked vertically with a shared x-axis; the axes array is 1-d
f, axarr = plt.subplots(2, sharex=True)
f.set_size_inches(5.5, 5.5)
axarr[0].set_title('Federal Military Expenditure during 1988-2010 (% of GDP)')
data_us['Federal Military Expenditure'].plot(linestyle='-', marker='*', color='b', ax=axarr[0])
axarr[1].set_title('Debt of Federal Government during 1988-2010 (% of GDP)')
data_us['Debt of Federal Government'].plot(linestyle='-', marker='*', color='r', ax=axarr[1])
plt.show()
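When the plotting code runs as a standalone script on a machine without a display, a minimal variant, assuming matplotlib's non-interactive Agg backend, builds the same two-panel layout and writes it to an image file with `Figure.savefig`. The stand-in data, the shortened titles, and the file name `us_time_series.png` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for data_us (made-up values)
data = pd.DataFrame(
    {"Federal Military Expenditure": [5.5, 5.2, 4.8, 4.6],
     "Debt of Federal Government": [45.0, 47.5, 50.1, 52.3]},
    index=["2007", "2008", "2009", "2010"],
)

# Two vertically stacked subplots sharing the x-axis; axarr is a 1-d array
f, axarr = plt.subplots(2, sharex=True)
f.set_size_inches(5.5, 5.5)
axarr[0].set_title("Federal Military Expenditure (% of GDP)")
data["Federal Military Expenditure"].plot(linestyle="-", marker="*", color="b", ax=axarr[0])
axarr[1].set_title("Debt of Federal Government (% of GDP)")
data["Debt of Federal Government"].plot(linestyle="-", marker="*", color="r", ax=axarr[1])

f.tight_layout()
f.savefig("us_time_series.png")  # hypothetical output file name
```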