2.2 Computational considerations and comparisons

If the magnitude of the elements of the powers of do not rise with the power, the power series converges rapidly. Note, row-stochastic, non-negative can have a maximum of 1 in any row. Hence, the magnitude of the elements in the powers of does not grow with the power. Given the rapid decline in the coefficients in the power series, achieving a satisfactory progression with six or seven terms seems feasible.
Straightforward implementation of the definition of by a power series expansion truncated after terms with as the highest degree term would require operations for dense , as matrix multiplication requires operations for dense matrices. This would not be practical and computing the matrix exponential using eigenvalues and eigenvectors would also be impractical for sufficiently large dense matrices, since this requires operations.
If the graph of is strongly connected, meaning that a path exists between every pair of observations, then will be dense (all non-zeros) for positive (Horn and Johnson, p. 361-362). Hence, will be dense. Computing separately would thus require prohibitive amounts of memory for large . Fortunately, one does not need to compute separately, as always appears in conjunction with .
This allows computation of in operations for dense by sequential left-multiplication of by to form element vectors, (i.e., , , and so on).
For sparse , that would result from a spatial weight matrix based on nearest neighbors computed using Delaunay triangles, or a sparse covariance structure based on the spherical variogram, the number of operations required to compute declines dramatically. The number of operations required drops to , where denotes the number of non-zeros. For the nearest neighbor spatial weight matrix approach with non-zero entries in each row, the operation count associated with computing would decline to . This results in an operation count for computing in nearest neighbor specifications of that is linear in .
To illustrate this computation approach in detail, we define the x matrix comprised of powers of times in ( 5 ).
Note that, this simple form of sparse matrix-vector multiplication can be implemented without explicit use of sparse matrix multiplication algorithms. Given a list of neighbors for observation (i.e., and given equal weights for the neighbors, the product for observation is merely ()/. That is, the sparse matrix-vector products needed to compute in ( 5 ) require only indexing and addition, two of the fastest operations possible on digital computers. This means that the MESS model can be implemented in a variety of computing languages such as FORTRAN or C, and various statistical software environments. In fact, to demonstrate the feasibility of estimating MESS using standard software we coded the estimator in Fortran 90 as well as in Matlab. Employing a compiled language (Fortran 90) plus using indexing as opposed to sparse matrices increased the speed by more than 6 times.
We provide a comparison of maximum likelihood estimates and the time required to solve the MESS model and a traditional spatial autoregressive model (SAR) taking the form: , with a spatial weight matrix based on 30 nearest neighbors making it an asymmetric row-stochastic matrix. These comparative results were based on a dependent variable representing housing prices across 57,647 census tracts along with five explanatory variables described in section 3 . This comparison makes the point that the parameter estimates for in traditional spatial econometric models as well as inferences regarding the magnitude and significance of these parameters can be replicated using the MESS model. In this application the MESS model took the form: , where the explanatory variables matrix was not transformed for comparability with the SAR model.
The estimates from both models are presented in Table 1 along with computed deviances that would be used to draw inferences about the statistical significance of each variable. These deviances reflect the change in the log likelihood arising from sequential deletion of each variable from the model.
>From the table we see that the parameter magnitudes are very similar from both models. Since the model is in log form, the coefficients can be interpreted as the percentage change in housing prices that would result from a one percent change in the explanatory variables. The deviances would produce the same inferences regarding the significance or lack thereof for all variables. (In this case, all of the explanatory variables are significant at the 0.01 level.)

The table also shows a timing comparison for computing estimates using the two methods, where we see a dramatic improvement in speed associated with the MESS model which is over 1,000 times faster than the more traditional SAR model. This speed gain was obtained in Matlab. The Fortran coding accelerated this already fast implementation by over a factor of 6. In section 2.5 we motivate that the computational speed of the MESS model allows quick hypothesis testing of alternative specifications for the functional form taken by the spatial weight matrix, as well as more traditional specification tests used in regression models. These testing methods are illustrated in an application presented in section 3 .

Sidje (1998) has also used this as a point of departure in the computation of matrix exponentials. In addition, Sidje provides other algorithms for computing the matrix exponential right multiplied by a vector. Finally, Sidje provides software implementing these algorithms in both Matlab and Fortran 77.
Myers (1997, p. 276) indicates the spherical model is the one most often used in geostatistical practice. See Barry and Pace (1997) for more about the use of sparsity with the spherical model.
The use of an asymmetric 30 neighbor spatial weight matrix poses a substantial computational challenge to computing the log-determinant term used in maximum likelihood. The times reported in Table 1 could be improved to around 400 seconds by using symmetric spatial weight matrix. The times would improve further if fewer neighbors were used or if the Monte Carlo Log-determinant estimator proposed by Barry and Pace (1999) were employed to compute the SAR estimates.