2.1 Model

Consider estimation of models where the dependent variable $y$ undergoes a linear transformation $Sy$ as in (1).
$$Sy = X\beta + \varepsilon \qquad (1)$$
The vector $y$ contains the $n$ observations on the dependent variable, $X$ represents the $n \times k$ matrix of observations on the independent variables, $S$ is a positive definite $n \times n$ matrix, and the $n$-element vector $\varepsilon$ is distributed $N(0, \sigma^2 I_n)$. The log-likelihood for the MESS model in (1) is
$$L = C + \ln|S| - \frac{n}{2}\ln(y'S'MSy) \qquad (2)$$
where $C$ represents a scalar constant and both $M = I - H$ and $H = X(X'X)^{-1}X'$ are idempotent matrices. The term $|S|$ is the Jacobian of the transformation from $y$ to $Sy$. Without the Jacobian term, $S$ containing all zeros would lead to a perfect, albeit pathological, fit. The Jacobian term penalizes attempts to use singular or near-singular transformations to artificially increase the regression fit.
We explore the use of the matrix exponential as defined by (3) in modeling $S$,
$$S = e^{\alpha W} = \sum_{i=0}^{\infty} \frac{\alpha^i W^i}{i!} \qquad (3)$$
where $W$ represents an $n \times n$ non-negative matrix with zeros on the diagonal and $\alpha$ represents a scalar real parameter. While a number of ways exist to specify $W$, a common specification sets $W_{ij} > 0$ for observations $j$ sufficiently close (as measured by some metric) to observation $i$. By construction, $W_{ii} = 0$ to preclude an observation from directly predicting itself. If $W_{ij} > 0$ for the nearest neighbors of observation $i$, then $W^2$ contains neighbors to these nearest neighbors for observation $i$. Similar relations hold for higher powers of $W$, which identify higher-order neighbors. Thus the matrix exponential $S$, associated with matrix $W$, can be interpreted as assigning rapidly declining weights for observations involving higher-order neighboring relationships. That is, observations reflecting higher-order neighbors (neighbors of neighbors) receive less weight than lower-order neighbors.
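The declining-weight interpretation can be illustrated numerically. The following is a minimal sketch, not part of the original analysis: it assumes a hypothetical ring layout in which each observation has two nearest neighbors, an arbitrary illustrative value of $\alpha$, and a truncated version of the power series in (3). The helper names `ring_weights` and `expm_series` are introduced here for illustration only.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical W: n observations on a ring, each with two neighbors of weight 1/2."""
    W = np.zeros((n, n))
    idx = np.arange(n)
    W[idx, (idx - 1) % n] = 0.5
    W[idx, (idx + 1) % n] = 0.5
    return W

def expm_series(A, terms=30):
    """Truncated power series sum_{i < terms} A^i / i!, following the form of (3)."""
    S = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for i in range(1, terms):
        term = term @ A / i      # term now holds A^i / i!
        S = S + term
    return S

W = ring_weights(9)
alpha = -0.5                     # arbitrary illustrative value
S = expm_series(alpha * W)

# Absolute weights that observation 0 places on itself and on its
# 1st- to 4th-order neighbors along the ring.
weights_by_order = np.abs(S[0, :5])
print(weights_by_order)
```

In this sketch the magnitudes fall with each additional neighbor order, since the order-$i$ term of the series is divided by $i!$.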
If $W$ is row-stochastic, $S$ will be proportional to a row-stochastic matrix, since products of row-stochastic matrices are row-stochastic (i.e., by definition $W\iota = \iota$ and therefore $W(W\iota) = \iota$, and so on, where $\iota$ denotes a vector of ones). The same holds true for any power of $S$, since the powers are simply linear combinations of the powers of $W$, all of which are proportional to a row-stochastic matrix. Row-stochastic spatial weight matrices, or multidimensional linear filters, have a long history of application in spatial statistics (e.g., Ord (1975)). The row-stochastic weight matrix has very favorable numeric as well as statistical properties. For example, the product of a row-stochastic weight matrix $W$ and a random variable vector $v$ produces a vector of spatially local averages, $Wv$.
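The row-sum argument can be checked directly. Applying the series in (3) to $\iota$ term by term gives $S\iota = \sum_i \alpha^i \iota / i! = e^{\alpha}\iota$, so the rows of $S / e^{\alpha}$ sum to one. The sketch below assumes a small random non-negative $W$ with a zero diagonal, normalized by rows; the seed, dimension, and value of $\alpha$ are arbitrary choices made for illustration.

```python
import numpy as np

def expm_series(A, terms=30):
    """Truncated power series sum_{i < terms} A^i / i!, following the form of (3)."""
    S = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for i in range(1, terms):
        term = term @ A / i
        S = S + term
    return S

rng = np.random.default_rng(1)      # arbitrary seed
n, alpha = 6, -0.5                  # arbitrary illustrative values

A = rng.random((n, n))
np.fill_diagonal(A, 0.0)            # no observation predicts itself
W = A / A.sum(axis=1, keepdims=True)  # row-normalize so that W @ iota = iota

iota = np.ones(n)
S = expm_series(alpha * W)
row_sums = S @ iota                 # each entry should equal exp(alpha)

v = rng.standard_normal(n)
local_avg = W @ v                   # each entry is a weighted average of the other entries of v
print(row_sums, np.exp(alpha))
```

Because every row of $W$ is a set of non-negative weights summing to one, each element of `local_avg` lies between the smallest and largest elements of `v`.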
Chiu, Leonard, and Tsui (1996) proposed the use of the matrix exponential and discussed several of its salient properties, some of which are enumerated below:
1. $S$ is positive definite,
2. any positive definite matrix is the matrix exponential of some matrix,
3. $S^{-1} = e^{-\alpha W}$,
4. $|e^{\alpha W}| = e^{\mathrm{trace}(\alpha W)}$.
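Properties 1, 3, and 4 can be verified numerically on a small example. This sketch assumes a symmetric ring-neighbor $W$ (a hypothetical layout chosen so that $S$ is symmetric and its eigenvalues can be inspected directly) and an arbitrary value of $\alpha$; it is a check on one instance, not a proof.

```python
import numpy as np

def expm_series(A, terms=30):
    """Truncated power series sum_{i < terms} A^i / i!, following the form of (3)."""
    S = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for i in range(1, terms):
        term = term @ A / i
        S = S + term
    return S

n, alpha = 6, -0.7                    # arbitrary illustrative values
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5           # hypothetical ring neighbors
W[idx, (idx + 1) % n] = 0.5

S = expm_series(alpha * W)
S_inv = expm_series(-alpha * W)       # property 3: the inverse is exp(-alpha W)

eigenvalues = np.linalg.eigvalsh(S)   # property 1: all positive for this symmetric S
sign, logdet = np.linalg.slogdet(S)   # property 4: ln|S| = trace(alpha W)
print(eigenvalues.min(), logdet, alpha * np.trace(W))
```

Since $W$ has a zero diagonal, $\mathrm{trace}(\alpha W) = 0$ and the log-determinant is zero up to rounding error.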
The last property greatly simplifies the MESS log-likelihood. Since $\mathrm{trace}(W) = 0$ and by extension $|e^{\alpha W}| = e^{\mathrm{trace}(\alpha W)} = e^{0} = 1$, the log-likelihood takes the form $L = C - \frac{n}{2}\ln(y'S'MSy)$. Therefore, maximizing the log-likelihood is equivalent to minimizing $y'S'MSy$, the overall sum-of-squared errors. Thus, one can interpret the search for an optimal $S$ as a search for a coordinate system (possibly oblique) which has the same multidimensional volume as the orthogonal Cartesian coordinate system, but yields a better goodness-of-fit among the variables (smaller $y'S'MSy$).
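The equivalence between maximizing the log-likelihood and minimizing $y'S'MSy$ suggests a simple one-dimensional search over $\alpha$. The sketch below is an illustration on simulated data, not the estimation procedure used for the reported results: it assumes a hypothetical ring-neighbor $W$, arbitrary values for $\beta$ and the noise scale, and a plain grid search. To avoid forming $S$ for every candidate $\alpha$, it applies the truncated series to the vector $y$ directly, which is a computational shortcut adopted here.

```python
import numpy as np

def expm_times_vector(alpha, W, v, terms=25):
    """Apply the truncated series for exp(alpha W) to v without forming the matrix."""
    out = v.copy()
    term = v.copy()
    for i in range(1, terms):
        term = alpha * (W @ term) / i    # term now holds alpha^i W^i v / i!
        out = out + term
    return out

rng = np.random.default_rng(0)           # arbitrary seed
n, alpha_true = 200, -0.8                # arbitrary illustrative values
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5              # hypothetical ring neighbors
W[idx, (idx + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([1.0, 2.0])
eps = 0.1 * rng.standard_normal(n)

# Because S^{-1} = exp(-alpha W), data obeying S y = X beta + eps
# can be simulated as y = exp(-alpha W)(X beta + eps).
y = expm_times_vector(-alpha_true, W, X @ beta + eps)

M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)   # M = I - X (X'X)^{-1} X'

def sse(alpha):
    """Sum-of-squared errors y'S'MSy for a candidate alpha."""
    Sy = expm_times_vector(alpha, W, y)
    return float(Sy @ M @ Sy)

grid = np.linspace(-2.0, 0.0, 201)
alpha_hat = grid[np.argmin([sse(a) for a in grid])]
print(alpha_hat)
```

A grid search is used only for transparency; any scalar minimizer could take its place.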
The MESS model in (1) is a bit more general than it appears. Let $U$ represent a matrix of observations on non-constant independent variables and let $q$ be an integer large enough so that $X$ approximately spans $SU$, but small enough so that $X$ cannot span $y$. The design matrix $X$ (assuming full rank) could have the form (4).
$$X = [\,\iota \quad U \quad WU \quad \cdots \quad W^{q-1}U\,] \qquad (4)$$
In this case, $X$ approximately spans $SU$ and thus the MESS model based on (4) nests a spatial autoregression in the errors. Hence, a set of linear restrictions on the parameters associated with the columns of $X$ could yield the error autoregression. The MESS model with $X$ specified as in (4) results in an estimate for $\alpha$ that does not depend upon the variance of the errors, only the direction of these errors (Pace and Barry (1998)). This allows the MESS model to effectively accommodate different structures for the spatial lags of $y$ and $U$ (Anselin (1988), p. 225-230). Hendry et al. (1984) advocate estimating this type of general distributed lag model and subsequently imposing restrictions, an approach that has been labeled the general-to-specific approach to model specification.
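The spanning claim can be examined on a small example: since $SU$ is a weighted sum of $U, WU, W^2U, \ldots$ with weights $\alpha^i / i!$, a design matrix holding the first $q$ of these blocks should reproduce $SU$ up to the truncated tail of the series. The sketch below assumes a hypothetical ring-neighbor $W$, simulated $U$ with two columns, and arbitrary values of $q$ and $\alpha$; the helper name `mess_design` is introduced here for illustration only.

```python
import numpy as np

def expm_series(A, terms=30):
    """Truncated power series sum_{i < terms} A^i / i!, following the form of (3)."""
    S = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for i in range(1, terms):
        term = term @ A / i
        S = S + term
    return S

def mess_design(U, W, q):
    """Stack [iota, U, WU, ..., W^(q-1) U] column-wise, in the form of (4)."""
    blocks = [np.ones((U.shape[0], 1))]
    lag = U
    for _ in range(q):
        blocks.append(lag)
        lag = W @ lag                 # next spatial lag of U
    return np.hstack(blocks)

rng = np.random.default_rng(2)        # arbitrary seed
n, p, q, alpha = 50, 2, 4, -0.5       # arbitrary illustrative values
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5           # hypothetical ring neighbors
W[idx, (idx + 1) % n] = 0.5

U = rng.standard_normal((n, p))
X = mess_design(U, W, q)              # n x (1 + p*q)

# Project SU on the columns of X and measure what is left over.
SU = expm_series(alpha * W) @ U
coef, *_ = np.linalg.lstsq(X, SU, rcond=None)
rel_resid = np.linalg.norm(SU - X @ coef) / np.linalg.norm(SU)
print(X.shape, rel_resid)
```

The relative residual shrinks as $q$ grows, at the cost of $p$ additional columns in $X$ for each added lag.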

Table 1: A Comparison of estimates from SAR and MESS models
Variables                        SAR Model    MESS Model
ln(Land Area)                      -0.0152       -0.0188
  Deviance                         1142.64       1394.59
ln(Population)                      0.0189        0.0233
  Deviance                          132.87        177.70
ln(Per Capita Income)               0.4613        0.4786
  Deviance                        31229.72      23987.24
ln(Age)                            -0.0685       -0.0689
  Deviance                         1213.15       1097.93
Intercept                          -1.6861       -1.3492
  Deviance                         3464.33       1945.65
Time (in seconds), Matlab          2414.92          3.36
Time (in seconds), FORTRAN 90            —         0.536
$n$                                 57,647        57,647
$k$                                      5             5