2.1 Model
Consider estimation of models where the dependent variable $y$ undergoes a linear transformation $Sy$, as in (1).

$$Sy = X\beta + \varepsilon \qquad (1)$$

The vector $y$ contains the $n$ observations on the dependent variable, $X$ represents the $n \times k$ matrix of observations on the independent variables, $S$ is a positive definite $n \times n$ matrix, and the $n$-element vector $\varepsilon$ is distributed $N(0, \sigma^2 I_n)$. The log-likelihood for the MESS model in (1) is

$$L = C + \ln|S| - \frac{n}{2}\ln(y'S'MSy) \qquad (2)$$

where $C$ represents a scalar constant and both $M = I - H$ and $H = X(X'X)^{-1}X'$ are idempotent matrices. The term $|S|$ is the Jacobian of the transformation from $y$ to $Sy$. Without the Jacobian term, $S$ containing all zeros would lead to a perfect, albeit pathological, fit. The Jacobian term penalizes attempts to use singular or near-singular transformations to artificially increase the regression fit.
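
The role of the Jacobian term can be illustrated numerically. The sketch below is not taken from the paper: the function name `concentrated_loglik`, the simulated data, and the choice of a scaled identity as the near-singular transformation are all assumptions made for illustration. It evaluates $\ln|S| - \frac{n}{2}\ln(y'S'MSy)$, omitting the constant, for the identity and for a heavily shrunken transformation.

```python
import numpy as np

def concentrated_loglik(S, y, X):
    """Return (ln|S| - (n/2) ln(y'S'MSy), y'S'MSy); the constant C is omitted."""
    n = y.shape[0]
    Sy = S @ y
    # Residuals of Sy after projection on the columns of X, i.e. M S y.
    beta, *_ = np.linalg.lstsq(X, Sy, rcond=None)
    resid = Sy - X @ beta
    sse = float(resid @ resid)
    _, logdet = np.linalg.slogdet(S)  # log of |det S|, numerically stable
    return logdet - 0.5 * n * np.log(sse), sse

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

identity = np.eye(n)
shrunk = 1e-3 * np.eye(n)  # near-singular transformation

ll_id, sse_id = concentrated_loglik(identity, y, X)
ll_sh, sse_sh = concentrated_loglik(shrunk, y, X)

print("SSE shrinks:", sse_sh < sse_id)
print("log-likelihood unchanged:", np.isclose(ll_sh, ll_id))
```

Shrinking the transformation toward zero reduces the sum of squared errors by a factor of $10^{-6}$, yet the log-likelihood does not improve, because the $\ln|S|$ term offsets the artificial gain in fit exactly.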

We explore the use of the matrix exponential, as defined by (3), in modeling $S$:

$$S = e^{\alpha D} = \sum_{i=0}^{\infty} \frac{\alpha^i D^i}{i!} \qquad (3)$$

where $D$ represents an $n \times n$ non-negative matrix with zeros on the diagonal and $\alpha$ represents a scalar real parameter. While a number of ways exist to specify $D$, a common specification sets $D_{ij} > 0$ for observations $j$ sufficiently close (as measured by some metric) to observation $i$. By construction, $D_{ii} = 0$ to preclude an observation from directly predicting itself. If $D_{ij} > 0$ for the nearest neighbors of observation $i$, then $D^2$ contains neighbors to these nearest neighbors for observation $i$. Similar relations hold for higher powers of $D$, which identify higher-order neighbors. Thus the matrix exponential $S$, associated with matrix $D$, can be interpreted as assigning rapidly declining weights for observations involving higher-order neighboring relationships. That is, observations reflecting higher-order neighbors (neighbors of neighbors) receive less weight than lower-order neighbors.
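
A small numerical sketch may make the neighbor interpretation concrete. It assumes a hypothetical layout of seven observations on a line, where each observation's neighbors are the adjacent points, and an illustrative value $\alpha = 0.5$ (in applications $\alpha$ is estimated). The helper `expm_series` is an assumed name; it simply truncates the power series for $e^{\alpha D}$.

```python
import numpy as np

def expm_series(A, terms=30):
    """Truncated power series for e^A: sum of A^i / i! for i = 0..terms-1."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for i in range(1, terms):
        term = term @ A / i  # term now holds A^i / i!
        result = result + term
    return result

# Hypothetical layout: 7 observations on a line, neighbors are adjacent points.
n = 7
D = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            D[i, j] = 1.0
D = D / D.sum(axis=1, keepdims=True)  # row-normalize; the diagonal stays zero

alpha = 0.5  # illustrative value only
S = expm_series(alpha * D)

i = 3
print("first-order link 3 -> 5 in D:  ", D[i, 5])
print("second-order link 3 -> 5 in D^2:", (D @ D)[i, 5])
print("weights in S on neighbors of order 1, 2, 3:", S[i, 4], S[i, 5], S[i, 6])
```

Observation 5 is not a neighbor of observation 3 in $D$, but it appears in $D^2$ through observation 4. In $S$, the weight attached to it is smaller than the weight on the first-order neighbor, and the third-order neighbor receives less still.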

If $D$ is row-stochastic, $S$ will be proportional to a row-stochastic matrix, since products of row-stochastic matrices are row-stochastic (i.e., by definition $D\iota = \iota$ and therefore $D(D\iota) = \iota$, and so on, where $\iota$ denotes a vector of ones). The same holds true for any power of $S$, since the powers are simply linear combinations of the powers of $D$, all of which are proportional to a row-stochastic matrix. Row-stochastic spatial weight matrices, or multidimensional linear filters, have a long history of application in spatial statistics (e.g., Ord (1975)). The row-stochastic weight matrix has very favorable numeric as well as statistical properties. For example, the product of a row-stochastic weight matrix $D$ and a random variable vector $v$ produces a vector of spatially local averages, $Dv$.
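
The row-sum argument can be checked on a toy example. The sketch below assumes a hypothetical ring of five observations, each with two equally weighted neighbors, and an illustrative $\alpha = 0.7$. Because every power of $D$ has unit row sums, each row of $e^{\alpha D}$ sums to $\sum_i \alpha^i / i! = e^{\alpha}$, which is the proportionality factor.

```python
import numpy as np

def expm_series(A, terms=30):
    """Truncated power series for e^A: sum of A^i / i! for i = 0..terms-1."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for i in range(1, terms):
        term = term @ A / i
        result = result + term
    return result

# Hypothetical ring of 5 observations; each has two neighbors weighted 1/2.
n = 5
D = np.zeros((n, n))
for i in range(n):
    D[i, (i - 1) % n] = 0.5
    D[i, (i + 1) % n] = 0.5

iota = np.ones(n)
alpha = 0.7  # illustrative value only
S = expm_series(alpha * D)

row_sums_D2 = D @ (D @ iota)  # D(D iota) = iota
row_sums_S = S @ iota         # every entry equals e^alpha

v = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
local_avg = D @ v             # average of each observation's two neighbors

print(row_sums_D2)
print(row_sums_S, np.exp(alpha))
print(local_avg)
```

Each element of `local_avg` is the mean of the two neighboring values of `v`; for instance, the first element averages the values at the last and second positions.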

Chiu, Leonard, and Tsui (1996) proposed the use of the matrix exponential and discussed several of its salient properties, some of which are enumerated below:

1.	$S$ is positive definite,
2.	any positive definite matrix is the matrix exponential of some matrix,
3.	$S^{-1} = e^{-\alpha D}$,
4.	$|e^{\alpha D}| = e^{\mathrm{trace}(\alpha D)}$.
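
Two standard matrix-exponential identities, the closed-form inverse $(e^{\alpha D})^{-1} = e^{-\alpha D}$ and the determinant identity $|e^{\alpha D}| = e^{\mathrm{trace}(\alpha D)}$, can be verified numerically. The following is a minimal sketch under assumed inputs: a random row-normalized $D$ with zero diagonal, an illustrative $\alpha = -0.8$, and a hypothetical helper `expm_series` that truncates the power series.

```python
import numpy as np

def expm_series(A, terms=30):
    """Truncated power series for e^A: sum of A^i / i! for i = 0..terms-1."""
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for i in range(1, terms):
        term = term @ A / i
        result = result + term
    return result

# Hypothetical weight matrix: random, zero diagonal, rows normalized to sum to 1.
rng = np.random.default_rng(0)
n = 6
D = rng.random((n, n))
np.fill_diagonal(D, 0.0)
D = D / D.sum(axis=1, keepdims=True)

alpha = -0.8  # illustrative value only
S = expm_series(alpha * D)
S_inv = expm_series(-alpha * D)  # candidate inverse, no matrix inversion needed

inverse_ok = np.allclose(S @ S_inv, np.eye(n))
det_S = np.linalg.det(S)         # trace(alpha * D) = 0, so this should be 1

print("S e^{-alpha D} = I:", inverse_ok)
print("det(S):", det_S)
```

Because $D$ has a zero diagonal, $\mathrm{trace}(\alpha D) = 0$ for every $\alpha$, so the determinant equals one without any computation.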

The last property greatly simplifies the MESS log-likelihood. Since $\mathrm{trace}(D) = 0$ and by extension $|e^{\alpha D}| = e^{0} = 1$, the log-likelihood takes the form

$$L = C - \frac{n}{2}\ln(y'S'MSy)$$

Therefore, maximizing the log-likelihood is equivalent to minimizing $y'S'MSy$, the overall sum-of-squared errors. Thus, one can interpret the search for an optimal $S$ as a search for a coordinate system (possibly oblique) which has the same multidimensional volume as the orthogonal Cartesian coordinate system, but yields a better goodness-of-fit among the variables (smaller $y'S'MSy$).
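
Since the log-determinant drops out, estimation reduces to a one-dimensional least-squares search over $\alpha$. The sketch below illustrates the idea on simulated data; it is not the authors' implementation. The ring-shaped $D$, the true value $\alpha = -1$, the grid, and the function names `exp_times_vector` and `sse` are all assumptions. The product $e^{\alpha D} y$ is approximated by a truncated series applied directly to the vector, so no $n \times n$ exponential is formed.

```python
import numpy as np
from math import factorial

def exp_times_vector(D, alpha, v, terms=30):
    """Approximate e^{alpha D} v by the truncated series sum_i alpha^i D^i v / i!."""
    out = np.zeros_like(v)
    power = v.copy()  # holds D^i v
    for i in range(terms):
        out = out + (alpha ** i / factorial(i)) * power
        power = D @ power
    return out

def sse(alpha, D, y, X):
    """y'S'MSy: squared residuals from regressing Sy on X, with S = e^{alpha D}."""
    Sy = exp_times_vector(D, alpha, y)
    beta, *_ = np.linalg.lstsq(X, Sy, rcond=None)
    resid = Sy - X @ beta
    return float(resid @ resid)

# Hypothetical data: ring of n observations, two neighbors weighted 1/2 each.
rng = np.random.default_rng(1)
n = 100
D = np.zeros((n, n))
for i in range(n):
    D[i, (i - 1) % n] = 0.5
    D[i, (i + 1) % n] = 0.5

X = np.column_stack([np.ones(n), rng.normal(size=n)])
alpha_true = -1.0
z = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=n)
y = exp_times_vector(D, -alpha_true, z)  # y = S^{-1}(X beta + eps)

# Grid search: the alpha with the smallest SSE maximizes the log-likelihood.
grid = np.linspace(-2.0, 0.5, 51)
alpha_hat = grid[np.argmin([sse(a, D, y, X) for a in grid])]
print("alpha_hat:", alpha_hat)
```

A grid is used here only for transparency; any scalar minimizer would serve, since the objective depends on the single parameter $\alpha$.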

The MESS model in (1) is a bit more general than it appears. Let $U$ represent a matrix of observations on $p$ non-constant independent variables and let $q$ be an integer large enough so that $[U, DU, \ldots, D^{q-1}U]$ approximately spans $SU$, but small enough so that $X$ cannot span $y$. The design matrix $X$ (assuming full rank) could have the form (4).

$$X = [\iota \;\; U \;\; DU \;\; \cdots \;\; D^{q-1}U] \qquad (4)$$

In this case, $X$ approximately spans $SU$ and thus the MESS model based on (4) nests a spatial autoregression in the errors. Hence, a set of linear restrictions on the parameters associated with the columns of $X$ could yield the error autoregression. The MESS model with $X$ specified as in (4) results in an estimate for $\alpha$ that does not depend upon the variance of the errors, only the direction of these errors (Pace and Barry (1998)). This allows the MESS model to effectively accommodate different structures for the spatial lags of $y$ and $U$ (Anselin (1988), pp. 225-230). Hendry et al. (1984) advocate estimating this type of general distributed lag model and subsequently imposing restrictions, an approach that has been labeled general-to-specific model specification.
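
A design matrix of this kind is straightforward to assemble. The sketch below assumes that the matrix stacks a constant, the non-constant variables $U$, and their first $q-1$ spatial lags, that is $[\iota, U, DU, \ldots, D^{q-1}U]$. The function name `mess_design_matrix` and the toy inputs are hypothetical.

```python
import numpy as np

def mess_design_matrix(U, D, q):
    """Stack [iota, U, DU, ..., D^{q-1} U] column-wise (assumed form of X)."""
    n = U.shape[0]
    blocks = [np.ones((n, 1))]  # intercept column iota
    lagged = U
    for _ in range(q):
        blocks.append(lagged)   # appends D^0 U, D^1 U, ..., D^{q-1} U in turn
        lagged = D @ lagged
    return np.hstack(blocks)

# Toy inputs: 4 observations on a ring, p = 2 variables, q = 3.
n, p, q = 4, 2, 3
D = np.zeros((n, n))
for i in range(n):
    D[i, (i - 1) % n] = 0.5
    D[i, (i + 1) % n] = 0.5
U = np.arange(8.0).reshape(n, p)

X = mess_design_matrix(U, D, q)
print(X.shape)  # n rows and 1 + p*q columns
```

Each additional lag adds $p$ columns, so $q$ controls the trade-off between the flexibility of the specification and the number of parameters.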

Table 1: A Comparison of estimates from SAR and MESS models

Variables	SAR Model	MESS Model
ln(Land Area)	-0.0152	-0.0188
Deviance	1142.64	1394.59
ln(Population)	0.0189	0.0233
Deviance	132.87	177.70
ln(Per Capita Income)	0.4613	0.4786
Deviance	31229.72	23987.24
ln(Age)	-0.0685	-0.0689
Deviance	1213.15	1097.93
Intercept	-1.6861	-1.3492
Deviance	3464.33	1945.65
Time (in seconds) Matlab	2414.92	3.36
Time (in seconds) FORTRAN 90	—	0.536
$n$	57,647	57,647
$k$	5	5