
Cad. Saúde Pública, Rio de Janeiro, 17(5):1059-1081, set-out, 2001

DEBATE

Geocomputation techniques for spatial analysis: are they relevant to health data?

Técnicas de geocomputação para análise espacial: é o caso para dados de saúde?

1 Divisão de Processamento de Imagens, Instituto Nacional de Pesquisas Espaciais. C. P. 515, São José dos Campos, SP 12201-027, Brasil.

Gilberto Câmara 1

Antônio Miguel Vieira Monteiro 1

Abstract Geocomputation is an emerging field of research that advocates the use of computationally intensive techniques such as neural networks, heuristic search, and cellular automata for spatial data analysis. Since increasing amounts of health-related data are collected within a geographical frame of reference, geocomputational methods show increasing potential for health data analysis. This paper presents a brief survey of the geocomputational field, including some typical applications and references for further reading.
Key words Spatial Analysis; Geographical Methods; Computational Techniques

Resumo A geocomputação é um campo de pesquisa emergente que propõe o uso de técnicas intensivas em computação, tais como redes neurais, busca heurística e autômatos celulares, para análise de dados espaciais. Com o aumento do volume de dados de saúde coletados dentro de um referencial geográfico, os métodos geocomputacionais demonstram um potencial crescente para a análise desses mesmos dados. Os autores apresentam uma revisão breve do campo da geocomputação, apresentando algumas aplicações típicas e sugestões bibliográficas.
Palavras-chave Análise Espacial; Métodos Geográficos; Técnicas Computacionais


Introduction

In recent years, the use of computer-based techniques for spatial data analysis has grown into an important scientific field, combining techniques from geographic information systems and emerging areas such as neurocomputing, heuristic search, and cellular automata. In order to distinguish this new interdisciplinary area from the simple extension of statistical techniques to spatial data, Openshaw & Abrahart (1996) coined the term "geocomputation" to describe the use of computer-intensive methods for knowledge discovery in physical and human geography, especially those involving non-conventional data clustering and analysis techniques. More recently the term has been applied in a broader sense to include spatial data analysis, dynamic modeling, visualization, and space-time dynamics (Longley, 1998).

This paper is a brief survey of geocomputational techniques. The review should not be considered exhaustive; rather, it attempts to provide an overview of the concepts and motivation behind the term "geocomputation". Our prime motivation is to draw the attention of the public health community to the new analytical possibilities offered by geocomputational techniques. We hope this discussion will serve to broaden their perceptions of new possibilities in the spatial analysis of health data.

Motivations for research in geocomputation

Simply defined, geocomputation "is the process of applying computing technology to geographical problems". According to Openshaw & Abrahart (1996:665), "Many end-users merely want answers to fairly abstract questions such as 'Are there any patterns, where are they, and what do they look like?'". Although this definition is generic, it points to a number of motivating factors, such as the emergence of computerized data-rich environments, affordable computational power, and spatial data analysis and mining techniques.

The first motivation (data-rich environments) has come about through the massive collection of socioeconomic, environmental, and health-related data, increasingly organized in computerized databases with geographical references such as census tracts or postal codes. Even in Brazil, a country with a limited tradition of public availability of geographical data, the 2000 Census is being described as the first such initiative where all data collection will be automated and georeferenced.

The second motivation (computational power) has materialized in two forms: the emergence of Geographic Information Systems (GIS) technology and of a set of algorithmically-driven techniques such as neurocomputation, fuzzy logic, and cellular automata.

The third motivation (data analysis and mining techniques) has been heavily driven by the application of data analysis techniques to spatial statistics, a research topic of considerable importance in recent decades.

The broad nature of challenges and approaches in geocomputational research is perhaps best illustrated by four different yet complementary approaches: computer-intensive pattern search, exploratory spatial data analysis, artificial intelligence techniques, and dynamic modeling, as described in the following sections.

Focus 1 – Computer-intensive pattern search

GAM – The Geographical Analysis Machine

One of the most typical examples of the computer-intensive approach to geocomputation is the Geographical Analysis Machine (GAM) developed by Stan Openshaw and co-workers at the Centre for Computational Geography at the University of Leeds. For a recent survey of the GAM, see Openshaw (1998). The following description is largely based on Turton (1998).

GAM is basically a cluster finder for point or small-area data. Its purpose is to indicate evidence of localized geographical clustering in cases where the statistical distribution of the phenomenon is not known in advance. For the GAM algorithm, a cluster is a localized excess incidence rate: there is more of some variable than might be expected. Examples include a local excess disease rate, a crime hot spot, an unemployment black spot, unusually high positive residuals from a model, the distribution of a plant, surging glaciers, earthquake epicenters, patterns of fraud, etc. (Figure 1).

The basic idea of the GAM is very simple. Within the study region containing a spatial point pattern, GAM works by examining a large number of circles of varying sizes that completely cover the region of interest. The circles overlap to a large degree to allow for edge effects and to provide a degree of sensitivity analysis. Within each circle, one counts the number of points and compares this observed value with an expected value based on an assumption about the process generating the point pattern (usually that it is random). Ideally, the population at risk should be used as the basis for generating the expected value, such as using a Poisson probability model with the observed mean and the population at risk within each circle. Once the statistical significance of the observed count within a circle has been examined, the circle is drawn on a map of the region if it contains a statistically significant cluster of points. The process is repeated many times until a map is produced containing a set of circles centered on parts of the region where interesting clusters of points appear to be located.
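In outline, the scan just described can be sketched in a few lines. The following is a simplified illustration only, not Openshaw's implementation: circles are centred on the population points themselves rather than on a dense moving grid, and a one-sided Poisson tail probability stands in for the full significance machinery. The threshold `alpha` and the `gam_scan`/`poisson_sf` helpers are assumptions made for this sketch.

```python
import math

def poisson_sf(k, lam):
    # P(X >= k) for X ~ Poisson(lam), via the complement of the CDF.
    p, cdf = math.exp(-lam), 0.0
    for i in range(k):
        cdf += p
        p *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)

def gam_scan(cases, population, radii, alpha=0.002):
    """cases / population: lists of (x, y, count) tuples.
    Returns circles (cx, cy, r, observed, expected) flagged as significant."""
    total_cases = sum(c for _, _, c in cases)
    total_pop = sum(p for _, _, p in population)
    rate = total_cases / total_pop          # global incidence rate
    significant = []
    for r in radii:                         # circles of varying sizes...
        for cx, cy, _ in population:        # ...centred on population points
            pop_in = sum(p for x, y, p in population
                         if (x - cx) ** 2 + (y - cy) ** 2 <= r * r)
            obs = sum(c for x, y, c in cases
                      if (x - cx) ** 2 + (y - cy) ** 2 <= r * r)
            expected = rate * pop_in        # population-at-risk based expectation
            if expected > 0 and poisson_sf(obs, expected) < alpha:
                significant.append((cx, cy, r, obs, expected))
    return significant
```

A real GAM run would also vary circle spacing with radius and draw the retained circles on a map; here the list of flagged circles is the entire output.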

A GAM application to infant mortality in Rio de Janeiro

Openshaw (1998) makes a strong case for the performance of the GAM algorithm in locating clusters of diseases, including a comparison with other cluster-finding techniques. To better assess and understand the potentials and limitations of GAM, Teruiya et al. (1999) conducted an investigation using data from the study Spatial Analysis of Live-Born Profile and Socioeconomic Conditions in Rio de Janeiro, by D'Orsi & Carvalho (1998). This study assessed the spatial birth and socioeconomic patterns in subdivisions of the city of Rio de Janeiro, aiming to identify the main groups of infant morbidity and mortality risks and to select prime areas for preventive programs.

In order to apply the GAM algorithm, the values had to be converted from areal-related patterns to point variables. The authors selected some of the basic attributes used by D'Orsi & Carvalho and converted each areal unit (corresponding to a city district) to a point location, which received the value of the areal unit it represented, as illustrated by Figure 2.
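Assuming each district is available as a polygon, one plain way to perform this area-to-point conversion is to collapse the polygon to its centroid (via the shoelace formula) and attach the areal value to that point. The helper below is hypothetical, written for this sketch, and is not code from the study:

```python
def polygon_centroid(vertices):
    """Centroid of a simple (non-self-intersecting) polygon.
    vertices: list of (x, y) tuples in order around the boundary."""
    a = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0      # shoelace term for this edge
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5                           # signed polygon area
    return cx / (6.0 * a), cy / (6.0 * a)

def districts_to_points(districts):
    """districts: list of (vertices, value); returns [(x, y, value), ...]."""
    return [(*polygon_centroid(v), val) for v, val in districts]
```

In practice a GIS would do this step directly, but the arithmetic is exactly this.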

The GAM algorithm was applied to the values of the live-born quality index for all neighborhoods of Rio. GAM found three clusters of high values for this index, located approximately in the Botafogo, Barra da Tijuca, and Ilha do Governador regions (Figure 3). The results were concentrated in what the algorithm perceives as "extreme" events of high values for the index, disregarding cases which are not "significant" enough. As a basis for comparison, the traditional choropleth map is shown in Figure 4, where the areal-based values are grouped by quintiles.

It should be noted that we have used the Rio de Janeiro birth patterns merely as an example to illustrate the computational behavior of the GAM technique. It is important to note that the algorithm was only searching for clusters of high values of the live-born quality index. Clusters of low values are disregarded by GAM, since the algorithm was originally conceived to find clusters of high disease incidence. We hope to motivate health researchers to apply the GAM technique to problems closer to its original intended use, such as sets of epidemiological events.

Figure 1

Two steps in the GAM algorithm: left, initial step with smaller circles; right, later step with larger circles.
GAM = Geographical Analysis Machine.

Figure 2

Location of Rio de Janeiro urban subdivisions.

Figure 3

Clusters of high APGAR index values found by GAM.
GAM = Geographical Analysis Machine; APGAR = Apgar scoring for newborns.
Source: D'Orsi & Carvalho, 1997.

Focus 2 – Exploratory spatial data analysis

Local spatial statistics

Statistical data analysis is currently the most consistent and established set of tools to analyze spatial data sets. Nevertheless, the application of statistical techniques to spatial data faces an important challenge, as expressed in Tobler's (1979) First Law of Geography: everything is related to everything else, but near things are more related than distant things. The quantitative expression of this principle is the effect of spatial dependence: the observed values will be spatially clustered, and the samples will not be independent. This phenomenon, also termed spatial autocorrelation, has long been recognized as an intrinsic feature of spatial data, and measures such as the Moran coefficient and the semi-variogram plot have been used to assess the global association of the data set (Bailey & Gatrell, 1995).
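For reference, the global Moran coefficient mentioned above can be computed directly from a binary contiguity matrix. This is the standard textbook formula, I = (n / S0) * (sum_ij w_ij z_i z_j) / (sum_i z_i^2), sketched in plain Python:

```python
def morans_i(values, w):
    """Global Moran's I.
    values: attribute values per area; w: n x n contiguity matrix
    (w[i][j] non-zero if areas i and j are neighbors)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]                 # deviations from the mean
    s0 = sum(sum(row) for row in w)                # sum of all weights
    num = sum(w[i][j] * z[i] * z[j]
              for i in range(n) for j in range(n)) # cross-products of neighbors
    den = sum(zi * zi for zi in z)
    return (n / s0) * (num / den)
```

Positive values indicate spatial clustering of similar values (as with the Iex variable discussed below); values near zero indicate no global spatial association.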

Most spatial data sets, especially those obtained from geo-demographic and health surveys, not only possess global spatial autocorrelation, but also exhibit significant patterns of spatial instability, which is related to regional differentiations within the observational space. As stated by Anselin (1995), "the degree of non-stationarity in large spatial data sets is likely to be such that several regimes of spatial association would be present".

In order to assess the degree of spatial instability, various local spatial statistics indicators have been proposed, such as the local Moran Ii (Anselin, 1995), the Moran scatterplot (Anselin, 1996), and the Gi and Gi* statistics (Ord & Getis, 1995). For a recent review, see Getis & Ord (1996). Although local spatial statistics can be seen as a branch of spatial statistics, they have been highly praised by geocomputational proponent Stan Openshaw: "it is absolutely fundamental that we can develop tools able to detect, by any feasible means, patterns and localized association that exist within the map" (Openshaw & Abrahart, 1996:665).

Figure 4

Grouping of APGAR index values (quintiles): 77.4 to 83.3; 74.4 to 77.4; 69.5 to 74.4; 66.4 to 69.5; 44.1 to 63.4; excluded.
APGAR = Apgar scoring for newborns.
Source: D'Orsi & Carvalho, 1997.


Spatial statistics as a basis for zoning: social exclusion/inclusion in São Paulo

In order to assess the validity of local spatial statistics, the authors have conducted a project to study the potential of such indicators as a basis for the design of administrative zoning systems for the city of São Paulo. As is well known, zone design is a major challenge for urban and regional planners, since it involves major decisions on how to distribute public resources.

São Paulo is one of the world's largest cities and presents a major challenge to urban planners and public administrators. Given its present size (over 13 million inhabitants) and enormous socioeconomic inequalities, rational planning of the city requires a careful division of the urban space into administrative regions that are homogeneous by some objective criteria. Unfortunately, the current regional division of São Paulo has been driven by historical and political forces and fails to reflect a rational attempt to address the city's disparities.

As a basis for a zoning design for São Paulo, we have taken the "Social Exclusion/Inclusion Map of São Paulo", a comprehensive diagnosis of the city coordinated by Prof. Aldaiza Sposati of the Social Research Group at the Catholic University of São Paulo. This map used 49 variables obtained from Census data and local organizations to quantify the social apartheid in 96 districts of São Paulo (Sposati, 1996).

The main results of the "social exclusion/inclusion map" were indicators of social exclusion and disparities in quality of life in São Paulo. Figure 5 shows the map of the social exclusion index (Iex), where the values vary from -1 (maximal social exclusion) to +1 (maximal social inclusion), with a value of 0 indicating the attainment of a basic standard of inclusion (Sposati, 1996). Note from the map that two-thirds of the districts in São Paulo fall short of acceptable living standards.

Taking the social exclusion index as a basis, the proposed task was to group the 96 districts into a set of administrative zones, each containing a significant number of districts and homogeneous with respect to social exclusion status. We used two exploratory spatial analysis tools: the Moran scatterplot map (Figure 6, left) and the local Moran index significance map (Figure 6, right). The basis for these local spatial statistics indicators is a neighborhood or contiguity matrix W, whose elements wij are zero if i and j are not neighbors and non-zero otherwise.

The Moran scatterplot map is a tool for visualizing the relationship between the observed values Z and the local mean values WZ, where Z indicates the array of attribute values (expressed as deviations from the mean) and WZ is the array of local mean values, computed using matrix W. The association between Z and WZ can be explored to indicate the different spatial regimes associated with the data and displayed graphically, as indicated by Figure 6 (left). The Moran scatterplot map divides spatial variability into four quadrants:
• Q1 (positive values, positive local means) and Q2 (negative values, negative local means): indicate areas of positive spatial association.
• Q3 (positive values, negative local means) and Q4 (negative values, positive local means): indicate areas of negative spatial association.
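The quadrant assignment itself is mechanical once Z and WZ are in hand. A sketch, assuming a row-standardized weights matrix (the function name and the tie-breaking of exact zeros are choices made for this illustration):

```python
def moran_quadrants(z, w):
    """Assign each area to a Moran scatterplot quadrant.
    z: attribute values as deviations from the mean;
    w: row-standardized spatial weights matrix."""
    n = len(z)
    # WZ: local mean of each area's neighbors
    wz = [sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]
    quads = []
    for zi, wzi in zip(z, wz):
        if zi >= 0 and wzi >= 0:
            quads.append("Q1")   # high value, high-value neighbors
        elif zi < 0 and wzi < 0:
            quads.append("Q2")   # low value, low-value neighbors
        elif zi >= 0:
            quads.append("Q3")   # high value, low-value neighbors
        else:
            quads.append("Q4")   # low value, high-value neighbors
    return quads
```

Q3/Q4 areas are the candidates for the "transition regions" between spatial regimes discussed below.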

Since the Iex variable exhibits global positive spatial autocorrelation (Moran I = 0.65, significance = 99%), areas in quadrants Q3 and Q4 are interpreted as regions that do not follow the same global process of spatial dependence; these points indicate transitional regions between two different spatial regimes.

The local Moran index Ii is computed by multiplying the local normalized value zi by the local mean (Anselin, 1995):

Ii = zi Σj wij zj

In order to establish a significance test for the local Moran index, Anselin (1995) proposes a pseudo-distribution simulated by permutation of the attribute values among the areas. Based on this pseudo-distribution, traditional statistical tests are used to indicate local index values with significance of 95% (1.96σ), 99% (2.58σ), and 99.9% (3.29σ). The "significant" indexes are then mapped and posited as "hot spots" of local non-stationarity.
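A minimal sketch of the index and its permutation test follows. It assumes values standardized by the population standard deviation and, for simplicity, permutes the whole vector rather than using Anselin's conditional scheme (which holds zi fixed and permutes the remaining values); function names are ours:

```python
import random

def local_moran(z, w):
    """Ii = zi * sum_j wij zj for each area i (z already standardized)."""
    n = len(z)
    return [z[i] * sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]

def local_moran_pvalues(values, w, n_perm=999, seed=0):
    """Observed local Moran values plus pseudo p-values by permutation."""
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    z = [(v - mean) / sd for v in values]
    observed = local_moran(z, w)
    exceed = [0] * n
    for _ in range(n_perm):
        zp = z[:]
        rng.shuffle(zp)                     # break any spatial structure
        perm = local_moran(zp, w)
        for i in range(n):
            if abs(perm[i]) >= abs(observed[i]):
                exceed[i] += 1
    pvalues = [(e + 1) / (n_perm + 1) for e in exceed]
    return observed, pvalues
```

Areas whose pseudo p-value falls below 0.05 or 0.01 would be shaded in a significance map like Figure 6 (right).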

The local Moran index significance map indicated three "hot spots", two of which related to low values of inclusion (located in the South and East of the city) and one related to high values of inclusion (located in the center of the city). These patterns correspond to the extreme regions of poverty and wealth in the city and were chosen as "seeds" in the zoning procedure.

The remaining regions were defined interactively, taking into account the Moran scatterplot map, which clearly indicates a number of transition regions between the Q1 and Q2 locations (the so-called "high-high" and "low-low" areas), some of which are indicated by the ellipses. These regions were grouped into separate zones. The work proceeded interactively until a final zoning proposal was produced, which can be compared with the current administrative regions (Figure 7).

In order to assess the resulting map, a regression analysis was performed. This regression analyzes the correlation between the percentage of houses with proper sewage facilities (as independent variable) and the percentage of people over 70 years of age (as dependent variable). The rationale behind this choice was that social deprivation is a serious impediment to healthy living, as measured by the percentage of elderly in the population. Three OLS (ordinary least squares) regression analyses were performed: the first taking all districts of the city overall; the second using the current administrative division as separate spatial regimes; and the third using the proposed new zoning as spatial regimes. The results are summarized in Table 1.
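The effect behind Table 1 — regime-wise fits explaining far more variance than a single citywide fit — is easy to reproduce: for a single regressor, the OLS R2 equals the squared correlation, and it can be computed per regime. The data below are synthetic and the helpers are ours, not the study's code:

```python
def r_squared(x, y):
    """R2 of a simple (one-regressor) OLS fit = squared correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

def regime_r2(x, y, regime):
    """Fit a separate simple OLS within each spatial regime; return R2 per regime."""
    groups = {}
    for xi, yi, g in zip(x, y, regime):
        groups.setdefault(g, ([], []))
        groups[g][0].append(xi)
        groups[g][1].append(yi)
    return {g: r_squared(xs, ys) for g, (xs, ys) in groups.items()}
```

With data that follow the same slope but different intercepts in two regimes, the pooled R2 is small while each within-regime R2 is near 1 — the same pattern as the citywide versus zoned fits in Table 1.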

These results are a positive indication of the possible use of local spatial statistics as a basis for zoning procedures and show how indicators such as the social exclusion index of Sposati (1996) can be used to support urban planning.

Focus 3 – Neural networks and geographic analysis

Introduction

An Artificial Neural Network (ANN) is a computing paradigm inspired by the way the brain processes information. The key element in this paradigm is a processing system composed of a large number of highly interconnected elements (neurons) working in unison to solve specific problems. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process (Gopal, 1998).

In principle, ANNs can represent any computable function, i.e., they can do everything a normal digital computer can do. In practice, ANNs are especially useful for classification, function approximation, and mapping problems which are tolerant of some imprecision and for which plenty of training data are available. Almost any mapping between vector spaces can be approximated to arbitrary precision by feedforward ANNs (the type most often used in practical applications) if there are enough data and enough computing resources.

Given the capabilities of ANNs as exploratory tools in data-rich environments, there has been considerable interest in their use for spatial data analysis, especially in remote sensing image classification (Kanellopoulos, 1997; Leondes, 1997). Other geographical applications include spatial interaction modeling (Gopal & Fischer, 1996; Openshaw, 1993) and classification of census data (Winter & Hewitson, 1994).

Neural networks for spatial data integration: an economical-ecological zoning application

To illustrate the potential of ANNs for spatial data analysis, we have selected one example: the use of neural networks for the integration of multiple spatial information layers in an environmental zoning application (Medeiros, 1999). Although the chosen application does not involve health data, the integration procedure shown is relevant to health-assessment applications, which involve multiple data sets as possible sources of epidemiological risk.

One of the more important problems in geographical data analysis is the integration of separate data sets to produce new spatial information. For example, in health analysis, a researcher may be interested in assessing the risks associated with a disease (such as malaria) based on a combination of different conditions (land use and land cover, climatology, hydrological information, and distance to main roads and cities). These conditions can be expressed as maps, integrated into a common geographical database by means of GIS technology.

Figure 5

Social Exclusion Index (Iex) in São Paulo (96 districts grouped in sextiles). Legend ranges: -1 to -0.69; -0.69 to -0.43; -0.43 to -0.27; -0.27 to -0.08; -0.08 to 0.26; 0.26 to 1.

Once the data have been organized in a common geographical reference, the researcher needs to determine a procedure to combine these data sets. Taking a hypothetical example, a health researcher may want to calculate a risk map for malaria based on known disease incidence, climate, distance to cities, and land cover, where the conditions are such that a region is deemed "high risk for malaria" if it rains more than 1,000 mm/year, the land cover is "inundated forest", and the region is located less than 50 km from a city.

The main problem with these map inference procedures is their ad hoc, arbitrary nature: the researcher formulates hypotheses from previous knowledge and applies them to the data set. The process relies on inductive knowledge of reality. Additionally, when the input maps have many different conditions, the definition of combinatory rules for deriving the output may be difficult. For example, if each input map has eight different conditions (e.g., land cover classes) and five maps are to be combined, then 8^5 (= 32,768) different situations have to be taken into account.

Figure 6

Left: Moran scatterplot map for Iex in São Paulo (quadrants: high-high; low-low; high-low; low-high). Right: significant values of the local Moran index (gray = 95%; dark gray = 99%).

There are two main alternative approaches to this problem. One is to use fuzzy logic to combine the maps (Câmara et al., 2000). In this case, all input data are transformed into fuzzy sets (on a [0,1] scale) and a fuzzy inference procedure may be used. Alternatively, the use of neural network techniques aims at capturing the researcher's experience, without the need for an explicit definition of the inference procedure. The application of neural networks to map integration can be done using the following steps:
• Create a georeferenced database with the input (conditional) maps.
• Select well-known regions as training areas. For these areas, indicate the desired output response (such as health risk).
• Use these training areas as inputs to a neural network learning procedure.
• Using the trained network, apply the inference procedure to the entire study region.
• Evaluate the result and redo the training procedure, if necessary.
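The training-and-inference loop above can be sketched with a deliberately small feedforward network. This is a generic two-layer network in NumPy, not the configuration used by Medeiros (1999); the "training areas" in the test are synthetic, each pixel represented by a vector of map-layer values and a desired 0/1 risk response:

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.5, epochs=2000, seed=0):
    """Train a tiny 1-hidden-layer network with sigmoid output (batch gradient
    descent on cross-entropy). X: (n, d) layer values per training pixel;
    y: (n,) desired response in {0, 1}."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0, 0.5, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                      # hidden activations
        p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))      # predicted risk
        g = (p - y[:, None]) / n                      # d(loss)/d(output logit)
        gh = (g @ W2.T) * (1.0 - h ** 2)              # backprop through tanh
        W2 -= lr * (h.T @ g); b2 -= lr * g.sum(0)
        W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(0)
    return W1, b1, W2, b2

def predict(params, X):
    """Apply the trained network to every pixel of the study region."""
    W1, b1, W2, b2 = params
    h = np.tanh(X @ W1 + b1)
    return (1.0 / (1.0 + np.exp(-(h @ W2 + b2)))).ravel()
```

In a real application, `X` would hold one row per raster cell (with one column per conditional map), and `predict` would produce the inferred risk surface for the whole region.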

This idea was applied by Medeiros (1999) in his study of the integration of natural resources data as a basis for economical-ecological zoning in the Amazon region. Medeiros used five data sets as input: vegetation, geology, geomorphology, soils, and remote sensing images. Medeiros (1999) compared the result obtained by the neural network with a subjective operator interpretation and found a very strong spatial coherence between the two maps, with the neural-produced map being more restrictive in its results than the subjective one (Figure 8).

Figure 7

Administrative zones in São Paulo. Left: current division in 11 administrative regions. Right: proposed division in 13 new regions.

Focus 4 – Cellular automata

The computer representation of geographical space in current GIS technology is essentially static. Therefore, one important research focus in geocomputation aims to produce models that combine the structural elements of space (geographical objects) with the processes that modify such space (human actions as they operate in time). Such models would free us from static views of space (as centuries of map-making have conditioned us to hold) and emphasize the dynamic components as an essential part of geographical space.

This motivation has led to the use of cellular automata as a technique for the simulation of urban and regional growth. Cellular automata (CA) are very simple dynamic spatial systems in which the state of each cell in an array depends on the previous state of the cells within a neighborhood of the cell, according to a set of transition rules. CA are very efficient computationally because they are discrete, iterative systems that involve interactions only within local regions rather than between all pairs of cells. The good spatial resolution that can thus be attained is an important advantage when modeling land use dynamics, especially for planning and policy applications (White & Engelen, 1997).

A conventional cellular automaton consists of: (a) a Euclidean space divided into an array of identical cells; (b) a cell neighborhood; (c) a set of discrete cell states; (d) a set of transition rules which determine the state of a cell as a function of the states of cells in the neighborhood; and (e) discrete time steps, with all cell states updated simultaneously.
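Components (a) through (e) can be made concrete with a toy urban-growth automaton: two states (0 = empty, 1 = urban), the 8-cell Moore neighborhood, and a single transition rule. The grid, the rule ("an empty cell becomes urban if at least 2 of its 8 neighbors are urban"), and the threshold are illustrative assumptions for this sketch, not a published model:

```python
def step(grid):
    """One synchronous CA time step: all cells updated from the previous state."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]          # next state, computed from old grid
    for r in range(rows):
        for c in range(cols):
            # count urban cells in the Moore (8-cell) neighborhood
            urban_neighbours = sum(
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c))
            # transition rule: empty cell urbanizes near enough urban cells
            if grid[r][c] == 0 and urban_neighbours >= 2:
                new[r][c] = 1
    return new
```

Iterating `step` makes an initial urban seed spread outward; the geographically constrained CA discussed next would replace the uniform grid with cells carrying their own attributes (suitability, slope, land tenure, and so on) that enter the transition rule.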

The application of CA to geographical systems was first proposed by Tobler (1979). More recently, a number of researchers have proposed modifications of the original CA idea to accommodate geographical constraints. The most important characteristic to be discarded is the homogeneous cell space, replaced by a space in which each cell has its own inherent set of attributes (as distinct from its single state) which represent its relevant physical, environmental, social, economic, or institutional characteristics. These advances have been accompanied by an increase in the models' complexity (Couclelis, 1997; White & Engelen, 1997).

This modification has allowed CA models to be linked both conceptually and practically with GIS. Since the CA runs on an inhomogeneous cell space (essentially identical to what would be found in a raster GIS), the CA may be thought of as a sort of dynamic GIS (Batty & Xie, 1994). At present, however, CA models developed in GIS remain simple, because GIS do not yet provide operators with sufficient flexibility to define complex CA transition rules, and in addition they lack the simulation engines needed to run complex models at practical speeds. The more practical approach is to couple GIS to special-purpose CA software modules, and possibly other models as well. White et al. (1997) have developed several CA and CA-based integrated models designed as prototypes of Spatial Decision Support Systems for urban and regional planning and impact analysis (demos of several of these models can be downloaded from <http://www.riks.nl/RiksGeo/freestuff.htm>).

In conclusion: geocomputation as a set of effective procedures

This survey has examined some of the main branches of research in geocomputation, and we conclude the paper with an attempt to provide a unified perspective on this new research field.

We propose that a unifying perspective for geocomputation is the emphasis on algorithmic techniques. The rationale for this approach is that the emergence of data-rich spatial databases motivated a new set of techniques for spatial data analysis, most of them originally proposed under the general term "artificial intelligence", such as neural networks, cellular automata, and heuristic search.

Since there are fundamental differences among the perspectives of the set of techniques used by geocomputation, the only unifying perspective is the computational one: such techniques can be thought of as a set of effective procedures that, when applied to geographical problems, are bound to produce results. Whatever results are obtained need to be interpreted in light of the basic assumptions of these techniques, and it may be extremely difficult to assign any traditional "statistical significance" criteria to them.

Table 1

Correlation coefficients for (old age, sewage) regression in São Paulo.

Situation           | Number of spatial regimes | R2 (correlation coefficient)
All city districts  | 1                         | 0.35
Current Zoning      | 11                        | 0.72
Proposed Zoning     | 13                        | 0.83

GEOCOMPUTATIONAL TECHNIQUES FOR SPATIAL ANALYSIS 1069

Cad. Saúde Pública, Rio de Janeiro, 17(5):1059-1081, set-out, 2001

Figure 8

Neural network based inference: cross-tabulation of the ANN classification (rows) against modeling via LEGAL Map Algebra (columns).

ANN Modeling           | Background | Stable | Moderate-stable | Mean-stable/vulnerable | Moderate-vulnerable | Vulnerable | Sum. Xi | Xi/Sum. Xij (%) | User exactness (%) | Inclusion error (%)
Background             | 0          | 0      | 44              | 8                      | 16                  | 142        | 210     | 0.47            | 0.00               | 100.00
Stable                 | 0          | 15     | 0               | 0                      | 0                   | 6          | 21      | 0.05            | 71.43              | 28.57
Moderate-stable        | 1          | 0      | 19861           | 434                    | 782                 | 159        | 21237   | 47.26           | 93.52              | 6.48
Mean-stable/vulnerable | 0          | 0      | 473             | 2598                   | 17                  | 6          | 3094    | 6.88            | 83.97              | 16.03
Moderate-vulnerable    | 0          | 0      | 671             | 26                     | 10166               | 106        | 10969   | 24.41           | 92.68              | 7.32
Vulnerable             | 0          | 2      | 219             | 13                     | 122                 | 9053       | 9409    | 20.94           | 96.22              | 3.78
Sum. Xj                | 1          | 17     | 21268           | 3079                   | 11103               | 9472       | 44940   | 100.00          |                    |
Xj/Sum. Xij (%)        | 0.00       | 0.04   | 47.33           | 6.85                   | 24.71               | 21.08      | 100.00  | 41693.00        |                    |
Producer exactness (%) | 0.00       | 88.24  | 93.36           | 84.36                  | 91.56               | 95.58      |         |                 |                    |
Omission error (%)     | 100.00     | 11.76  | 6.62            | 15.62                  | 8.44                | 4.42       |         |                 |                    |

K (%) = 91.89; Overall exactness (%) = 92.8

ANN = Artificial Neural Network; Sum. = summation; LEGAL = Map Algebra Language.

(Figure 8 also shows the two classified maps, produced by LEGAL Map Algebra and by the ANN, with the classes: stable, moderate-stable, mean-stable/vulnerable, moderate-vulnerable, vulnerable.)
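The pattern in Table 1, where fitting a separate regression within each spatial regime raises the explained variance, can be illustrated on synthetic data. The regimes, coefficients, and noise level below are invented for illustration; they are not the São Paulo data.

```python
import random

random.seed(0)

def ols_r2(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return R^2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Two hypothetical spatial regimes in which x relates to y with
# different coefficients (invented numbers, not the São Paulo data).
regimes = {"regime 1": [], "regime 2": []}
for _ in range(200):
    x = random.random()
    regimes["regime 1"].append((x, 1.0 + 2.0 * x + random.gauss(0, 0.2)))
    x = random.random()
    regimes["regime 2"].append((x, 4.0 - 1.0 * x + random.gauss(0, 0.2)))

# One pooled regression over all districts versus one per regime.
all_pts = [p for pts in regimes.values() for p in pts]
pooled = ols_r2([p[0] for p in all_pts], [p[1] for p in all_pts])
per_regime = [ols_r2([p[0] for p in pts], [p[1] for p in pts])
              for pts in regimes.values()]

print(f"pooled R2 = {pooled:.2f}")
print(f"per-regime R2 = {min(per_regime):.2f} .. {max(per_regime):.2f}")
```

Because the two regimes have different slopes and intercepts, a single pooled fit explains little of the variance, while the per-regime fits recover most of it, which is the qualitative effect reported in Table 1.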

Therefore, the authors propose a tentative definition: "Geocomputation is the use of a set of effective computing procedures to perform spatial data analysis, whose results are dependent on the basic assumptions of each technique and therefore are not strictly comparable".

According to this view, geocomputation emphasizes the fact that the structure and data dependency inherent in spatial data can be used as part of knowledge-discovery approaches, and that the choices involve theory as well as data. This view does not deny the importance of model-based approaches, such as the Bayesian techniques based on Monte Carlo simulation for the derivation of distribution parameters on spatial data. In fact, in this broader perspective, the use of Bayesian techniques that rely on computationally intense simulations can be considered a legitimate part of the geocomputational field of research.
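One computationally intense simulation of the kind alluded to here can be sketched with a toy random-walk Metropolis sampler. This is a generic textbook construction, not tied to any specific spatial model: the prior, the data, and the tuning constants are invented, and a conjugate Gamma posterior is used only so the answer can be checked.

```python
import math
import random

random.seed(1)

counts = [3, 5, 2, 4, 6, 3, 4]  # hypothetical disease counts per area

def log_posterior(lam):
    """Log of a Gamma(2, 1) prior times a Poisson likelihood, up to a
    constant; both choices are illustrative only."""
    if lam <= 0:
        return float("-inf")
    log_prior = math.log(lam) - lam
    log_lik = sum(c * math.log(lam) - lam for c in counts)
    return log_prior + log_lik

# Random-walk Metropolis: accept a Gaussian proposal with probability
# min(1, posterior ratio), evaluated on the log scale.
lam, samples = 1.0, []
for _ in range(20000):
    proposal = lam + random.gauss(0, 0.5)
    if math.log(random.random()) < log_posterior(proposal) - log_posterior(lam):
        lam = proposal
    samples.append(lam)

burned = samples[2000:]  # discard burn-in
print(f"simulated posterior mean: {sum(burned) / len(burned):.2f}")
# Conjugacy check: the exact posterior here is Gamma(2 + 27, 1 + 7),
# with mean 29 / 8 = 3.625.
```

In realistic spatial models the posterior has no closed form at all, which is precisely when such simulation becomes the only practical route to the distribution parameters.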

In conclusion, what can public health researchers expect from geocomputation? When used with discretion, and always bearing in mind the conceptual basis of each approach, techniques such as GAM, local spatial statistics, neural nets, and cellular automata can be powerful aids to a spatial data analysis researcher attempting to discover patterns in space and relations between its components.

We hope this article serves as inspiration to health researchers and that it broadens their notions of what is possible in spatial data analysis.

Acknowledgments

The authors wish to thank Marilia Carvalho and Oswaldo Cruz at Fundação Oswaldo Cruz for the invitation and challenge to apply GIS techniques to health data. We also acknowledge the important contribution of our fellow colleagues and students at Instituto Nacional de Pesquisas Espaciais (INPE): Eduardo Camargo, José Simeão de Medeiros, Marcos Corrêa Neves, Roseli Teruiya, Virginia Correa, and the entire SPRING software team at INPE.

Further reading

For readers interested in more information on geocomputation, we provide a set of references, organized by topic. We suggest that prospective readers begin with Longley (1998) and then proceed to their specific area of interest.

References

ABRAHART, B.; BENENSON, I.; BENWELL, G.; DIBBLE, C.; CALDWELL, D.; GAHEGAN, M.; HARTNETT, J.; LEES, B.; LONGLEY, P.; MACMILLAN, B.; PASCOE, R. & PULLER, D., 2000. Geocomputation Conference Series Home Page. 17 April 2000 <http://www.ashville.demon.co.uk/geocomp/index.htm>.

ANSELIN, L., 1995. Local indicators of spatial association - LISA. Geographical Analysis, 27:91-115.

ANSELIN, L., 1996. The Moran scatterplot as ESDA tool to assess local instability in spatial association. In: Spatial Analytical Perspectives on GIS (M. Fisher, H. J. Scholten & D. Unwin, ed.), pp. 111-126. London: Taylor & Francis.

BAILEY, T. & GATTRELL, A., 1995. Spatial Data Analysis by Example. London: Longman.

BATTY, M. & XIE, Y., 1994. From cells to cities. Environment and Planning B, 21:531-548.

CÂMARA, G., 2000. Geoprocessamento: Teoria e Aplicações. 11 April 2000 <http://www.dpi.inpe.br/gilberto/livro>.

CARVALHO, M. S., 1997. Aplicação de Métodos de Análise Espacial na Caracterização de Áreas de Risco à Saúde. Tese de Doutorado, Rio de Janeiro: Programa de Engenharia Biomédica, Instituto Alberto Luiz Coimbra de Pós-graduação e Pesquisa de Engenharia, Universidade Federal do Rio de Janeiro.


COUCLELIS, H., 1997. From cellular automata to urban models: New principles for model development and implementation. Environment and Planning B: Planning & Design, 24:165-174.

D'ORSI, E. & CARVALHO, M. S., 1998. Perfil de nascimentos no Município do Rio de Janeiro - Uma análise espacial. Cadernos de Saúde Pública, 14:367-379.

GETIS, A. & ORD, J. K., 1996. Local spatial statistics: An overview. In: Spatial Analysis: Modelling in a GIS Environment (P. Longley & M. Batty, ed.), pp. 261-277, New York: John Wiley.

GOPAL, S., 1998. Artificial neural networks for spatial data analysis. In: NCGIA Core Curriculum in Geographic Information Science (M. Goodchild & K. Kemp, ed.), Santa Barbara: National Center for Geographic Information and Analysis. 15 May 2000 <http://www.ncgia.ucsb.edu/giscc/units/u188/u188.html>.

GOPAL, S. & FISCHER, M., 1996. Learning in single hidden layer feedforward neural network models: Backpropagation in a spatial interaction modeling context. Geographical Analysis, 28:38-55.

HEWITSON, B. C. & CRANE, R. G., 1994. Neural Nets: Applications in Geography. Dordrecht: Kluwer Academic Publishers.

KANELLOPOULOS, I., 1997. Use of Neural Networks for Improving Satellite Image Processing Techniques for Land Cover/Land Use Classification. European Commission, Joint Research Centre. 10 April 2000 <http://ams.egeo.sai.jrc.it/eurostat/Lot16-SUPCOM95/final-report.html>.

LEONDES, C., 1997. Image Processing and Pattern Recognition. Neural Network Systems Techniques and Applications Series, v. 5. New York: Academic Press.

LONGLEY, P., 1998. Geocomputation: A Primer. New York: John Wiley and Sons.

MEDEIROS, J. S., 1999. Bancos de Dados Geográficos e Redes Neurais Artificiais: Tecnologias de Apoio à Gestão do Território. Tese de Doutorado, São Paulo: Departamento de Geografia, Universidade de São Paulo.

OPPENSHAW, S., 1993. Modelling spatial interaction using a neural net. In: GIS Spatial Modeling and Policy (M. M. Fischer & P. Nijkamp, ed.), pp. 147-164. Berlin: Springer.

OPPENSHAW, S., 1997. Developing GIS-relevant zone-based spatial analysis methods. In: Spatial Analysis: Modelling in a GIS Environment (P. Longley & M. Batty, ed.), pp. 55-73, New York: John Wiley.

OPPENSHAW, S., 1998. Building automated Geographical Analysis and Exploration Machines. In: Geocomputation: A Primer (P. A. Longley, S. M. Brooks & B. McDonnell, ed.), pp. 95-115, Chichester: MacMillan Wiley.

OPPENSHAW, S. & ABRAHART, R. J., 1996. Geocomputation. In: Proceedings of the 1st International Conference on GeoComputation (R. J. Abrahart, ed.), pp. 665-666, Leeds: University of Leeds.

OPPENSHAW, S. & ABRAHART, R. J., 2000. Geocomputation. London: Taylor and Francis.

OPPENSHAW, S. & OPPENSHAW, C., 1997. Artificial Intelligence and Geography. New York: John Wiley.

ORD, J. K. & GETIS, A., 1995. Local spatial autocorrelation statistics: Distributional issues and an application. Geographical Analysis, 27:286-306.

O'SULLIVAN, D., 1999. Exploring the structure of space: Towards geo-computational theory. In: IV International Conference on GeoComputation, Proceedings. 12 May 2000 <http://www.geovista.psu.edu/geocomp/geocomp99>.

SPOSATI, A., 1996. Mapa da Exclusão/Inclusão Social da Cidade de São Paulo. São Paulo: EDUC.

TAKEYAMA, M. & COUCLELIS, H., 1997. Map dynamics: Integrating cellular automata and GIS through Geo-Algebra. International Journal of Geographical Information Science, 11:73-91.

TERUIYA, R.; CÂMARA, G. & MONTEIRO, A. M. V., 1999. An Application of GAM for Health Data in Rio de Janeiro. Internal Report. São José dos Campos: Instituto Nacional de Pesquisas Espaciais.

TOBLER, W. R., 1979. Cellular geography. In: Philosophy in Geography (S. Gale & G. Olsson, ed.), pp. 379-386, Dordrecht: D. Reidel Publishing Company.

TURTON, I., 1998. The Geographical Analysis Machine. Centre for Computational Geography, University of Leeds. 29 March 2000 <http://www.ccg.leeds.ac.uk/smart/gam/gam.html>.

WHITE, R. & ENGELEN, G., 1993. Cellular dynamics and GIS: Modelling spatial complexity. Geographical Systems, 1:2.

WHITE, R. & ENGELEN, G., 1997. Cellular automata as the basis of integrated dynamic regional modelling. Environment and Planning B, 24:235-246.

WHITE, R.; ENGELEN, G. & ULJEE, I., 1997. The use of constrained cellular automata for high-resolution modelling of urban land-use dynamics. Environment and Planning B, 24:323-343.

WINTER, K. E. & HEWITSON, B. C., 1994. Self organizing maps - Application to census. In: Neural Nets: Applications in Geography (B. C. Hewitson & R. G. Crane, ed.), Dordrecht: Kluwer Academic Publishers.


Debate on the paper by Gilberto Câmara & Antônio Miguel Vieira Monteiro

Trevor C. Bailey

School of Mathematical Sciences, University of Exeter, Exeter, England.

The authors are to be commended on this interesting and thought-provoking review of the new analytical possibilities offered by "geocomputation" techniques, and I concur with them that there is considerable potential for useful applications of such techniques in the spatial analysis of health data. I agree wholeheartedly that the wider dissemination of such methods along with associated software tools may ultimately benefit many areas of geographical health and environmental research. We do indeed face a "data-rich" future in those fields of study, and one where data will be not only voluminous but also complex. I mean both complex in content (e.g., in the topographic and geographical detail provided by GIS and remote sensing) and complex in structure (e.g., data from disparate sources relating to different geographical scales and reference frameworks that need to be integrated in the study of many issues of interest in health research). Indeed, one suspects that the future may already be with us! Traditional spatial analysis methods are not designed to handle such data complexity (e.g., many make little use of anything more sophisticated than simple Euclidean distance or the contiguity of areal units in order to reflect proximity, many assume some form of stationarity in the processes modeled, and few can handle data sources at different levels of spatial aggregation). We undoubtedly do need new analysis methods that are capable of exploiting more complex concepts. The authors convince me that geocomputational research offers some promising avenues for achieving that, and this paper and the work referenced in it therefore deserve serious and careful consideration by those involved in geographical health research.

However, while generally enthusiastic about the possibilities offered by the techniques discussed and agreeing with much that is said in the paper, there are some issues which I would like to take up from the perspective of an applied statistician with an interest in spatial analysis, and I restrict my remaining remarks to those.

First, I do not consider that we need to think of geocomputational techniques as an alternative to more traditional statistical methods and models, but rather as a complement to them. Modern statistical analysis is itself a broad church and no stranger to computer-intensive methods. To establish a dividing line between many existing forms of descriptive or exploratory statistical analysis and geocomputation may be useful in order to focus attention and promote the use of novel forms of algorithmic approach. However, from the point of view of a practicing statistician, such a distinction is somewhat artificial. Many existing forms of visualization and projection techniques used in statistics, particularly those employed in the analysis of high-dimensional data, have little to do with traditional notions of statistical inference. Statisticians are quite comfortable and familiar with using essentially algorithmic methods where appropriate and have been doing so for many years. What matters in exploring data is that the analyses conducted are careful and thorough, not what type of algorithms are employed to achieve that. So I do not see geocomputation as competing with my current statistical exploratory tool box, but rather as adding to it (in fact I consider two of the methods discussed in this paper, the GAM and local indicators of spatial association, to already be a part of it, although I am happy to see them re-branded as geocomputation if it encourages their use!).

However, I do stress that I see geocomputational techniques as essentially exploratory, and that brings me to my second point. Answers to the questions "Are there any patterns, where are they, and what do they look like?" are undoubtedly of value, but ultimately they are a preliminary to the more important ones of why they are there, whether they will happen again, and how they will change if we intervene in a particular way. The answer to this second set of questions requires a scientific explanation of the phenomenon under study, and given the intrinsically stochastic nature of most social, environmental, and health-related phenomena, the best tool for this will, I suspect, remain the statistical model. I am not suggesting that such models will be true; the very word model implies simplification and idealization, and I fully appreciate that complex geographical health and environmental systems cannot be exactly described by a few formulae. However, the construction of idealized representations of important aspects of such systems, consistent with the existing substantive epidemiological or public health knowledge, should remain the ultimate goal. I therefore see the primary value of geocomputation as assisting in the statistical model-building process, not circumventing it.

This view of the role of geocomputation (which I freely admit may be narrower than that held by the authors) leads me to my third point, which relates to various concerns over the practical use of some kinds of geocomputational algorithms. The process of model-building is ideally both interactive and iterative. The analyst needs to try out ideas on the data, and this requires exploratory tools that can be guided or steered towards particular chosen ends or hypotheses. At present, many geocomputational algorithms appear too much of a "black box" to make this possible. The very nature of the algorithms makes it difficult to provide simple, readily understood control parameters which enable them to be "steered" towards answering particular questions which one might wish to ask of the data. In a sense they provide an answer in the absence of a question. This detracts from their value as exploratory tools for the model builder. In that sense, what is often termed "artificial intelligence" might be better referred to as "artificial un-intelligence". There is also the problem of whether such techniques produce robust results, as opposed to ones which are pure artifacts of the data. I appreciate that traditional notions of statistical significance and standard error cannot, and perhaps should not, be looked for in relation to these algorithms, and that different algorithmic approaches will naturally reveal different aspects of the data. However, the sensitivity of the results from any one of them (e.g., to starting conditions or in repeated application to various subsets of the data) needs to be investigated, and often is not. If the data are to be mined, then we need to establish whether a vein of gold has been found or a vein of fool's gold, and currently the algorithms are weak on the diagnostics that would enable us to measure that.
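The kind of sensitivity check described here, re-running an algorithm on subsets of the data and examining the stability of its findings, can be sketched in a few lines. The data, the toy "detector", and the resampling fraction below are all hypothetical.

```python
import random

random.seed(7)

# Hypothetical unit-level observations (e.g., standardized rates) for
# ten areas; area 4 is constructed to have a genuinely higher mean.
means = [2, 2, 2, 2, 5, 2, 2, 2, 2, 2]
areas = {a: [random.gauss(mu, 1.0) for _ in range(30)]
         for a, mu in enumerate(means)}

def hottest_area(data):
    """The 'finding' of a toy pattern detector: the area with the
    highest sample mean."""
    return max(data, key=lambda a: sum(data[a]) / len(data[a]))

# The robustness check: re-run the detector on random 70% subsets of
# each area's observations and see how often the same answer returns.
hits = {}
for _ in range(200):
    subset = {a: random.sample(v, int(0.7 * len(v))) for a, v in areas.items()}
    winner = hottest_area(subset)
    hits[winner] = hits.get(winner, 0) + 1

for area in sorted(hits):
    print(f"area {area}: flagged in {hits[area]}/200 resamples")
```

A finding that survives most resamples is more plausibly a vein of gold; one that shifts from area to area under mild perturbation is a candidate artifact of the data.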

In summary, I do not wish to appear as a dogged defender of existing spatial statistical models and methods. I am well aware how deficient many of those are. For example, traditional spatial models largely involve space in terms of glib abstractions: "distances", "boundaries", and "edge effects". Of course, in reality the areas over which analyses are being conducted are vastly complex, criss-crossed with natural boundaries such as forests, rivers, or ranges of hills, or else human constructions such as roads, industrial estates, recreational parks, and so on. Many commonly used spatial statistical methods and models should be viewed in the cold light of their spatial simplicity compared with what we know to exist in geographical reality, and upon which data are now available through GIS and remote sensing. Humility would indeed be wise for anyone defending such models, and it is useful to be reminded of that and presented with some novel algorithmic approaches in this paper which may assist in addressing it. Therefore I welcome new and improved algorithms for exploratory spatial analysis of health data capable of exploiting the complexity of data and of geography. If geocomputation matures to offer that, then I am very comfortable with using it. However, I think we should be cautious about exaggerating its potential. Data analysis in general involves more than methods; it depends on contextual knowledge of the phenomenon under study, the objectives of the analysis, the quality and origins of the data, and the judgment and experience of the analyst. Because of that, there is a long-standing resistance among applied statisticians to the suggestion that what they do is just another branch of mathematics. It would not be surprising to find them equally resistant to the suggestion that it should become a branch of computer science. I also doubt that geographical health and environmental research would necessarily benefit if that were to become the case.

David Waltner-Toews

Department of Population Medicine, University of Guelph, Guelph, Canada.

Epidemiologists, after several decades of favoring non-spatial statistical models, are increasingly realizing the importance of understanding socioeconomic and ecological contexts in the interpretation of disease patterns in populations (McMichael, 1999). As the questions we are asking change in both scope and nature, input from scholars in non-health fields with expertise in studying spatial patterns, such as this paper, is a welcome addition to the health literature.

The authors state that their intent is to "draw the attention of the public health community to the new analytical possibilities offered by geocomputational techniques". While the introduction of these techniques to health researchers is laudable in and of itself, I would like to throw out some cautionary notes, based on experience working with interdisciplinary teams where these techniques have been proposed.


None of the motivations listed by the authors, apart from what they refer to as the "abstract" searching for patterns, are grounded in asking scientific or scholarly questions. Certainly the identification of disease patterns is an important first step, but without carefully thinking through the nature of the disease and how it is spread, combining maps of various outcomes and characteristics can be both misleading and dangerous, to the extent that it leads to misdirection of funds to attack diseases in particular ways.

As they themselves acknowledge, the major motivating force behind the use of many of these techniques is simply "the emergence of computerized data-rich environments" and the availability of "affordable computational power". My experience has been similar, and as a scientist I am very skeptical of such motivation. It leads to researchers confusing their units of analysis, slipping between individuals, communities, and regions, or combining them in the same maps, and making false inferences across scales. The determinants of cases and the determinants of incidence rates are often quite different (Rose, 1985). I shudder to think that we are training young scholars who are driven by a mere fascination with technology and who have forgotten how to frame clear, important questions and design studies to answer them.

For instance, the example they give of regressing the percentage of people over 70 years of age on the percentage of houses with proper sewage facilities is based on the problematic assumption that the populations and sewage disposal of urban neighborhoods have been stable over time. Older people may have grown up in the countryside and only moved to those urban areas as adults (poverty often being associated with old age): thus migration patterns may be the major determinants of the percentage of people over 70. Or increasing population densities may have interacted with sewage disposal methods to create problems over time; in this case it is most important to understand the demographic and sewage production and disposal dynamics of those urban neighborhoods over the past seven decades. It seems to me that before jumping into the computational techniques, researchers need to propose a clear theoretical framework and a biologically and socially substantive logic leading to specific questions to be answered in the research.

A further concern I have with the focus on these newer techniques of analysis is that researchers sometimes ignore the sources and quality of data, how they were collected, and their real spatio-temporal distribution. Data collected from referral hospital and health center records, based on diagnostic tests and questionnaires with a wide range of sensitivities, specificities, and precision, cannot simply be lumped together with satellite data to produce meaningful information. Sometimes simple hand-drawn maps combined with intensive community surveys or focus group work may be what is needed most.

Health researchers are facing important and often unprecedented questions in the 21st century. How can we create sustainably healthy societies? What are the relationships between economic policy, environmental change, and human health? How might global warming affect changes in regional disease patterns? I have no doubt that geocomputational techniques can make important contributions to answering these questions. The authors recognize that "the results are dependent on the basic assumptions of the technique", and that researchers should use these techniques "with discretion, and always bearing in mind the conceptual basis of each approach". I only wish they had spent a little more time and space exploring those assumptions and concepts, to enable those of us who are novices to more carefully select the techniques most suitable to the questions we seek to answer.

McMICHAEL, A. J., 1999. Prisoners of the proximate: Loosening the constraints on epidemiology in an age of change. American Journal of Epidemiology, 149:887-897.

ROSE, G., 1985. Sick individuals and sick populations. International Journal of Epidemiology, 14:32-38.

Flávio Fonseca Nobre

Programa de Engenharia Biomédica, Instituto Alberto Luiz Coimbra de Pós-Graduação e Pesquisa de Engenharia, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brasil.

I thoroughly enjoyed the paper by Drs. Gilberto Câmara and Antônio Miguel Vieira Monteiro and hope that more researchers will be enticed by the main ideas presented above. I hope to see a stronger cross-fertilization of this emerging interdisciplinary field, connecting the use of so-called intelligent systems to spatial health data analysis.

The difficult task is to sum up and provide a brief discussion of this paper. Reading the first part, I learned a term with which I had little or no familiarity, geocomputation, posited by the authors as a new interdisciplinary field using computer-intensive methods, including neural networks, fuzzy logic, genetic algorithms, and cellular automata, for spatial data analysis.

The study of spatial and spatial-temporal epidemiological data is a timely issue, driven by both decreasing technology costs and increasing availability of information. For example, it is becoming increasingly possible to access georeferenced public health data in a speedy manner through the Internet for analysis and merging with other information. Several models and methods for working with spatial health-related data have cropped up in the literature in the last twenty years. Most of these were developed in other areas, like geostatistics, which originated in the mining industry and was later borrowed to help understand and explain the spatial distribution of health events. As is common in many applied sciences, a method is first introduced in an intuitive way, and once the heuristic results prove encouraging, there is major involvement by mathematical and statistical theorists to get the technique soundly established. The wave of progress following this pattern continues with Câmara and Monteiro's paper, which presents a basic review of existing possibilities for the use of different computing procedures to perform spatial health data analysis.

On the application side, I would partially support the motivating statement of the paper citing Oppenshaw (1996), that "many end users merely want answers to fairly abstract questions ...". However, some care should be exercised here. Some twenty years ago, at a Brazilian workshop on statistical methods for epidemiologists, particularly on multiple regression, I heard that the basic concepts are cumbersome and difficult for public health workers to understand, and that such workers should instead be more involved in collecting good data to be analyzed by the "foreigners", i.e., specialists in statistics. Obviously, the authors of the paper would not wish us to merely engage in using these "black box" tools (which are well understood in the artificial intelligence community) but rather to begin close collaboration, both to further the knowledge of these new methods and to convince ourselves that they could be included in the analytical tool box of epidemiologists and public health professionals.

The authors provide examples of real analyses in the hope of giving a genuine applied flavor to the methods reviewed. I wish to make some comments on these applications. The first concerns the use of the GAM (Geographical Analysis Machine) to find clusters in data that are originally areal data. Although the authors emphasized that it is only an example, there is no mention of the large differences in area sizes and population distribution among Rio de Janeiro's districts, which I believe could substantially influence the results. If one uses some sort of altered or transformed data set, one must interpret it with caution and be certain that the alteration is stated clearly, to avoid misuse by newcomers to this field of spatial analysis research.
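The point about population distribution can be made concrete with a toy sketch: whether a district stands out depends on whether its case count is compared against the population at risk. The districts and numbers below are invented for illustration.

```python
# Hypothetical districts: raw case counts hide a small, high-rate district.
districts = {
    "A": {"pop": 50000, "cases": 60},
    "B": {"pop": 48000, "cases": 55},
    "C": {"pop": 5000,  "cases": 20},   # small population, high rate
    "D": {"pop": 52000, "cases": 58},
}

overall = (sum(d["cases"] for d in districts.values())
           / sum(d["pop"] for d in districts.values()))

# Expected counts under the overall rate make the excess in C visible,
# which a ranking by raw counts alone would miss.
for name, d in districts.items():
    print(f"{name}: observed={d['cases']} expected={overall * d['pop']:.1f}")

by_count = max(districts, key=lambda k: districts[k]["cases"])
by_rate = max(districts, key=lambda k: districts[k]["cases"] / districts[k]["pop"])
print("highest raw count:", by_count, "| highest rate:", by_rate)
```

A cluster search run on counts alone would point at the large districts; adjusted for population at risk, the small district is the anomaly, which is exactly why differences in area size and population cannot be ignored.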

My other point concerns Section 3, on neural networks and geographical analysis, where the authors present a classification problem to produce a map of environmental vulnerability. One of the most fundamental aspects of neural network modeling is the requirement of "plenty of training data", which is properly identified in the paper. Neural networks are "adaptive computing" in that they learn from data to build a model. Therefore, the training data set should contain examples of all possible sets of explanatory and outcome variables if one uses the workhorse of neural network modeling: a feedforward network with a backpropagation algorithm. Users interested in applying this new technology should be aware of this important aspect. In addition, analysts must be willing both to tolerate the large amount of time needed for training and to accept a "black box" model which unfortunately does not provide the ability to explain the reasoning used to arrive at a result. This still limits the usefulness of the technique in some areas, particularly when one is interested in measuring the effects of input variables rather than in prediction.
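For readers unfamiliar with this workhorse, a toy single-hidden-layer feedforward network trained by backpropagation can be written from scratch in a few dozen lines. The sketch below learns the XOR function rather than any real vulnerability data; the architecture, learning rate, and epoch count are arbitrary illustrative choices, not the authors' setup.

```python
import math
import random

random.seed(3)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

H = 4                                   # hidden units
w1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(H)]
b1 = [random.uniform(-1, 1) for _ in range(H)]
w2 = [random.uniform(-1, 1) for _ in range(H)]
b2 = random.uniform(-1, 1)
lr = 1.0                                # learning rate

def predict(x1, x2):
    h = [sigmoid(w1[j][0] * x1 + w1[j][1] * x2 + b1[j]) for j in range(H)]
    return sigmoid(sum(w2[j] * h[j] for j in range(H)) + b2)

def total_error():
    return sum((predict(x1, x2) - t) ** 2 for (x1, x2), t in data)

err_before = total_error()
for _ in range(5000):
    for (x1, x2), t in data:
        # forward pass
        h = [sigmoid(w1[j][0] * x1 + w1[j][1] * x2 + b1[j]) for j in range(H)]
        y = sigmoid(sum(w2[j] * h[j] for j in range(H)) + b2)
        # backward pass (gradient of squared error)
        dy = (y - t) * y * (1 - y)            # output-layer delta
        for j in range(H):
            dh = dy * w2[j] * h[j] * (1 - h[j])   # hidden-layer delta
            w2[j] -= lr * dy * h[j]
            w1[j][0] -= lr * dh * x1
            w1[j][1] -= lr * dh * x2
            b1[j] -= lr * dh
        b2 -= lr * dy
err_after = total_error()
print(f"total squared error: {err_before:.3f} -> {err_after:.3f}")
```

Even on four training points the network typically needs thousands of passes, and the trained weights offer no direct explanation of the mapping it has learned, which illustrates both caveats raised above.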

Recent developments in computing performance have provided a wealth of opportunities for the advancement of new analytical approaches to spatial data analysis. These include the increasing use of Bayesian thinking, particularly with the introduction of the Markov Chain Monte Carlo (MCMC) approach to tackle intractable integrals. For the unfamiliar reader, the paper provides a brief introduction to various techniques. Some of these techniques were derived from so-called intelligent systems, and it is the hope of the authors, and also mine, that they may assist our capability to convert data into information.


Renato Martins Assunção

Departamento de Estatística, Universidade Federal de Minas Gerais, Belo Horizonte, Brasil.

Câmara and Monteiro have the merit of drawing the attention of epidemiologists to new spatial analysis techniques. They have done a good job of summing up the main methodologies recently developed and presenting examples of their use, along with recent bibliographical references. As a statistician, I wish to focus my comments on the relationship between the area referred to as geocomputation by the authors and the usual statistical methods.

Before emphasizing differences, it is necessary to identify commonality. A discipline is merely a label for a set of knowledge and practices exercised by people who use that label to refer to themselves. Such practices and knowledge change dynamically, and statistics is no exception. Currently, many of the methods described by the authors are found in the best and most traditional statistics journals. In particular, the first two topics, GAM (the Geographical Analysis Machine) and local spatial statistics, are the subject of articles and books by statisticians interested in spatial analysis. The other two, neural networks and cellular automata, are less common but not totally absent. As the authors point out, the presence of these topics is due to the current combined availability of data and computational power. I still do not feel comfortable identifying geocomputation as a defined field of work, since most of the techniques presented emerged in traditional contexts (as in the case of the first two topics) or non-geographical ones (the last two). But this is not relevant for using and learning the techniques presented by the authors, which are useful regardless of labels.

Although the latter two topics, neural networks and cellular automata, are not absent from the statistical literature, they are less present than might be expected nowadays. Thus, one might ask: what can statisticians learn from researchers in neural networks and cellular automata? I believe that we should be less concerned with asymptotic results and optimality and should seek methods that perform well on large databases. We should deal with large and difficult problems involving many parameters while depending little on hypotheses that cannot be verified. We should be alert to algorithms such as steepest descent with learning rates, which can be highly useful for avoiding overfitting in models with many parameters. This could be useful mainly for Bayesian models, which have become increasingly important (Assunção et al., in press).
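The steepest-descent update mentioned above can be illustrated on an ordinary least-squares problem. The design matrix, true coefficients, and learning rate below are all hypothetical:

```python
import numpy as np

# Steepest descent with a fixed learning rate on a least-squares
# objective. Design, coefficients, and learning rate are hypothetical.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
beta_true = np.zeros(10)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + rng.normal(scale=0.1, size=100)

beta = np.zeros(10)
learning_rate = 0.01
for _ in range(2000):
    grad = X.T @ (X @ beta - y) / len(y)   # gradient of half the mean squared error
    beta -= learning_rate * grad           # steepest-descent step

print(np.round(beta[:3], 2))               # close to the generating coefficients
```

With a small learning rate the iterates approach the least-squares solution slowly; stopping early, or shrinking the step size, is one crude way of limiting overfitting in richer models.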

On the other hand, what can researchers of neural networks and cellular automata learn from statisticians? I believe they should be a little more concerned with the statistical properties of their methods and perhaps slightly more with their optimality. They should make a greater effort to compare their methods with others, including simpler traditional statistical methods. Often a linear regression can perform similarly to a multi-layer perceptron. Contrary to what the authors state at the end of their article, the results of techniques like those presented are comparable. A clear example of this is the book by Alexander & Boyle (1997), which presents various techniques, including GAM, applied by their respective creators to a set of simulated maps which might or might not contain disease foci. The process of generating the maps was described to the researchers, but not what each particular map contained. This simulation exercise, although subject to limitations, served to demonstrate clearly that some methods should be abandoned once and for all; GAM was not among them.

The techniques and examples presented in the article are a good sample of what new computer-intensive methods can offer. Research on these methods is increasing continuously and will no doubt continue over this decade. The authors are to be congratulated for having raised the topic and for having motivated health researchers to take an interest in these new methods.

ALEXANDER, F. E. & BOYLE, P., 1997. Methods for Investigating Localized Clustering of Disease. Oxford: Oxford University Press.

ASSUNÇÃO, R. M.; REIS, I. A. & OLIVEIRA, C. L., in press. Diffusion and prediction of leishmaniosis in Belo Horizonte, a large Brazilian metropolitan area: A Bayesian spatial-temporal model. Statistics in Medicine.


Núcleo de Estudos em Saúde Coletiva, Centro de Pesquisas Aggeu Magalhães, Fundação Oswaldo Cruz, Recife, Brasil.

Maria de Fátima Militão de Albuquerque

What do public health researchers expect of geocomputation?

The article by Gilberto Câmara and Antônio Miguel Vieira Monteiro is highly interesting and objective. While it introduces the concept of geocomputation in a clear and didactic way, demonstrating its potential as a tool for analyzing spatial data, it also invites the reader to answer the question at the end with the same clarity as the authors: what do we expect from geocomputation?

Epidemiology seeks to improve the methods and techniques that allow it to describe, explain, and predict health and disease phenomena in populations, with a view towards prevention. It therefore plays a fundamental role in public health. From this perspective, the analysis of the spatial distribution of diseases has contributed to the production of knowledge in the field and should not be seen as a “second-class” replacement for studies focusing on the individual as the unit of analysis (Susser, 1994a).

Depending on the problem one wishes to solve, the ecological approach has its indications and specificities. Thus, studies can focus on mapping the geographical distribution of diseases, with the identification of spatial clusters of cases and the analysis of associations between the incidence of diseases/events and environmental or contextual exposures related to the collective sphere.

How can geocomputation help improve such studies? We must first ask whether we really understand what is being offered to us.

Reading the article was certainly enlightening, providing us with the scope of development of the techniques and the analytical possibilities offered by the various methods. The authors facilitated an understanding of the concepts by developing the theme in detail through examples of health-related and environmental situations. For example, we are left with the idea that the four types of approaches presented by the authors have different premises and objectives but can be viewed as complementary.

Thus GAM (the Geographical Analysis Machine) is capable of revealing clusters of events/diseases and constructing maps when the excess rates found are statistically significant. It would therefore be a useful technique for detecting priority areas for public health interventions, but would not aim at helping to explain the occurrence of phenomena.

Meanwhile, techniques for the detection of “spatial autocorrelation”, measured by the Moran coefficient or through semi-variograms, would detect dependence between geographically proximate events, “explicitly considering the possible importance of their spatial arrangement in the analysis or interpretation of the results” (Bailey & Gatrell, 1995). There are thus specific indications for this type of research, for example when one’s point of departure is the hypothesis that the event at issue is generated by environmental factors that are difficult to detect at the individual level.
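For readers unfamiliar with the Moran coefficient, the statistic can be computed directly from its definition. The four areas, their rates, and the contiguity matrix below are hypothetical; in practice the neighbourhood matrix would come from real area boundaries:

```python
import numpy as np

# Moran's I for area-based rates. Areas, rates, and the binary
# contiguity matrix below are hypothetical.
def morans_i(x, W):
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = x - x.mean()                                  # deviations from the mean rate
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

# Four areas in a row; W[i][j] = 1 when areas i and j share a border.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])

print(morans_i([10, 10, 2, 2], W))   # similar neighbours: positive autocorrelation
print(morans_i([10, 2, 10, 2], W))   # alternating values: negative autocorrelation
```

Positive values indicate that neighbouring areas tend to have similar rates, the kind of dependence described above; values near the statistic's (slightly negative) expectation indicate no spatial arrangement of interest.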

The other two approaches described by the authors involve more sophisticated techniques, incorporating functions intended to capture the complexity of the phenomena. The authors explain that an Artificial Neural Network (ANN) can be used as an exploratory tool in data-rich environments and that it is capable of integrating data of different natures into a single geographic database using Geographic Information Systems (GIS) technology. The information to be introduced into the model should be chosen by the researcher, which obviously presupposes the existence of an underlying theoretical basis.

Meanwhile, cellular automata go even further in incorporating dynamic elements into the models. These models “would free us from static views of space” and would be capable of representing change in space over time as the product of human actions.

We may be closer to achieving the ambitious objective identified by Susser (1994b) (speaking of the logic of the ecological approach): to understand how context affects the health of individuals and groups. In other words, it appears increasingly possible to develop studies that reveal the effects not only of the structural elements of space but also those of its processes, which are not perceptible within the sphere of studies whose unit of analysis is the individual. Hence epidemiology turned to critical geography for the concept of “socially organized space”.

Finally, the authors point out that computational technology for solving health problems should always be applied keeping in mind the conceptual underpinnings of each approach. This concern has its counterpart in the health field. The conceptual basis to be considered in studies should be related to the theoretical and methodological issues of public health. This underscores the need for an interdisciplinary dialogue, in which the challenge for the public health researcher is to guarantee the epidemiological content of the studies, allowing for better knowledge of the target phenomenon, prediction of new occurrences, and the organization of interventions aimed at prevention.

BAILEY, T. C. & GATRELL, A., 1995. Interactive Spatial Data Analysis. Harlow: Longman Scientific & Technical.

SUSSER, M., 1994a. The logic in ecological: I. The logic of analysis. American Journal of Public Health, 84:825-829.

SUSSER, M., 1994b. The logic in ecological: II. The logic of design. American Journal of Public Health, 84:830-835.

Programa de Computação Científica, Fundação Oswaldo Cruz, Rio de Janeiro, Brasil.

Claudio J. Struchiner

First I would like to express my appreciation to the authors for this impressively wide-ranging paper. It is a review paper that provides an introduction to geocomputation techniques, i.e., computer-intensive techniques for knowledge discovery in physical and human geography. The authors seem to favor the view that this new interdisciplinary area is to be distinguished from the simple extension of statistical techniques to spatial data. My comments are motivated by questions I have posed to myself after reading their review: How do such methods compare with established techniques? What are their advantages and disadvantages? What are their ranges of application? Do the new techniques challenge or extend any of the existing paradigms in data analysis?

The computational dimension appears to be the common denominator of the techniques described in this review and goes into the definition of the key concept at stake, geocomputation. Faster and more powerful computers and advances in software engineering have had a profound impact on all areas of statistics. Bootstrap and Monte Carlo Markov Chain (MCMC) methods, for example, allow the estimation of parameters in richer and more realistic model-based representations of natural phenomena, thereby freeing the imagination of the scientific community. In this context, the boundaries of statistical models and statistical theory have been extended while preserving the current paradigms, i.e., that good statistical thinking is based on solid philosophical principles.
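As a reminder of how little machinery such computer-intensive methods require, a basic nonparametric bootstrap can be written in a few lines. The sample of area rates below is hypothetical:

```python
import random
import statistics

# Nonparametric bootstrap for the sampling distribution of a mean.
# The sample of (hypothetical) rates is purely illustrative.
rng = random.Random(0)
sample = [4.1, 3.8, 5.2, 4.7, 4.0, 6.1, 3.5, 4.9]

boot_means = sorted(
    statistics.mean(rng.choices(sample, k=len(sample)))  # resample with replacement
    for _ in range(5000)
)
lo = boot_means[int(0.025 * len(boot_means))]
hi = boot_means[int(0.975 * len(boot_means))]
print(f"approximate 95% percentile interval for the mean: ({lo:.2f}, {hi:.2f})")
```

The point is precisely the one made above: raw computing power replaces the analytic derivation of a sampling distribution, without abandoning the underlying statistical reasoning.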

Algorithmic thinking also plays an important role in other areas of science. Complex systems can be generated through the use of very simple building rules, which resemble the functioning of DNA chains. In this context, computer-intensive algorithmic techniques are intimately related to the mechanisms of pattern formation that supposedly occur in nature. By contrast, the procedures under the heading of geocomputation also seek to uncover pattern formation, but their search mechanisms are general in nature and do not bear any relationship to the various possible mechanisms that generate those spatial patterns.

In my view, the geocomputational methods reviewed in this paper do not share the same principles as these extensions. These algorithmic techniques appear to be a computerized version similar, in spirit, to a once very fashionable set of techniques developed by J. Tukey and known as Exploratory Data Analysis. Other statistical techniques gathered under different headings, such as Data-Driven Procedures and Data Mining, attempt to answer questions similar to those raised here, i.e., “Are there any patterns, what are they, and what do they look like?”

The literature on quantitative methods has acknowledged, at least since the beginning of this century, the existence of two dimensions in research practice, i.e., exploratory versus analytical. For example, R. Ross opposed the concepts of a priori versus a posteriori pathometry in his Theory of Happenings. Most textbooks make a distinction between descriptive and analytical epidemiology. The debate seems endless and can be naively put in such questions as: “Are there purely descriptive studies? Without knowing what one is looking for, how can one tell when one has found it? If there is some previous knowledge or intuition of a subject, why not make it explicit in a model and see how the available empirical evidence modifies this knowledge or intuition? Do pattern-discovery algorithms carry some sort of built-in intelligence?”

Therefore, by analogy with the other computer-intensive techniques mentioned above, one could wonder whether geocomputation, and other modern exploratory data analysis techniques, could benefit from incorporating a causal structure or more specific pattern-formation mechanisms.


Departamento de Informação e Saúde, Centro de Informação Científica e Tecnológica, Fundação Oswaldo Cruz, Rio de Janeiro, Brasil.

Christovam Barcellos

The specificities of spatial health data analysis

The article by Gilberto Câmara and Antônio Miguel Monteiro describes various recently developed spatial analysis techniques which have been applied mainly to environmental, geological, and land cover/land use problems. Their use in the collective health field is still infrequent and can present some analytical limitations. I wish to touch on some of these problems based on the question contained in the title, i.e., questioning the specificities of health data and problems as compared to other areas where these techniques have been applied.

In the first place, all health events (birth, infection, illness, death) manifest themselves in persons. These individuals are not randomly distributed in space. Thus, when one works with health records to evaluate risks, one should estimate the probability of an event occurring, weighted by the population distribution. The most common way to consider population distribution in risk evaluation is to group demographic and health data into watertight spatial units and subsequently calculate epidemiological indicators. This strategy poses serious limitations, such as ignoring interactions between spatial units and the instability of indicators computed for small areas (King, 1979). However, this is not the only way to consider population distribution. For example, one can calculate case density (the number of cases per area), producing a surface of probabilities in which areas with more proximate cases present greater risk. Analogously, one can calculate the density of persons (inhabitants per area, or simply population density) as a continuous surface to be used as the denominator for calculating rates. A third strategy to evaluate the spatial distribution of these events is to test the randomness of the “cases” in relation to a set of “controls” obtained by survey or drawn from a population with a similar profile. Population density is always an implicit variable in all spatial analyses of health. However, this variable is not neutral. At least in Brazil, it is associated with the concentration of wealth and a particular way of life. This variable is the result of the human capacity, through the territorial division of labor, to produce surpluses and technology and to organize power structures. In addition, population clustering can have important repercussions on the spread of diseases, especially transmissible ones. For example, the initial years of the AIDS pandemic were characterized by the rapid dissemination of the virus in large cities and by its spread through a downwardly hierarchical network of cities. These cities, considered “central”, concentrate people, income, and cases, as well as fostering an intense exchange among individuals, a condition for HIV transmission. Thus, the population of a given place is both the denominator for evaluating risks and one of the conditioning factors for the spread of diseases, which could be expressed mathematically as a differential equation.
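The second strategy mentioned above, case density taken relative to population density, can be sketched with a Gaussian kernel estimator. The coordinates, bandwidth, and point patterns below are hypothetical:

```python
import math

# Kernel-smoothed "risk surface": the ratio of a case-density surface
# to a population-density surface at a given location. Coordinates,
# bandwidth, and point patterns are hypothetical.
def kernel_density(points, x, y, bandwidth=0.1):
    """2-D Gaussian kernel density estimate at (x, y)."""
    norm = 2 * math.pi * bandwidth**2 * len(points)
    return sum(math.exp(-((px - x)**2 + (py - y)**2) / (2 * bandwidth**2))
               for px, py in points) / norm

cases = [(0.50, 0.50), (0.52, 0.48), (0.47, 0.51)]   # a small cluster of cases
population = [(i / 10, j / 10)                        # uniform background population
              for i in range(10) for j in range(10)]

relative_risk = (kernel_density(cases, 0.5, 0.5)
                 / kernel_density(population, 0.5, 0.5))
print(f"relative density at (0.5, 0.5): {relative_risk:.1f}")
```

Evaluating the ratio over a grid of locations would yield the continuous surface described in the text, with the population surface serving as the denominator.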

In addition, the macro-determinants of diseases, whether environmental, social, or economic, occur “outside” of persons. It is interesting to note that the World Health Organization defines the environment as “the totality of external elements that influence the health conditions and quality of life of individuals or communities”. Therefore, if we intend to relate health problems to their determinants, we should combine health data, referenced to the population, with environmental data, referenced to something “external” to the population, with each coming from different information systems. Geographic Information Systems (GIS) allow for this type of data relationship by superimposing layers of health event incidence rates on other layers relevant to this association (Vine et al., 1997).

Third, in Brazil, epidemiological data are collected according to the territorial logic of the Unified Health System (SUS), with increasing hierarchical levels and primarily administrative objectives. Thus, data location is based on the spatial reference of these units, which vary widely in dimensions and resident populations. These dimensions, as well as the shape of the reference spatial units, can have a major impact on the visual and statistical results. The Geographical Analysis Machine (GAM), for example, searches for excess points in relation to an expected number within circles generated by the program. However, in various situations one should consider non-circular risk locations and non-Euclidean distances between cases (and between the latter and sources of risk), such as the bands around power transmission lines, where exposure to low-frequency radiation can damage human health. In selecting indicators, one should search for a territorial division that maximizes the variances of both the exposure and the measured effects on the population. One explains, or makes explicit, the environmental and social determinants at the scale at which the greatest variability in the indicators is found (Cleek, 1979). Shape can be an important factor in constructing a risk model due to its influence on the “exposure geometry”, studied through landscape ecology (Frohn, 1998). A more elongated unit can have more neighbors, while compact units have a smaller perimeter and can have fewer neighborhood relations with other units.
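The circle-based search performed by GAM can be caricatured in a few lines. This is a toy sketch in the spirit of GAM, not Openshaw's implementation: all locations, radii, and thresholds below are hypothetical, and a real run would use many overlapping circle sizes and a proper multiple-testing adjustment:

```python
import math
import random

# Toy circle scan in the spirit of GAM: count cases and people inside
# circles laid over the study area and flag circles whose case count is
# improbably high under a Poisson expectation. All data are simulated.
def poisson_tail(k, mu):
    """P(X >= k) for X ~ Poisson(mu)."""
    return 1.0 - sum(math.exp(-mu) * mu**i / math.factorial(i) for i in range(k))

random.seed(42)
people = [(random.random(), random.random()) for _ in range(2000)]
cases = [p for p in people if random.random() < 0.02]          # 2% baseline risk
cases += [(0.7 + random.gauss(0, 0.03), 0.7 + random.gauss(0, 0.03))
          for _ in range(30)]                                   # seeded local cluster
rate = len(cases) / len(people)

def scan(radius=0.1, step=0.1, alpha=0.002):
    flagged = []
    for cx in [i * step for i in range(11)]:
        for cy in [j * step for j in range(11)]:
            pop = sum((x - cx)**2 + (y - cy)**2 <= radius**2 for x, y in people)
            obs = sum((x - cx)**2 + (y - cy)**2 <= radius**2 for x, y in cases)
            if pop and obs >= 2 and poisson_tail(obs, rate * pop) < alpha:
                flagged.append((cx, cy))
    return flagged

hits = scan()
print(hits)   # should include circle centres near the seeded cluster at (0.7, 0.7)
```

Note that the circles here are Euclidean by construction, which is precisely the limitation raised above for elongated risk sources such as transmission-line corridors.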

In general, studies in medical geography have been characterized by the search for explanatory factors for a given spatial distribution of diseases, viewing space as an a posteriori factor. This approach can produce theoretical simplifications through the association of climatic, cultural, and social characteristics with epidemiological ones, which led a major portion of the studies by pioneers in medical geography to conclusions that ideologically reinforced colonialism (Bennett, 1991). The use of neural networks, as suggested by the authors, can reverse the direction of analyses of socio-spatial disease determinants, seeking combinations among factors, constituted a priori, to explain this distribution. This approach requires that researchers formally present their hypotheses and construct a series of “layers” representing human spaces which, when combined, best characterize the places where these diseases occur.

With the improvement of information systems, the inclusion of addresses on health records, and the growing use of satellite positioning equipment, or Global Positioning Systems (GPS), in health surveillance activities, one can access these health events as points on a map at a local scale. The main advantage of this georeferencing strategy is the possibility of producing different forms of data aggregation, constructing indicators in different spatial units according to the study’s purpose. The same point (health event) can be contained in different types of spatial units: a neighborhood, a river basin, a health district, etc., defined by polygons on the maps. This characteristic incorporates the principles of simultaneity and interaction between scales for spatial analysis. This property also requires a geometrical rigor that must be present in the planning phase and in the construction of the mapping base. In order for there to be a univocal relationship between point and polygon, the spatial units must cover the entire working area, and no area can be covered by more than one polygon, i.e., there can be neither empty places between units nor overlaps among them. Each spatial unit represents a slice of space, containing populations at risk of diseases and displaying disease rates. Geographic Information Systems allow one to construct rates for different exposure conditions by superimposing layers of disease data (points) and population data (polygons). These technical requirements in the handling of both tabulated and mapped data hinder the adoption of less rigid criteria for spatial studies, restricting the concept of space to watertight units. By using network analysis techniques, interpolation, and smoothing of spatial data, one can dissolve previously established boundaries between spatial units. The adoption of fuzzy boundaries for spatial units, ideal for studying place, is jeopardized by the operational norms of information systems (Openshaw, 1996).

Spatial analysis is defined as the capacity to generate new information based on existing spatial data (Bailey, 1994). To this end, software applications have been developed to facilitate the search for patterns and exceptions in space. Such techniques do not replace the researcher. Spatial analyses applied to health allow one to study health problems where they manifest themselves. Although this statement may sound obvious, it is important to recall that such analyses are only made possible through increasingly deep knowledge of both the health problem and the place in which it occurs.

BAILEY, T. C., 1994. A review of statistical spatial analysis in geographical information systems. In: Spatial Analysis and GIS (S. Fotheringham & P. Rogerson, ed.), pp. 13-44, London: Taylor and Francis.

BENNETT, D., 1991. Explanation in medical geography: Evidence and epistemology. Social Science and Medicine, 33:339-346.

CLEEK, R. K., 1979. Cancer and the environment: The effect of scale. Social Science and Medicine, 13D:241-247.

FROHN, R. C., 1998. Remote Sensing for Landscape Ecology: New Metric Indicators for Monitoring, Modeling and Assessment of Ecosystems. Boca Raton: CRC Press.

KING, P. E., 1979. Problems of spatial analysis in geographical epidemiology. Social Science and Medicine, 13D:249-252.

OPENSHAW, S., 1996. Fuzzy logic as a new scientific paradigm for doing geography. Environment and Planning A, 28:761-768.

VINE, M. F.; DEGNAN, D. & HANCHETTE, C., 1997. Geographic information systems: Their use in environmental epidemiologic research. Environmental Health Perspectives, 105:598-605.

The authors reply
Os autores respondem

Gilberto Câmara
Antônio Miguel Vieira Monteiro

The authors wish to thank the reviewers for their extensive and careful comments on our paper. We found that most of the remarks complement our work, either further developing certain issues or elucidating differences in our approach to spatial statistical theory. We fully concur with Bailey’s remarks that geocomputational techniques should be used to complement more traditional statistical approaches, not as alternatives to them. Indeed, some of the techniques we presented (such as the GAM algorithm) can be used as a priori data-mining techniques to investigate data-rich environments. After a significant pattern is found, model-fitting approaches can be applied more effectively. We take note of Bailey’s thoughtful comments on the dangers of applying techniques that are not easily associated with statistical measures of sensitivity and robustness, such as neural networks or cellular automata.

Although we understand the cautious approach of many of our reviewers, we wish to indicate, as pointed out by both Nobre and Struchiner, that there are circumstances in which exploratory and non-robust geocomputational techniques are useful. Indeed, there are cases where the statistical alternatives are either extremely complex to apply or have yet to be fully developed. Let us consider two types of problems: multi-dimensional spatial data analysis and dynamic spatio-temporal modeling. As pointed out by Albuquerque, these types of problems arise when we are interested in studying not only the structural elements of space but also the effects of processes.

As for multidimensional spatial data analysis, our paper presents a typical situation in which a health researcher searches for areas prone to the incidence of a disease, given a number of possible environmental factors. This problem can be described in general terms as one of prediction, when it is assumed that a causal structure is in place. We proposed neural networks as one of several possible solutions to this problem. In his comments, Nobre made the important point that neural networks are “black boxes” in the sense that they do not provide the ability to explain the reasoning used to arrive at a result. Nevertheless, they provide a viable practical alternative to a problem that is otherwise difficult to handle with traditional techniques, since establishing a spatial correlation structure for such a problem may prove almost intractable from a statistical viewpoint.

In the case of dynamic spatio-temporal modeling, a researcher may be interested in representing geographical space in a detailed way, e.g., as a matrix of cells where each cell has unique characteristics. If one is interested in establishing the conditions of disease propagation in such an environment, the sheer size of the problem and the number of variables required to provide a realistic prediction may make the statistical approach unfeasible. In this case, approaches such as cellular automata guided by econometric equations provide a first approximation to an answer.
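Such a cellular automaton can be sketched minimally as follows. The grid size, neighbourhood, and transmission probability below are hypothetical and far simpler than any realistic propagation model:

```python
import random

# Minimal cellular automaton for disease spread on a grid of cells
# (0 = susceptible, 1 = infected). Grid size, neighbourhood, and
# transmission probability are hypothetical.
def step(grid, p_transmit=0.3, rng=random.Random(7)):  # fixed rng keeps the run deterministic
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(n):
        for j in range(n):
            if grid[i][j] == 0:
                infected = sum(grid[a][b]
                               for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                               if 0 <= a < n and 0 <= b < n)
                # chance of escaping all infected neighbours: (1 - p)^infected
                if infected and rng.random() < 1 - (1 - p_transmit) ** infected:
                    new[i][j] = 1
    return new

grid = [[0] * 21 for _ in range(21)]
grid[10][10] = 1                     # index case at the centre
for _ in range(10):
    grid = step(grid)                # synchronous update of all cells
print("infected cells after 10 steps:", sum(map(sum, grid)))
```

In the richer models discussed above, each cell would carry its own local attributes, and the transition rule would be driven by those attributes (or by econometric equations) rather than by a single global probability.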

Assunção makes the important point that ad hoc techniques such as neural networks and cellular automata would benefit substantially from the use of statistical techniques for establishing optimality properties and for a better characterization of their variability and the relative impact of each factor. We heartily agree and certainly consider this an important research topic in geocomputation.

A final comment is in order. Geocomputation will not provide any “silver bullets”. Indeed, some of its early proponents have in a sense jeopardized its acceptance by the scientific community at large by making rather preposterous claims. For problems where clean and robust statistical techniques are available, these should be used instead of ad hoc approaches like neural networks or cellular automata. However, we hope to have pointed out that there are many situations in which current state-of-the-art statistical methods are not applicable. In such cases, researchers should be encouraged to use the modeling capabilities of computers and allow for different representations of geographical space. It is inevitable, and indeed desirable, that such a pragmatic approach will bring a new understanding to the analysis of health data sets.