In our previous article on the geodist command, we used that command in Stata to find the location nearest to a base city, as well as the locations within a certain radius. However, geodist is not designed for such computations. For these purposes, there is a dedicated command called geonear, which is the subject of this article.
The datasets used in this article are distance.dta and industrialcities.dta. Click on the relevant link to download them. The distance file contains the longitude and latitude of various cities in the US.
industrialcities.dta has longitude and latitude data on four US industrial cities. The variable names in the two files are different, even though they represent the same information. For example, city names are stored in a variable called ‘city’ in distance.dta but ‘city1’ in industrialcities.dta. This is an important and necessary feature of the two datasets.
We start by using the first of the two datasets:
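Loading the file could look like the sketch below. It assumes that geonear has not yet been installed from SSC and that distance.dta sits in the current working directory; neither detail is stated in this article, so adjust as needed.

```stata
* Install geonear from SSC (needed only once; assumes internet access)
ssc install geonear

* Load the master file; assumes distance.dta is in the working directory
use "distance.dta", clear

* Check the variable names and the number of observations
describe
count
```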
For each of the 727 cities, we wish to know which of the four industrial cities is nearest to it. For this, we use the geonear command.
The general syntax of the geonear command is as follows:
geonear baseid baselat baselon using nborfile, neighbors(nborid nborlat nborlon) [options]
baseid, baselat and baselon refer to the city, latitude and longitude variables in our master file (i.e. the file currently loaded: distance.dta), respectively.
nborfile refers to the second, neighbor file (also called the ‘using’ file: industrialcities.dta). The neighbors() option is mandatory. In its parentheses, we specify the city, latitude and longitude variables from the using file (in the specific order shown in the syntax).
Finding the Nearest Location in Kilometres
To apply the above syntax to the datasets, our command looks like:
geonear city latitude longitude using "industrialcities.dta", n(city1 latitude1 longitude1)
The first three arguments in the command are the city, latitude and longitude variable names in the master file. These are followed by the dataset that serves as the neighbor file. The option n is a shorter way of writing the neighbors option. Its parentheses contain the city, latitude and longitude variables from the neighbor file.
This command creates a variable that holds the industrial city names that are nearest to each of the 727 cities. It creates another variable that shows the distance between the city and its nearest industrial city (the neighbor).
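To inspect the result, something like the following could be used. The names nid and km_to_nid are the ones geonear is expected to create under its default stub; treat them as an assumption and confirm them with describe before relying on them.

```stata
* Run the nearest-neighbor search from the master file (distance.dta)
use "distance.dta", clear
geonear city latitude longitude using "industrialcities.dta", n(city1 latitude1 longitude1)

* See which variables were added
describe

* List the first few cities with their nearest industrial city and distance
* (nid and km_to_nid are assumed default names)
list city nid km_to_nid in 1/5
```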
In the article on geodist, we saw that the distance of a city to itself was, rightly, calculated as zero. We do not want such pairings (where a city is matched with itself) in our data. While previously we had to drop such observations manually, here we simply use the ignoreself option to exclude them automatically.
geonear city latitude longitude using "industrialcities.dta", ignoreself n(city1 latitude1 longitude1)
Calculating Distance in Miles
If we wish for the distance to the nearest city to be reported in miles, we add the miles option:
geonear city latitude longitude using "industrialcities.dta", ignoreself n(city1 latitude1 longitude1) miles genstub(ignore)
In addition, we have added the genstub() option. This allows us to add a prefix, which we specify in the parentheses, to the nearest neighbor variable generated by the command. In our example, we use the prefix ‘ignore’.
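A quick way to see what genstub() changed is to describe every variable whose name contains the stub. This is only a sketch; it makes no assumption about the exact names geonear builds from the stub, only that the stub appears somewhere in them.

```stata
* Re-run the miles example with the stub "ignore"
use "distance.dta", clear
geonear city latitude longitude using "industrialcities.dta", ignoreself n(city1 latitude1 longitude1) miles genstub(ignore)

* Show every variable whose name contains the stub
describe *ignore*
```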
We now use industrialcities.dta as our master data.
We now want to find the two nearest cities (from the 727 cities in the using/neighbor file) to the four cities in our master dataset. The command goes:
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself nearcount(2)
nearcount(2) indicates that we are looking for the two nearest cities. For example, the nearest city to Boston is Brookline, while the second nearest is Somerville.
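With nearcount(2), the results are expected to arrive as numbered pairs of variables. The sketch below assumes the default names nid1, nid2, km_to_nid1 and km_to_nid2; the wildcards keep it working if the numbering differs slightly, but the stub itself is an assumption.

```stata
* Two nearest neighbors for each of the four industrial cities
use "industrialcities.dta", clear
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself nearcount(2)

* Show the neighbor names and distances side by side
* (nid* and km_to_nid* are assumed default names; confirm with -describe-)
list city1 nid* km_to_nid*
```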
Formatting Data As Long
By default, the geonear command adds the nearest city names and distances as new columns. If we want the layout to be long instead, we specify the long option:
use "industrialcities.dta", clear
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself nearcount(2) long
Note that industrialcities.dta was reloaded into Stata's memory using the use command.
Cities Within a Radius
Previously, with the geodist command, we had to follow several steps to obtain the cities that were within a 30km radius of the base cities. Now, we only need to add the within() option to the geonear command:
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself long within(20)
By default, Stata interprets within(20) as cities within 20km. If we add the miles option, it is interpreted as 20 miles.
Some base cities, Houston and Iowa City here, do not have any neighbor within 20km, but Stata still returns their nearest neighbor, which lies beyond the 20km radius. To ensure that only cities within the specified radius are reported, we add the nearcount(0) option alongside the within() option. This omits base cities that have no neighbor within the radius given in within().
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself long within(20) nearcount(0)
This command reports all the cities within the 20km radius (13 cities for Boston). If we only want the two nearest cities within the 20km radius to be reported, we add the limit() option, with the number of cities we want to keep in its parentheses.
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself long within(20) limit(2)
The number of nearest cities within the 20km radius is now restricted to two.
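Because the long layout holds one row per pair of base city and neighbor, the number of neighbors found for each base city can be counted directly. The sketch below uses the within(20) nearcount(0) command from above and assumes that the base identifier keeps its name city1 in the long output; n_within and first are hypothetical variable names introduced here for illustration.

```stata
* All neighbors within 20km, one row per base-neighbor pair
use "industrialcities.dta", clear
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself long within(20) nearcount(0)

* Rows per base city = number of neighbors inside the radius
* (assumes the base id variable is still called city1)
bysort city1: generate n_within = _N

* Flag one row per base city and list the counts
egen byte first = tag(city1)
list city1 n_within if first
```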
While geodist calculates ellipsoidal distances by default, geonear uses spherical distances. You will notice that the distances calculated by the two commands differ and are therefore not directly comparable. To make the geonear command calculate ellipsoidal distances, we use the ellipsoid option:
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself ellipsoid
The distances will now be calculated using the WGS 1984 reference ellipsoid.
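To see how much the two methods differ, one could run geonear twice with different stubs and compare the results. This is a sketch under assumptions: the stubs sph and ell and the variable diff_km are hypothetical names, and the distance variables are assumed to be named km_to_sph and km_to_ell, which should be confirmed with describe.

```stata
use "industrialcities.dta", clear

* Spherical distances (the default), stored under the stub "sph"
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself genstub(sph)

* Ellipsoidal distances, stored under the stub "ell"
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself ellipsoid genstub(ell)

* Confirm the generated names before using them
describe *sph* *ell*

* Difference between the two distance measures, in km
* (km_to_sph and km_to_ell are assumed names)
generate diff_km = km_to_ell - km_to_sph
summarize diff_km
```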
Finding Neighbors From Within One Dataset
We go back to using distance.dta.
What if we wanted to find the nearest city from within this one dataset? Which of the 727 cities in the dataset is closest to, for example, Dallas apart from itself?
To find out, we simply write the master file’s name in place of the using/neighbor file.
geonear city latitude longitude using "distance.dta", n(city latitude longitude) ignoreself
Each city’s observation will now be compared against all other cities’ observations to find the one nearest to it.
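To answer the Dallas question directly, the result can be filtered to that one city. The sketch assumes that the city variable stores the name exactly as "Dallas" and that the default output names nid and km_to_nid apply; both should be checked against the data.

```stata
* Nearest neighbor within the same dataset
use "distance.dta", clear
geonear city latitude longitude using "distance.dta", n(city latitude longitude) ignoreself

* Show the city closest to Dallas
* (the spelling "Dallas" and the names nid, km_to_nid are assumptions)
list city nid km_to_nid if city == "Dallas"
```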
For more details on the geonear command, type help geonear in Stata.