In our previous article on the geodist command, we used it to find the location nearest to a base city, as well as the locations within a certain radius. However, geodist is not designed for such computations. For these tasks there is a dedicated command, geonear, which is the subject of this article.
The datasets used in this article are distance.dta and industrialcities.dta. Click on the relevant link to download them. The distance file contains the longitude and latitude of 727 US cities.
industrialcities.dta has longitude and latitude data on four US industrial cities. The variable names differ between the two files, even though they hold the same kind of information. For example, city names are stored in a variable called ‘city’ in distance.dta but ‘city1’ in industrialcities.dta. This is an important and necessary feature of the two datasets.
We start by using the first of the two datasets:
use "distance.dta",clear
For each of the 727 cities, we wish to know which of the four industrial cities is nearest to it. For this, we use the geonear command.
General Syntax
The general syntax of the geonear command is as follows:
geonear baseid baselat baselon using nborfile, neighbors(nborid nborlat nborlon) [options]
baseid, baselat and baselon refer to the city, latitude and longitude variables in our master file (i.e. the file currently loaded: distance.dta) respectively. nborfile refers to the second, neighbor file (also called the ‘using’ file: industrialcities.dta). The neighbors() option is mandatory. In its parentheses, we specify the city, latitude and longitude variables from the using file, in the order shown in the syntax.
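geonear is a community-contributed command rather than part of official Stata, so it has to be installed before first use. The lines below are a minimal sketch, assuming the command is still distributed through the SSC archive and that your Stata session has internet access.

```stata
* Install geonear from SSC (only needed once per Stata installation)
ssc install geonear

* Open the help file to check the syntax and options of the installed version
help geonear
```

If the command is already installed, ssc install will say so; adding the replace option updates it to the latest SSC version.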
Finding the Nearest Location in Kilometres
To apply the above syntax to the datasets, our command looks like:
geonear city latitude longitude using "industrialcities.dta", n(city1 latitude1 longitude1)
The first three arguments in the command are the city, latitude and longitude variable names in the master file. They are followed by the name of the using dataset. The option n is a shorter way of writing the neighbors option. Its parentheses contain the city, latitude and longitude variables from the neighbor file.
This command creates a variable that holds the industrial city names that are nearest to each of the 727 cities. It creates another variable that shows the distance between the city and its nearest industrial city (the neighbor).
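To see what was added, the data in memory can be inspected right after running the command. This is a small sketch using only built-in Stata commands. The exact names of the generated variables depend on the options used, so describe is used here to reveal them rather than assuming them.

```stata
* List all variables in memory; the ones created by geonear appear at the end
describe

* Show the first five cities with their nearest industrial city and the distance
list in 1/5
```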
With geodist, we saw that the distance of a city to itself was, rightly, calculated as zero. We do not want such matches (where a city is paired with itself) in our results. Previously, we had to drop these observations manually. Here we simply use the ignoreself option, which makes geonear skip any neighbor whose identifier matches that of the base observation.
geonear city latitude longitude using "industrialcities.dta", ignoreself n(city1 latitude1 longitude1)
Calculating Distance in Miles
If we want the distance to the nearest city to be reported in miles, we add the miles option.
geonear city latitude longitude using "industrialcities.dta", ignoreself n(city1 latitude1 longitude1) miles genstub(ignore)
In addition, we have added the genstub() option. This allows us to add a prefix, which we specify in the parentheses, to the nearest neighbor variable generated by the command. In our example, we use the prefix ‘ignore’.
Switching Datasets
We now use industrialcities.dta as our master data.
use "industrialcities.dta",clear
We now want to find the two nearest cities (from the 727 cities in the using/neighbor file) to each of the four cities in our master dataset. The command is:
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself nearcount(2)
nearcount(2) indicates that we are looking for the two nearest cities. For example, the nearest city to Boston is Brookline, while the second nearest is Somerville.
Formatting Data As Long
By default, geonear stores its results in wide form: it adds new columns holding the nearest city names and distances. If we want the results in long form instead, with one row per pair of base city and neighbor, we specify the long option:
use "industrialcities.dta",clear
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself nearcount(2) long
Note that industrialcities.dta has been reloaded into Stata's memory with the use command, so that we start again from the original data without the variables added by the previous geonear call.
Cities Within a Radius
Previously, with the geodist command, we had to follow several steps to obtain the cities that were within a 30km radius of the base cities. Now, we only need to add the within() option to the geonear command.
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself long within(20)
By default, Stata will interpret within(20) as indicating cities within 20km. If we were to add the miles option, this would be interpreted as 20 miles.
Some base cities, Houston and Iowa City here, do not have any neighbor within 20km, but Stata still returns their nearest neighbor, even though it lies beyond the 20km radius. To ensure that only cities within the specified radius are reported, we add the nearcount(0) option alongside the within() option. This omits base cities that have no neighbor within the radius given in within(20).
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself long within(20) nearcount(0)
This command reports all the cities within the 20km radius (13 cities for Boston). If we only want the two nearest cities within a 20km radius to be reported, we add the limit() option, with the maximum number of cities to report inside the parentheses.
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself long within(20) limit(2)
The number of nearest cities within the 20km radius is now restricted to two.
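One quick way to check how many neighbors each base city ends up with is to tabulate the base identifier after a long-form run. The sketch below reuses the within(20) nearcount(0) command from above. It assumes that the long-form results keep the base identifier under its original name, city1, which is worth confirming with describe on your installed version.

```stata
* Start again from the original base file
use "industrialcities.dta", clear

* All neighbors within 20km, long form, no fallback nearest neighbor
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself long within(20) nearcount(0)

* Long form has one row per base-neighbor pair, so the frequency
* of each base city equals its number of neighbors within the radius
tabulate city1
```

Swapping nearcount(0) for limit(2) in the same sketch shows the effect of capping the number of reported neighbors.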
Ellipsoidal Distances
While geodist calculates ellipsoidal distances by default, geonear uses spherical distances. The distances calculated by the two commands therefore differ and are not directly comparable. To make the geonear command calculate ellipsoidal distances, we use the ellipsoid option:
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself ellipsoid
The distances will now be calculated using the WGS 1984 reference ellipsoid.
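The size of the gap between the two methods can be illustrated with geodist alone, for a single pair of points. This is a sketch under a few assumptions: geodist is installed, it accepts literal coordinates in place of variables and returns the result in r(distance), and its sphere option switches it to great-circle distances. The coordinates are approximate values for Boston and Houston chosen for illustration, not values taken from the datasets.

```stata
* Approximate, illustrative coordinates (not taken from the datasets):
* Boston (42.36, -71.06) and Houston (29.76, -95.37)

* Ellipsoidal distance in km (geodist's default)
geodist 42.36 -71.06 29.76 -95.37
display "Ellipsoidal: " r(distance) " km"

* Spherical (great-circle) distance in km
geodist 42.36 -71.06 29.76 -95.37, sphere
display "Spherical: " r(distance) " km"
```

The two figures should be close but not identical, which is why results from the two commands should only be compared when both use the same method.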
Finding Neighbors From Within One Dataset
We go back to using distance.dta.
use "distance.dta",clear
What if we wanted to find the nearest city from within this one dataset? Which of the 727 cities in the dataset is closest to, for example, Dallas apart from itself?
To find out, we simply write the master file’s name in place of the using/neighbor file.
geonear city latitude longitude using "distance.dta", n(city latitude longitude) ignoreself
Each city’s observation will now be compared against all other cities’ observations to find the one nearest to it.
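To answer the Dallas question directly, the matching observation can be listed once the command above has run. This is a minimal sketch: it assumes the city variable is a string and that the name contains the text "Dallas" (the exact label format in distance.dta, for instance whether a state is appended, is not known here).

```stata
* Run after the geonear command above, with its results still in memory.
* strpos() returns a nonzero position when "Dallas" appears anywhere in
* the city name, which also covers labels such as "Dallas, TX".
list if strpos(city, "Dallas") > 0
```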
For more details on the geonear command, see its help file (type help geonear in Stata).