...

How to Find the Nearest Location in Stata Using geonear

In our previous article on the geodist command, we used geodist to find, in Stata, the location nearest to a base city and the locations within a certain radius of it. However, geodist is not designed for such computations. Stata has a dedicated command for them, geonear, which is the subject of this article.

The datasets used in this article are distance.dta and industrialcities.dta. Click on the relevant link to download them. distance.dta contains the longitude and latitude of various cities in the US.

Download Example File

industrialcities.dta contains the longitude and latitude of four US industrial cities. The variable names differ between the two files, even though the variables hold the same kind of information. For example, city names are stored in a variable called ‘city’ in distance.dta but in ‘city1’ in industrialcities.dta. This is an important and necessary feature of the two datasets.

We start by using the first of the two datasets:

use "distance.dta",clear

For each of the 727 cities, we want to know which of the four industrial cities is nearest to it. For this, we use the geonear command.

General Syntax

The general syntax of the geonear command is as follows:

geonear baseid baselat baselon using nborfile, neighbors(nborid nborlat nborlon) [options]

baseid, baselat and baselon refer to the city, latitude and longitude variables in our master file (i.e., the file currently loaded: distance.dta). nborfile refers to the second, neighbor file (also called the ‘using’ file: industrialcities.dta). The neighbors() option is mandatory. In its parentheses, we specify the city, latitude and longitude variables from the using file, in the order shown in the syntax.

Finding the Nearest Location in Stata in Kilometres

Applied to our datasets, the command looks like this:

geonear city latitude longitude using "industrialcities.dta", n(city1 latitude1 longitude1)

The first three arguments are the city, latitude and longitude variables in the master file. These are followed by using and the name of the neighbor dataset. n() is an abbreviation of the neighbors() option. Its parentheses contain the city, latitude and longitude variables from the neighbor file.

This command creates a variable that holds, for each of the 727 cities, the name of the nearest industrial city. It creates a second variable that holds the distance, in kilometres by default, between each city and that nearest industrial city (its neighbor).
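To make the computation concrete, here is a minimal Python sketch of the idea behind this step. It is not geonear's implementation. For every base city, it measures the great-circle distance to each candidate neighbor and keeps the closest one. It assumes a spherical Earth with a mean radius of 6,371 km and uses the haversine formula. The IDs and coordinates are hypothetical toy values, not the article's data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius for a spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest_neighbor(bases, neighbors):
    """Map each base ID to (nearest neighbor ID, distance in km) by brute force."""
    result = {}
    for bid, blat, blon in bases:
        best_id, best_km = None, math.inf
        for nid, nlat, nlon in neighbors:
            km = haversine_km(blat, blon, nlat, nlon)
            if km < best_km:  # keep the closest neighbor seen so far
                best_id, best_km = nid, km
        result[bid] = (best_id, best_km)
    return result


# Hypothetical toy points on the equator: (ID, latitude, longitude)
bases = [("b1", 0.0, 0.0), ("b2", 0.0, 10.0)]
neighbors = [("n1", 0.0, 1.0), ("n2", 0.0, 8.0)]

for base_id, (nbor_id, km) in nearest_neighbor(bases, neighbors).items():
    print(base_id, nbor_id, round(km, 2))
```

The two values in each result mirror the two variables described above: the neighbor's identifier and its distance.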

With geodist, we saw that the distance from a city to itself was, rightly, calculated as zero. We do not want such observations (where a city is paired with itself) in our data. Previously we had to drop them manually. Here we simply add the ignoreself option to exclude them automatically.

geonear city latitude longitude using "industrialcities.dta", ignoreself n(city1 latitude1 longitude1) 
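Conceptually, ignoreself just removes self-pairs from the candidate neighbors. The Python sketch below assumes that a self-match is a neighbor whose identifier equals the base's identifier. geonear's exact rule may differ. The IDs and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius for a spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest(bases, neighbors, ignoreself=False):
    """Return {base_id: (neighbor_id, km)}, optionally skipping neighbors with the base's ID."""
    out = {}
    for bid, blat, blon in bases:
        candidates = [
            (haversine_km(blat, blon, nlat, nlon), nid)
            for nid, nlat, nlon in neighbors
            if not (ignoreself and nid == bid)  # assumption: same ID means same place
        ]
        km, nid = min(candidates)  # smallest distance wins
        out[bid] = (nid, km)
    return out


# Hypothetical data: "x" is in both lists, as a city would be if it appeared in both files
bases = [("x", 0.0, 0.0), ("y", 0.0, 5.0)]
neighbors = [("x", 0.0, 0.0), ("z", 0.0, 2.0)]

print(nearest(bases, neighbors))                   # "x" is paired with itself at 0 km
print(nearest(bases, neighbors, ignoreself=True))  # "x" is now paired with "z"
```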

Calculating Distance in Miles

If we want the distance to the nearest city to be reported in miles, we add the miles option.

geonear city latitude longitude using "industrialcities.dta", ignoreself n(city1 latitude1 longitude1) miles genstub(ignore)

We have also added the genstub() option. It lets us specify, in its parentheses, a prefix for the nearest-neighbor variable that the command generates. In our example, we use the prefix ‘ignore’.
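Conceptually, the miles option changes only the unit in which the distance is reported. The Python sketch below assumes the spherical distance is first computed in kilometres and then converted with the international mile (1 mile = 1.609344 km). That conversion route is our assumption, not a description of geonear's internals. The points are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius for a spherical model
KM_PER_MILE = 1.609344    # length of the international mile in km


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def haversine_miles(lat1, lon1, lat2, lon2):
    """The same great-circle distance, expressed in miles (assumed km-to-miles conversion)."""
    return haversine_km(lat1, lon1, lat2, lon2) / KM_PER_MILE


# One degree of longitude along the equator (hypothetical points)
km = haversine_km(0.0, 0.0, 0.0, 1.0)
mi = haversine_miles(0.0, 0.0, 0.0, 1.0)
print(f"{km:.2f} km = {mi:.2f} miles")  # prints: 111.19 km = 69.09 miles
```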

Switching Datasets

We now use industrialcities.dta as our master data.

use "industrialcities.dta",clear

We now want to find the two nearest cities (from the 727 cities in the using/neighbor file) to the four cities in our master dataset. The command goes:

geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself nearcount(2)

nearcount(2) asks for the two nearest cities. For example, the nearest city to Boston is Brookline and the second nearest is Somerville.
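nearcount(k) generalises the single nearest neighbor to the k nearest. Below is a minimal Python sketch of that idea. It assumes a spherical Earth of radius 6,371 km and uses hypothetical IDs and toy coordinates. It is not geonear's implementation.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius for a spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def k_nearest(base, neighbors, k, ignoreself=False):
    """Return the k nearest neighbors of one base as [(neighbor_id, km), ...], closest first."""
    bid, blat, blon = base
    candidates = (
        (haversine_km(blat, blon, nlat, nlon), nid)
        for nid, nlat, nlon in neighbors
        if not (ignoreself and nid == bid)  # assumption: same ID means same place
    )
    # heapq.nsmallest avoids fully sorting the candidates when k is small
    return [(nid, km) for km, nid in heapq.nsmallest(k, candidates)]


# Hypothetical points on the equator; the base also appears among the neighbors
base = ("b", 0.0, 0.0)
neighbors = [("n1", 0.0, 3.0), ("n2", 0.0, 1.0), ("n3", 0.0, 2.0), ("b", 0.0, 0.0)]
print(k_nearest(base, neighbors, 2, ignoreself=True))
```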

Formatting the Results as Long

By default, geonear adds the nearest city names and their distances to the data as new columns (a wide layout). If we want a long layout instead, with one observation per base-neighbor pair, we specify the long option:

use "industrialcities.dta",clear
geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself nearcount(2) long

Note that we first reload industrialcities.dta with the use command. This discards the variables that the previous geonear command added to the data in memory.
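The difference between the default (wide) layout and long is purely one of shape. The Python sketch below reshapes the same hypothetical two-nearest results both ways. The column names nbor1, dist1 and so on are made up for illustration. They are not the names geonear generates.

```python
def to_wide(matches):
    """One row per base; the k-th neighbor gets its own pair of columns."""
    rows = []
    for base, nbors in matches.items():
        row = {"base": base}
        for rank, (nid, km) in enumerate(nbors, start=1):
            row[f"nbor{rank}"] = nid  # hypothetical column names
            row[f"dist{rank}"] = km
        rows.append(row)
    return rows


def to_long(matches):
    """One row per (base, neighbor) pair, with the neighbor's rank."""
    return [
        {"base": base, "rank": rank, "nbor": nid, "dist": km}
        for base, nbors in matches.items()
        for rank, (nid, km) in enumerate(nbors, start=1)
    ]


# Hypothetical results: base -> [(neighbor, km), ...], closest first
matches = {"b1": [("n1", 1.5), ("n2", 4.0)], "b2": [("n3", 2.0), ("n1", 3.5)]}

for row in to_wide(matches):
    print(row)
for row in to_long(matches):
    print(row)
```

With two bases and two neighbors each, the wide form has 2 rows and the long form has 4.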


Cities Within a Radius

Previously, with the geodist command, we had to follow several steps to obtain the cities within a 30 km radius of the base cities. With geonear, we only need the within() option.

geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself long within(20)

By default, geonear interprets within(20) as within 20 km. If we also specify the miles option, it is interpreted as 20 miles.

Some base cities (here, Houston and Iowa City) have no neighbor within 20 km, yet geonear still returns their nearest neighbor, which lies beyond the 20 km radius. To report only neighbors inside the radius, we add the nearcount(0) option alongside within(). Base cities with no neighbor within the radius given in within(20) are then omitted.
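The interplay between within() and nearcount() can be modelled like this. Report every neighbor inside the radius. If that gives fewer than nearcount matches, fall back to the nearcount nearest neighbors overall. This is our reading of the behaviour described above, sketched in Python, not geonear's source. The sketch also assumes that a neighbor exactly on the boundary counts as inside, and the data are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius for a spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def within_radius(base, neighbors, radius_km, nearcount=1):
    """All neighbors within radius_km, or the nearcount nearest if fewer fall inside."""
    bid, blat, blon = base
    ranked = sorted((haversine_km(blat, blon, nlat, nlon), nid) for nid, nlat, nlon in neighbors)
    inside = [(nid, km) for km, nid in ranked if km <= radius_km]  # assumed inclusive boundary
    if len(inside) >= nearcount:
        return inside
    # Too few neighbors inside the radius: fall back to the nearcount nearest overall
    return [(nid, km) for km, nid in ranked[:nearcount]]


base_city = ("b", 0.0, 0.0)
# Hypothetical neighbors about 11 km and 22 km away (0.1 and 0.2 degrees of longitude)
neighbors = [("near", 0.0, 0.1), ("far", 0.0, 0.2)]

print(within_radius(base_city, neighbors, 20))              # only "near" is inside 20 km
print(within_radius(base_city, neighbors, 5))               # fallback: "near", though outside 5 km
print(within_radius(base_city, neighbors, 5, nearcount=0))  # nothing inside, nothing reported
```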

geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself long within(20) nearcount(0)

This command reports all the cities within the 20 km radius (13 cities for Boston). If we want only the two nearest cities within the 20 km radius, we add the limit() option, with the maximum number of cities to report in its parentheses.

geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself long within(20) limit(2)

The number of nearest cities within the 20km radius is now restricted to two.
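The effect of limit() on bases that do have neighbors in range amounts to truncating the distance-sorted list of in-radius neighbors. The Python sketch below shows only that step. It leaves out any fallback for bases with no neighbor in range, and it assumes a spherical Earth of radius 6,371 km. The IDs and coordinates are hypothetical, and the code illustrates the idea rather than geonear's implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius for a spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def within_limited(base, neighbors, radius_km, limit):
    """Neighbors of `base` inside radius_km, closest first, capped at `limit` entries."""
    bid, blat, blon = base
    ranked = sorted(
        (haversine_km(blat, blon, nlat, nlon), nid)
        for nid, nlat, nlon in neighbors
        if nid != bid  # ignoreself, assuming a shared ID marks a self-match
    )
    inside = [(nid, km) for km, nid in ranked if km <= radius_km]
    return inside[:limit]


# Hypothetical neighbors about 5.6, 11.1, 16.7 and 33.4 km east of the base
base_city = ("b", 0.0, 0.0)
neighbors = [("n3", 0.0, 0.15), ("n1", 0.0, 0.05), ("n4", 0.0, 0.3), ("n2", 0.0, 0.1), ("b", 0.0, 0.0)]

print(within_limited(base_city, neighbors, 20, limit=2))
```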

Ellipsoidal Distances

While geodist calculates ellipsoidal distances by default, geonear uses spherical distances. The distances calculated by the two commands therefore differ, so their results are not directly comparable. To make geonear calculate ellipsoidal distances, we use the ellipsoid option:

geonear city1 latitude1 longitude1 using "distance.dta", n(city latitude longitude) ignoreself ellipsoid

The distances will now be calculated using the WGS 1984 reference ellipsoid.
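To see why the two kinds of distance disagree, the Python sketch below computes both for the same pairs of points. The first is a haversine distance on a sphere of radius 6,371 km, which is an assumption. The second is an ellipsoidal distance on WGS 84 computed with Vincenty's inverse formula. Vincenty's method is a standard choice for ellipsoidal distances, but we chose it here for illustration only. We do not claim it is the algorithm geonear uses internally. It can fail to converge for nearly antipodal points, which the sketch reports as an error. For one degree of longitude along the equator, the sphere gives about 111.19 km and the WGS 84 ellipsoid about 111.32 km.

```python
import math

EARTH_RADIUS_KM = 6371.0          # assumption: mean Earth radius for the spherical model
WGS84_A = 6378137.0               # WGS 84 semi-major axis (m)
WGS84_F = 1 / 298.257223563       # WGS 84 flattening
WGS84_B = WGS84_A * (1 - WGS84_F) # semi-minor axis (m)


def haversine_km(lat1, lon1, lat2, lon2):
    """Spherical great-circle distance in km (decimal degrees in)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def vincenty_m(lat1, lon1, lat2, lon2, tol=1e-12, max_iter=200):
    """Ellipsoidal distance in metres on WGS 84 via Vincenty's inverse formula."""
    U1 = math.atan((1 - WGS84_F) * math.tan(math.radians(lat1)))  # reduced latitudes
    U2 = math.atan((1 - WGS84_F) * math.tan(math.radians(lat2)))
    L = math.radians(lon2 - lon1)
    sinU1, cosU1, sinU2, cosU2 = math.sin(U1), math.cos(U1), math.sin(U2), math.cos(U2)

    lam = L
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cosU2 * sin_lam, cosU1 * sinU2 - sinU1 * cosU2 * cos_lam)
        if sin_sigma == 0:
            return 0.0  # coincident points
        cos_sigma = sinU1 * sinU2 + cosU1 * cosU2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cosU1 * cosU2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # On the equator cos2_alpha is 0 and this term is conventionally set to 0
        cos_2sm = cos_sigma - 2 * sinU1 * sinU2 / cos2_alpha if cos2_alpha != 0 else 0.0
        C = WGS84_F / 16 * cos2_alpha * (4 + WGS84_F * (4 - 3 * cos2_alpha))
        lam_prev = lam
        lam = L + (1 - C) * WGS84_F * sin_alpha * (
            sigma + C * sin_sigma * (cos_2sm + C * cos_sigma * (-1 + 2 * cos_2sm ** 2)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("Vincenty formula did not converge (nearly antipodal points?)")

    u2 = cos2_alpha * (WGS84_A ** 2 - WGS84_B ** 2) / WGS84_B ** 2
    A = 1 + u2 / 16384 * (4096 + u2 * (-768 + u2 * (320 - 175 * u2)))
    B = u2 / 1024 * (256 + u2 * (-128 + u2 * (74 - 47 * u2)))
    d_sigma = B * sin_sigma * (cos_2sm + B / 4 * (
        cos_sigma * (-1 + 2 * cos_2sm ** 2)
        - B / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2) * (-3 + 4 * cos_2sm ** 2)))
    return WGS84_B * A * (sigma - d_sigma)


# Compare the two models on hypothetical point pairs (lat, lon)
for p, q in [((0.0, 0.0), (0.0, 1.0)), ((0.0, 0.0), (1.0, 0.0))]:
    sph = haversine_km(*p, *q)
    ell = vincenty_m(*p, *q) / 1000
    print(f"{p} -> {q}: spherical {sph:.3f} km, ellipsoidal {ell:.3f} km")
```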

Finding Neighbors From Within One Dataset

We go back to using distance.dta.

use "distance.dta",clear

What if we want to find the nearest city within this one dataset? For example, which of the 727 cities in the dataset is closest to Dallas, apart from Dallas itself?

To find out, we simply write the master file’s name in place of the using/neighbor file.

geonear city latitude longitude using "distance.dta", n(city latitude longitude) ignoreself

Each city is now compared with every other city in the dataset to find the one nearest to it.
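When the same file serves as both master and using file, the computation is a self-join. Below is a brute-force Python sketch of that idea with hypothetical toy points. Each location is compared with every other one and skips itself by position. For 727 cities, brute force would mean 727 × 726 = 527,802 distance calculations. The sketch shows the logic only, not geonear's actual search strategy, and it assumes a spherical Earth of radius 6,371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius for a spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest_in_same_list(points):
    """For each (id, lat, lon), find the closest *other* point in the same list."""
    out = {}
    for i, (pid, lat, lon) in enumerate(points):
        best = None  # (neighbor_id, km) of the closest point seen so far
        for j, (qid, qlat, qlon) in enumerate(points):
            if i == j:
                continue  # never compare a location with itself
            km = haversine_km(lat, lon, qlat, qlon)
            if best is None or km < best[1]:
                best = (qid, km)
        out[pid] = best
    return out


# Hypothetical toy points on the equator
points = [("p1", 0.0, 0.0), ("p2", 0.0, 1.0), ("p3", 0.0, 5.0)]
print(nearest_in_same_list(points))
```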

Click here for more details on the geonear command.
