Sunday 3 May 2015

17 Plotting Addresses on Google Maps using R and RgoogleMaps

In an earlier post, we showed how locations defined by addresses could be plotted on Google Maps, but the process was rather complicated. First, the geocoding had to be done separately. Second, a lot of messy JavaScript had to be coded by hand. Finally, the map that was produced could only be viewed with an internet connection.

All these problems can now be overcome by using the RgoogleMaps package, which is available from the CRAN repository, so the process has become very simple. The geocoding is done from within R by the geocode function of the ggmap package. The map is generated like any R plot and can be saved as a PNG or PDF file for offline viewing.

Please see the code here:
#
# R program to show specific addresses on a Google Map
#

# the correct version was not available on CRAN
install.packages("/home/hduser/Downloads/rjson_0.2.13.tar.gz",repos=NULL, type="source")


setwd("/home/xxxx/xxx/maps")
library(rjson)
library(ggmap)
library(RgoogleMaps)
library(png)

#
# input data scraped off the web by running the python program
# https://github.com/prithwis/WebScraper/blob/master/SchoolDataScraper0.py
# the actual tsv file used in this exercise can be downloaded at
# https://github.com/prithwis/WebScraper/blob/master/CalcuttaSchools.tsv
#

Schools = read.csv(file="CalcuttaSchools.tsv", header=FALSE, sep="\t")

# since there is a limit on the number of geocoding requests that can be made, we work with only 5 schools
Schools = Schools[sample(1:nrow(Schools), 5, replace=FALSE),]
colnames(Schools) = c("Name", "Address")

# Lat, Lon is extracted along with address as understood by Google
GeoLocations = geocode(as.character(Schools$Address),output ='latlona')
MapData = cbind(Schools,GeoLocations)
names(MapData)[5] = "GooglePlace"
MapData = MapData[c("Name","lon","lat","Address","GooglePlace")]

# sanity check whether Address is similar to GooglePlace. If different, possible geolocation error
print(MapData[c("Address","GooglePlace","Name")])

# --------------------------------------------------------------------------

# Map is defined in terms of centre and zoom level
cent2 = c(mean(MapData$lat), mean(MapData$lon))
zoom2 = min(MaxZoom(range(MapData$lat), range(MapData$lon)))

# first get the map from Google as a png file
SchoolMap = GetMap(center = cent2, zoom = zoom2, destfile = "MapSchools.png", maptype = "map")
imgSchoolMap = readPNG("MapSchools.png")
grid::grid.raster(imgSchoolMap)

# Define set of long, lat to be plotted on map
LatSet = MapData$lat
LonSet = MapData$lon

# Plot points on the map
# to change plot symbols look at http://www.statmethods.net/advgraphs/parameters.html
PlotOnStaticMap(SchoolMap,lat = LatSet, lon = LonSet, cex = 0.7, pch = 6, col = "red", FUN = points, NEWMAP = TRUE)

# Name of the school, truncated to the first 4 characters, will be used to identify the points
NameSet = substr(as.character(MapData$Name),1,4)

# Location where name is printed, slightly different from the point plotted
LonOffSet2 = 0.005+LonSet

# Write names
PlotOnStaticMap(SchoolMap,lat = LatSet, lon = LonOffSet2, cex = 0.7, labels= NameSet, col = "black", FUN = text, add = TRUE)
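
The two plotting calls above draw on whatever graphics device is currently active. To keep the annotated map for offline viewing, one option is to open a file device before plotting and close it afterwards. The following is only a minimal sketch: SavePlotAsPDF is a hypothetical helper name (not part of RgoogleMaps), the output file name is made up, and a plain scatter plot with invented coordinates stands in for the map so that the sketch runs without an internet connection. It assumes that PlotOnStaticMap draws with base graphics onto the active device.

```r
# Hypothetical helper: runs any plotting function with a PDF device open
SavePlotAsPDF = function(plot_fun, file, width = 7, height = 7) {
  pdf(file, width = width, height = height)
  on.exit(dev.off())   # close the device even if plotting fails
  plot_fun()
  invisible(file)
}

# Stand-in plot with invented coordinates; in the program above the body
# of this function would be the two PlotOnStaticMap() calls
OutFile = file.path(tempdir(), "MapSchoolsPlotted.pdf")
SavePlotAsPDF(function() {
  plot(c(88.30, 88.40), c(22.50, 22.60), pch = 6, col = "red",
       xlab = "lon", ylab = "lat")
  text(c(88.30, 88.40) + 0.005, c(22.50, 22.60),
       labels = c("Scho", "Coll"), cex = 0.7)
}, OutFile)
```

Replacing pdf() with png() inside the helper should give a PNG file instead, though png() takes its width and height in pixels by default.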

A couple of observations:

  1. The input to the program is a TSV file containing the names and addresses of 93 schools in Calcutta. The TSV format is used because addresses typically contain commas, which would be misread as field separators in a CSV file. The actual file used in this demo can be downloaded from GitHub.
  2. This input file was created with a Python program that "scrapes" data from a specific website. This Python program is also available on GitHub.
  3. Google limits the number of geocoding requests that can be sent. So during testing, we take a random sample of 5 schools from the list of school addresses that we have downloaded.
  4. Please note that the Google geocoding process is not totally reliable for addresses in India. Given the variety of address formats, the Lat/Long retrieved is sometimes erroneous and Calcutta schools can be placed in Iran or Mozambique, or at a wrong location within Calcutta or West Bengal. Such things happen about 10% of the time. To spot and eliminate such errors, we print the address supplied alongside the address returned by Google and compare the two. If they differ significantly, it is best to remove the record or hand-code the Lat/Long.
  5. Finally, putting text into a map is always tricky because text labels can overlap and cause a mess. In such cases it is far simpler to avoid text labels in R. Once the PNG file is generated, it is very easy to add the text using any image editing software like GIMP or Photoshop.
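
The manual comparison of addresses described in observation 4 can be supplemented with a coarse automatic check. The sketch below is not what the program above does: it swaps in a simple bounding-box test that flags rows whose coordinates are missing (geocoding failed) or fall outside a rough rectangle around the city. The function name flag_suspect_geocodes, the limits of the box and the sample coordinates are all assumptions made for illustration. It cannot catch an address placed at the wrong spot inside Calcutta, so the printout comparison is still needed.

```r
# Hypothetical helper, not part of ggmap or RgoogleMaps.
# Flags rows whose coordinates are missing or lie outside a bounding box.
flag_suspect_geocodes = function(map_data,
                                 lat_range = c(22.3, 22.9),   # assumed rough limits for Calcutta
                                 lon_range = c(88.1, 88.6)) { # assumed rough limits for Calcutta
  missing_coords = is.na(map_data$lat) | is.na(map_data$lon)
  outside_box = map_data$lat < lat_range[1] | map_data$lat > lat_range[2] |
                map_data$lon < lon_range[1] | map_data$lon > lon_range[2]
  # comparisons involving NA give NA; such rows are already caught as missing
  missing_coords | (!is.na(outside_box) & outside_box)
}

# Made-up coordinates purely for illustration
Example = data.frame(
  Name = c("School A", "School B", "School C"),
  lat  = c(22.55, 35.70, NA),
  lon  = c(88.35, 51.40, NA)
)
Suspect = flag_suspect_geocodes(Example)
print(Example[Suspect, ])        # rows to check by hand or re-code
CleanData = Example[!Suspect, ]  # rows that pass the coarse check
```

Applied to MapData before the map centre and zoom are computed, a filter of this kind would also keep NA coordinates out of the mean() and range() calls.
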
Here are two maps generated by this program





3 comments:

  1. This comment has been removed by the author.

  2. While exploring this Google Maps exercise with the "CalcuttaSchools" dataset, I am facing a problem with the geocode() function.

    "GeoLocations = geocode(as.character(Schools$Address),output ='latlona')"

    It doesn't show any error, but warnings are generated: "geocoding failed for "27/ B Park Street,Kolkata,India,700071"......"

    As a result, the geolocation fields "lon", "lat" and "GooglePlace" are not set for the addresses.

    Code snippet :-
    ================================

    library(rjson)
    library(ggmap)
    library(RgoogleMaps)
    library(png)


    Schools = read.csv(file="Datasets/CalcuttaSchools.tsv",head=FALSE, sep="\t")
    Schools = Schools[sample(1:nrow(Schools), 10, replace=FALSE),]
    colnames(Schools) = c("Name", "Address")

    GeoLocations = geocode(as.character(Schools$Address),output ='latlona')

    ============================================================

    Please advise. I am using version 0.2.15 of rjson.
