Thursday, 28 May 2015

18. googleVis - The best of R and GoogleCharts

In a couple of earlier posts, we have seen how Google Charts can be used to generate online charts and graphs but the challenge was that were was a lot of rather messy javascript to play around with. All such mess has been rendered redundant with the R package googleVis and it is now possible to generate charts with the Google Charts API using only R.

The resultant charts need to be viewed in a browser -- and the html code is available for embedding in websites like this. This google docs spreadsheet has some data on milk production in Indian states. It has been downloaded and used for these exercises

Here is the code :


usePackage <- function(p) {
  if (!is.element(p, installed.packages()[,1]))
    install.packages(p, dep = TRUE)
  require(p, character.only = TRUE)
}

usePackage("googleVis")

StateMilk <- read.csv(file = "MilkState.tsv", head=TRUE, sep ='\t')
head(StateMilk)
StateMilkTotal <- StateMilk[, c("State","Total")]
StateName <- StateMilk[, "State"]
head(StateName)
head(StateMilkTotal)

GeoStates <- gvisGeoChart(StateMilk, "State", "Total", options=list(region="IN", displayMode="markers", resolution="provinces", width=1200, height=800))
plot(GeoStates)
GeoStates2 <- gvisGeoChart(StateMilk, "State", "Total", options=list(region="IN", displayMode="regions", resolution="provinces", width=600, height=400))
plot(GeoStates2)
print(GeoStates2)


and here is the chart that is generated by code printed out by the last print command
-----------------
Data: StateMilk • Chart ID: GeoChartIDe315e03b5c4googleVis-0.5.8
R version 3.0.2 (2013-09-25) • Google Terms of UseDocumentation and Data Policy
-----------------

In addition to maps, you can also draw bar and column charts like with this code

MilkBar <- gvisBarChart(StateMilk, xvar="State", yvar=c("CowMilkTotal", "BuffaloMilk"),options = list(width=800, height=900))
plot(MilkBar)

MilkColumn <- gvisColumnChart(StateMilk, xvar="State", yvar=c("CowMilkTotal", "BuffaloMilk"), options = list(width=800, height=900) )
plot(MilkColumn)


A complete list of sample code for all possible types of googleVis charts is available at on this page. All the examples can be executed by running the demo program as explained at the top of the reference page.

Sunday, 3 May 2015

17 Plotting Addresses on Google Maps using R and R GoogleMaps

In an earlier post, we had shown how locations as defined by addresses could be plotted on Google Maps but the process was rather complicated. First, the geocoding had to be done separately and secondly a lot of messy Javascript had to be coded by hand. Finally the map that was produced could only be viewed with an internet connection.

All these problems can now be overcome by using Rgooglemap package that is available from the R CRAN repository and so the process has become very simple. Finally, the map is generated like any R plot and can be saved as a PNG or PDF file for offline viewing.

Please see the code here :
#
# R program to show specific addresses on a Google Map
#

# the correct version was not available on CRAN
install.packages("/home/hduser/Downloads/rjson_0.2.13.tar.gz",repos=NULL, type="source")


setwd("/home/xxxx/xxx/maps")
library(rjson)
library(ggmap)
library(RgoogleMaps)
library(png)

#
# input data scraped off the web by running the python program
# https://github.com/prithwis/WebScraper/blob/master/SchoolDataScraper0.py
# the actual tsv file used in this exercise can be downloaded at
# https://github.com/prithwis/WebScraper/blob/master/CalcuttaSchools.tsv
#

Schools = read.csv(file="CalcuttaSchools.tsv",head=FALSE, sep="\t")

# since there is a limit on the number of geocoding requests that can be made, we work with only 5 schools
Schools = Schools[sample(1:nrow(Schools), 5, replace=FALSE),]
colnames(Schools) = c("Name", "Address")

# Lat, Lon is extracted along with address as understood by Google
GeoLocations = geocode(as.character(Schools$Address),output ='latlona')
MapData = cbind(Schools,GeoLocations)
names(MapData)[5] = "GooglePlace"
MapData = MapData[c("Name","lon","lat","Address","GooglePlace")]

# sanity check whether Address is similar to GooglePlace. If different, possible geolocation error
print(MapData[c("Address","GooglePlace","Name")])

# --------------------------------------------------------------------------

# Map is defined in terms of centre and zoom level
cent2 = c(mean(MapData$lat), mean(MapData$lon))
zoom2 = min(MaxZoom(range(MapData$lat), range(MapData$lon)))

# first get the map from Google as a png file
SchoolMap = GetMap(center = cent2, zoom = zoom2, destfile = "MapSchools.png", maptype = "map")
imgSchoolMap = readPNG("MapSchools.png")
grid::grid.raster(imgSchoolMap)

# Define set of long, lat to be plotted on map
LatSet = MapData$lat
LonSet = MapData$lon

# Plot points on the map
# to change plot symbols look at http://www.statmethods.net/advgraphs/parameters.html
PlotOnStaticMap(SchoolMap,lat = LatSet, lon = LonSet, cex = 0.7, pch = 6, col = "red", FUN = points, NEWMAP = TRUE)

# Name of the school, truncated to first 4 char, will be used as identify the points
NameSet = substr(as.character(MapData$Name),1,4)

# Location where name is printed, slightly different from the point plotted
LonOffSet2 = 0.005+LonSet

# Write names
PlotOnStaticMap(SchoolMap,lat = LatSet, lon = LonOffSet2, cex = 0.7, labels= NameSet, col = "black", FUN = text, add = T)

Couple of observations :

  1. The input to the program is a TSV file containing the names and addresses of 93 Schools in Calcutta. The TSV format is used because addresses typically contain "," and this can impact the reading process. The actual file used in this demo can be downloaded from github.
  2. This input file has been created with a Python program that has been used to "scrape" data from the a specific website. This Python program is also available in github.
  3. Google sets some limits on the number of geocoding requests that can be sent. So during the testing process, we take a random sample of 5 schools from the list of school addresses that we have downloaded.
  4. Finally, please note that the Google geocoding process is not totally reliable for addresses in India. Given the variety of address format, sometimes the Lat/Long retrieved is erroneous and the Calcutta schools can be placed in Iran or Mozambique! Or even in other locations in Calcutta, or West Bengal. Such things happen about 10% of the time. To spot and eliminate such obvious errors, we take a printout of the address supplied and the address generated by Google and compare the same. If they show significant differences it is best to remove this data or hand code the Lat / Long
  5. Finally putting text into a map is always tricky because text labels can overlap and cause a mess. In such cases it is far simpler to avoid text labels in R. Once the PNG file is generated, it is very easy to put in the text using any image editing software like Gimp or PhotoShop
Here are two maps generated by this program





Saturday, 2 May 2015

16. Using R for Maps of India - state, district, taluka level maps

Displaying spatial data on maps is always interesting but most Visualisation tools do not offer facilities to create maps of India, especially at the state and lower levels. In this post, we will show how such maps can be made.

The base data for such maps, the "polygons" that define the country, the states, the districts and even the talukas ( or sub-divisions) is available from an organisation called Global Administrative Areas or gadm.org. Country level files for almost all countries are available in a variety of formats including R and these are at three different levels. For India, these files can be downloaded as IND_admN.RData where N = 1,2,3. These will form the raw data from which we will create our maps.

Working with R, we will need two R packages :

# Load required libraries
library(sp)
library(RColorBrewer)

Assuming that the downloaded RData file is located in the R working directory, the following code will generate a basic India showing the states

# load level 1 india data downloaded from http://gadm.org/country
load("IND_adm1.RData")
ind1 = gadm

# simple map of India with states drawn
# unfortunately, Kashmir will get truncated
spplot(ind1, "NAME_1", scales=list(draw=T), colorkey=F, main="India")
Now suppose there is some data ( economic, demographic or whatever ...) and we wish to colour each state with a colour that represents this data. We simulate this scenario by assigning a random number ( between 0 and 1) to each state and then defining the RGB colour of this region with a very simple function that converts the data into a colour value. [ This idea borrowed from gis.stackexchange ]

# map of India with states coloured with an arbitrary fake data
ind1$NAME_1 = as.factor(ind1$NAME_1)
ind1$fake.data = runif(length(ind1$NAME_1))
spplot(ind1,"NAME_1",  col.regions=rgb(0,ind1$fake.data,0), colorkey=T, main="Indian States")

Now let us draw the map of any one state. First check the spelling of each state by listing the states:
ind1$NAME_1

and then executing these commands :
# map of West Bengal ( or any other state )
wb1 = (ind1[ind1$NAME_1=="West Bengal",])
spplot(wb1,"NAME_1", col.regions=rgb(0,0,1), main = "West Bengal, India",scales=list(draw=T), colorkey =F)

# map of Karnataka ( or any other state )
kt1 = (ind1[ind1$NAME_1=="Karnataka",])
spplot(kt1,"NAME_1", col.regions=rgb(0,1,0), main = "Karnataka, India",scales=list(draw=T), colorkey =F)


If we want to get and map district level data then we need to use the level 2 data as follows :

# load level 2 india data downloaded from http://gadm.org/country
load("IND_adm2.RData")
ind2 = gadm

and then plot the various districts as

# plotting districts of a State, in this case West Bengal
wb2 = (ind2[ind2$NAME_1=="West Bengal",])
spplot(wb2,"NAME_1", main = "West Bengal Districts", colorkey =F)


To identify each district with a beautiful colour we can use the following commands :
# colouring the districts with rainbow of colours
wb2$NAME_2 = as.factor(wb2$NAME_2)
col = rainbow(length(levels(wb2$NAME_2)))
spplot(wb2,"NAME_2",  col.regions=col, colorkey=T)

As in the case of the states, we can assume that each district has some (economic or demographic) data and we wish to colour the districts according to the intensity of this data, then we can use the following code :

# colouring the districts with some simulated, fake data
wb2$NAME_2 = as.factor(wb2$NAME_2)
wb2$fake.data = runif(length(wb2$NAME_1)) 
spplot(wb2,"NAME_2",  col.regions=rgb(0,wb2$fake.data, 0), colorkey=T)


But we can be even more clever by allocating certain shades of colour to certain ranges of data as with this code, adapted from this website

# colouring the districts with range of colours
col_no = as.factor(as.numeric(cut(wb2$fake.data, c(0,0.2,0.4,0.6,0.8,1))))
levels(col_no) = c("<20%", "20-40%", "40-60%","60-80%", ">80%")
wb2$col_no = col_no
myPalette = brewer.pal(5,"Greens")
spplot(wb2, "col_no", col=grey(.9), col.regions=myPalette, main="District Wise Data")


To move to the district, sub-division ( or taluk) level we need to use the level three data file :

# load level 3 india data downloaded from http://gadm.org/country
load("IND_adm3.RData")
ind3 = gadm

# extracting data for West Bengal
wb3 = (ind3[ind3$NAME_1=="West Bengal",])

and then plot the subdivision or taluk level map as follows :

#plotting districts and sub-divisions / taluk
wb3$NAME_3 = as.factor(wb3$NAME_3)
col = rainbow(length(levels(wb3$NAME_3)))
spplot(wb3,"NAME_3", main = "Taluk, District - West Bengal", colorkey=T,col.regions=col,scales=list(draw=T))


Now let us get a map of the district - North 24 Parganas. Make sure that the name is spelt correctly.

# get map for "North 24 Parganas District"
wb3 = (ind3[ind3$NAME_1=="West Bengal",])
n24pgns3 = (wb3[wb3$NAME_2=="North 24 Parganas",])
spplot(n24pgns3,"NAME_3", colorkey =F, scales=list(draw=T), main = "24 Pgns (N) West Bengal")

and within North 24 Parganas district, we can go down to the Basirhat Subdivision ( Taluk) and draw the map as follows: 

# now draw the map of Basirhat subdivision
# recreate North 24 Parganas data
n24pgns3 = (wb3[wb3$NAME_2=="North 24 Parganas",])
basirhat3 = (n24pgns3[n24pgns3$NAME_3=="Basirhat",])
spplot(basirhat3,"NAME_3", colorkey =F, scales=list(draw=T), main = "Basirhat,24 Pgns (N) West Bengal")


This is the highest resolution ( or lowest administrative division ) that we can go with data from gadm. However even within a map,  one "zoom" into and enlarge an area by specifying the latitude and longitudes of a zoom box as shown here.

# zoomed in data
wb2 = (ind2[ind2$NAME_1=="West Bengal",])
wb2$NAME_2 = as.factor(wb2$NAME_2)
col = rainbow(length(levels(wb2$NAME_2)))
spplot(wb2,"NAME_2",  col.regions=col,scales=list(draw=T),ylim=c(23.5,25),xlim=c(87,89), colorkey=T)



With this it should be possible to draw any map of India. For more comprehensive examples of such maps, please see this page.


Here is the entire code

========================================
setwd("/home/xxx/yyy/maps")

# http://r-nold.blogspot.in/2012/08/provincial-map-using-gadm.html
# http://blog.revolutionanalytics.com/2009/10/geographic-maps-in-r.html
# http://gis.stackexchange.com/questions/80565/plotting-a-map-of-new-zealand-with-regional-boundaries-in-r
# https://ryouready.wordpress.com/2009/11/16/infomaps-using-r-visualizing-german-unemployment-rates-by-color-on-a-map/
# http://rstudio-pubs-static.s3.amazonaws.com/6772_441847b522584d1095daddc2677e4ddb.html -- comprehensive 

# Load required libraries
library(sp)
library(RColorBrewer)

# load level 1 india data downloaded from http://gadm.org/country
load("IND_adm1.RData")
ind1 = gadm

# simple map of India with states drawn
# unfortunately, Kashmir will get truncated 
spplot(ind1, "NAME_1", scales=list(draw=T), colorkey=F, main="India")

# map of India with states coloured with an arbitrary fake data
ind1$NAME_1 = as.factor(ind1$NAME_1)
ind1$fake.data = runif(length(ind1$NAME_1))
spplot(ind1,"NAME_1",  col.regions=rgb(0,ind1$fake.data,0), colorkey=T, main="Indian States")


# list of states avaialable
ind1$NAME_1

# map of West Bengal ( or any other state )
wb1 = (ind1[ind1$NAME_1=="West Bengal",])
spplot(wb1,"NAME_1", col.regions=rgb(0,0,1), main = "West Bengal, India",scales=list(draw=T), colorkey =F)
# map of Karnataka ( or any other state )
kt1 = (ind1[ind1$NAME_1=="Karnataka",])
spplot(kt1,"NAME_1", col.regions=rgb(0,1,0), main = "Karnataka, India",scales=list(draw=T), colorkey =F)

# --------------------------------------------------------------------------------------
# load level 2 india data downloaded from http://gadm.org/country
load("IND_adm2.RData")
ind2 = gadm

# plotting districts of a State, in this case West Bengal
wb2 = (ind2[ind2$NAME_1=="West Bengal",])
spplot(wb2,"NAME_1", main = "West Bengal Districts", colorkey =F)

# colouring the districts with some simulated, fake data
wb2$NAME_2 = as.factor(wb2$NAME_2)
wb2$fake.data = runif(length(wb2$NAME_1)) 
spplot(wb2,"NAME_2",  col.regions=rgb(0,wb2$fake.data, 0), colorkey=T)

# colouring the districts with rainbow of colours
# wb2$NAME_2 = as.factor(wb2$NAME_2)
col = rainbow(length(levels(wb2$NAME_2)))
spplot(wb2,"NAME_2",  col.regions=col, colorkey=T)

# colouring the districts with range of colours
col_no = as.factor(as.numeric(cut(wb2$fake.data, c(0,0.2,0.4,0.6,0.8,1))))
levels(col_no) = c("<20%", "20-40%", "40-60%","60-80%", ">80%")
wb2$col_no = col_no
myPalette = brewer.pal(5,"Greens")
spplot(wb2, "col_no", col=grey(.9), col.regions=myPalette, main="District Wise Data")

# --------------------------------------------------------------------------------------
# load level 3 india data downloaded from http://gadm.org/country
load("IND_adm3.RData")
ind3 = gadm

# extracting data for West Bengal
wb3 = (ind3[ind3$NAME_1=="West Bengal",])

#plotting districts and sub-divisions / taluk
wb3$NAME_3 = as.factor(wb3$NAME_3)
col = rainbow(length(levels(wb3$NAME_3)))
spplot(wb3,"NAME_3", main = "Taluk, District - West Bengal", colorkey=T,col.regions=col,scales=list(draw=T))

# get list of districts avaialable
wb3$NAME_2

# get map for "North 24 Parganas District"
wb3 = (ind3[ind3$NAME_1=="West Bengal",])
n24pgns3 = (wb3[wb3$NAME_2=="North 24 Parganas",])
spplot(n24pgns3,"NAME_3", colorkey =F, scales=list(draw=T), main = "24 Pgns (N) West Bengal")

n24pgns3$NAME_3 = as.factor(n24pgns3$NAME_3)
n24pgns3$fake.data = runif(length(n24pgns3$NAME_3))
spplot(n24pgns3,"NAME_3", col.regions=rgb(0, n24pgns3$fake.data, 0), colorkey=T,scales=list(draw=T))

# get map for "South 24 Parganas District"
s24pgns3 = (wb3[wb3$NAME_2=="South 24 Parganas",])
spplot(s24pgns3,"NAME_3", colorkey =F, scales=list(draw=T), main = "24 Pgns (S) West Bengal")

s24pgns3$NAME_3 = as.factor(s24pgns3$NAME_3)
s24pgns3$fake.data = runif(length(s24pgns3$NAME_3))
spplot(s24pgns3,"NAME_3", col.regions=rgb(0, s24pgns3$fake.data, 0), colorkey=T,scales=list(draw=T),main = "24 Pgns (S) West Bengal")


# get map for "Murshidabad District"
mur3 = (wb3[wb3$NAME_2=="Murshidabad",])
spplot(mur3,"NAME_3", colorkey =F, scales=list(draw=T), main = "Murshidabad West Bengal")

mur3$NAME_3 = as.factor(mur3$NAME_3)
mur3$fake.data = runif(length(mur3$NAME_3))
spplot(mur3,"NAME_3", col.regions=rgb(0,0, mur3$fake.data), colorkey=T,scales=list(draw=T),main = "Murshidabad West Bengal")

# now draw the map of Basirhat subdivision
# recreate North 24 Parganas data
n24pgns3 = (wb3[wb3$NAME_2=="North 24 Parganas",])
basirhat3 = (n24pgns3[n24pgns3$NAME_3=="Basirhat",])
spplot(basirhat3,"NAME_3", colorkey =F, scales=list(draw=T), main = "Basirhat,24 Pgns (N) West Bengal")

# now draw the map of Baharampur subdivision
# recreate Murshidabad data
mur3 = (wb3[wb3$NAME_2=="Murshidabad",])
bahar3 = (mur3[mur3$NAME_3=="Baharampur",])
spplot(bahar3,"NAME_3", colorkey =F, scales=list(draw=T), main = "Baharampur, Murshidabad, West Bengal")

# -------------------------------------------------------------------------------------------
# load level 2 india data downloaded from http://gadm.org/country
load("IND_adm2.RData")
ind2 = gadm

# plotting selected districts of a State, in this case West Bengal
wb2 = (ind2[ind2$NAME_1=="West Bengal",])
spplot(wb2,"NAME_1", main = "West Bengal Districts", scales=list(draw=T),ylim=c(23.5,25),colorkey =F)

# zoomed in data
wb2 = (ind2[ind2$NAME_1=="West Bengal",])
wb2$NAME_2 = as.factor(wb2$NAME_2)
col = rainbow(length(levels(wb2$NAME_2)))
spplot(wb2,"NAME_2",  col.regions=col,scales=list(draw=T),ylim=c(23.5,25),xlim=c(87,89), colorkey=T)


PostScript

While we have achieved much, what was missing was the ability to mark cities on the map and write the names next to the points marked. To do so, we require 
  1. a function for geocoding place names into lon, lat values
  2. usage of the sp.layout option to place markers and texts on the map.
This has now been done, and you can see the three towns in West Bengal marked as follows


# Marking towns on the GADM maps
# On a map of West Bengal, we will now mark some towns
#
#load level 2 india data downloaded from http://gadm.org/country
# 

library(ggmap) # -- for geocoding, obtaining city locations
load("IND_adm2.RData")
ind2 = gadm

# plotting districts of a State, in this case West Bengal
wb2 = (ind2[ind2$NAME_1=="West Bengal",])

nam = c("Purulia","Bankura","Midnapur")
pos = geocode(nam)
tlat = pos$lat+0.05    # -- the city name will be above the marker
cities = data.frame(nam, pos$lon,pos$lat,tlat)
names(cities)[2] = "lon"
names(cities)[3] = "lat"


text1 = list("panel.text", cities$lon, cities$tlat, cities$nam,col="red", cex = 0.75)
mark1 = list("panel.points", cities$lon, cities$lat, col="blue")
text2 = list("panel.text",87.0,26.0,"GADM map", col = "dark green", cex = 1.2)
spplot(wb2, "NAME_1",
       sp.layout=list(text1,mark1, text2),
       main="West Bengal Districts",
       colorkey=FALSE, scales=list(draw=TRUE))



to change plot symbols look at http://www.statmethods.net/advgraphs/parameters.html