pacman::p_load(sf, tmap, tidyverse, plotly, spatstat, stars)

This is a continuation of my previous post on the topic. Since we have a good amount of data on our hands, I wanted to run some initial exploratory analysis using a combination of non-spatial and spatial techniques. This post will use relatively basic techniques and will look at only one layer of spatial data: the cafes in Metro Manila and their locations.
Objectives
For this post, we will have the following objectives:
Apply non-spatial techniques to analyze and describe the Metro Manila cafe data extracted using the Google Places API;
Apply spatial techniques to analyze the same data visually.
We will use the appropriate R packages to perform the different analyses (spatial and otherwise) that support our answers to the questions above. The same analyses should also be possible in Python, but I have chosen R because I am more familiar with it, and because these posts are produced in RStudio, I am hoping to run into fewer issues when executing code blocks in R.
Data Sources - Our MM Cafe Database
The data we are using comes from two main sources:
- Google Maps - Using the Places API’s Nearby Search, we compiled a database of cafes in Metro Manila by performing API calls at regularly spaced intervals within the (padded) boundary of the region.
- Humanitarian Data Exchange (HDX) - HDX is an open data platform managed by UN OCHA (Office for the Coordination of Humanitarian Affairs). From this platform, we sourced geospatial datasets for the administrative boundaries in the Philippines. These were used in constructing the current database, and will also be used in the analyses we perform later.
You may refer to my previous post to see in more detail how we built the database. The data we will be using comes from an updated run performed a month after that post from last December 2025. This run also applies some of the modifications mentioned in the latter half of that post, which increased the number of grid points (search points) and, ultimately, the number of cafes returned.
Loading R Packages
We load the following packages into our environment using p_load() from the pacman package. We use pacman so that any packages that are not yet installed are installed automatically. If all packages are already installed, the code block is equivalent to calling library() on each package.
The loaded packages include:
sf - package for importing, managing and processing vector-based geospatial data
tidyverse - collection of packages for performing data importation, wrangling and visualization
tmap - package with functions for plotting cartographic quality maps
plotly - package for the creation of interactive data visualizations
spatstat - a package for point pattern analysis
stars - used for working with raster and vector data cubes
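Conceptually, p_load() behaves roughly like the sketch below for each package name it receives. This is an illustrative approximation only, not pacman's actual implementation, and the function name load_or_install is made up for the example:

```r
# Illustrative sketch only: install a package if it is missing, then attach it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # install from the default repository
  }
  library(pkg, character.only = TRUE)   # attach it, e.g. load_or_install("sf")
}
```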
Data Loading and Preparation
Dataset 1. Cafe Database (Raw)
We load the current cafe database into an R dataframe mm_cafe_raw using read_csv().
mm_cafe_raw <- read_csv("data/aspatial/metro_manila_cafes_master.csv")

Rows: 12024 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): place_id, name, address, data_extracted, ADM2_EN, ADM3_EN, ADM4_EN,...
dbl (2): lat, lng
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The output shows that 12,024 records were returned across 11 columns. The 11 columns are listed: 9 of them are character/string and 2 are numeric. We know that the dataset includes the output from both runs, and the data_extracted column gives the date on which each record was extracted. We can show these dates using distinct(), which comes with tidyverse.
distinct(mm_cafe_raw, data_extracted)

# A tibble: 2 × 1
data_extracted
<chr>
1 2/12/2025
2 5/1/2026
The output shows that there are two values: one for 2 Dec 2025 and another for 5 Jan 2026. This means there will be duplicates across the data, as most of the cafes from the previous run should also appear in the latest one. To make sure we do not miss any data, we will simply remove any duplicate places. For this post, coordinates need to be unique, since we do not want any overlapping locations at the moment. The code block below removes rows with duplicated values of lat and lng (the latitude and the longitude). We also run glimpse() in the same code block to show the number of records, as well as the first few entries for each column.
mm_cafe_raw <- mm_cafe_raw %>%
  distinct(lat, lng, .keep_all = TRUE)  # keep one row per coordinate pair
glimpse(mm_cafe_raw)

Rows: 11,812
Columns: 11
$ place_id       <chr> "ChIJoaqhLu-zlzMRCw_dDwh7HWo", "ChIJGxMZJACzlzMRvayy0yT…
$ name           <chr> "Lazy Latte Cafe", "NG SNACK CAFE", "But First, Coffee …
$ address        <chr> "62, HP Building, G. Lazaro Rd, Valenzuela, 1443 Metro …
$ lat            <dbl> 14.70791, 14.71314, 14.70852, 14.71520, 14.70869, 14.70…
$ lng            <dbl> 120.9558, 120.9505, 120.9466, 120.9522, 120.9524, 120.9…
$ data_extracted <chr> "2/12/2025", "2/12/2025", "2/12/2025", "2/12/2025", "2/…
$ ADM2_EN        <chr> "Metropolitan Manila Third District", "Metropolitan Man…
$ ADM3_EN        <chr> "City of Valenzuela", "City of Valenzuela", "City of Va…
$ ADM4_EN        <chr> "Dalandanan", "Pasolo", "Poblacion", "Malanday", "Pasol…
$ Group          <chr> "TBC", "TBC", "But First, Coffee", "TBC", "TBC", "TBC",…
$ Include        <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",…
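As a quick sanity check (an extra step, not part of the original pipeline), the number of duplicated coordinate pairs should now be zero:

```r
# Count rows whose (lat, lng) pair already appeared earlier; expect 0
sum(duplicated(mm_cafe_raw[, c("lat", "lng")]))
```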
We see that we are still left with a large number, 11,812 cafes, which we will use for our analyses. While we have already finalized the records to use, we can still clean up the dataframe by removing and renaming columns to make the analyses easier:
- Remove columns that add no value for the planned analyses: place_id, address, data_extracted, ADM2_EN, Include
- Rename columns: ADM3_EN to City, ADM4_EN to Barangay
The first code block below executes these two operations using a pipeline with select() and rename(). The second code block uses head() to display the first rows of the resulting dataframe.
mm_cafe_raw <- mm_cafe_raw %>%
  select(
    -place_id,
    -address,
    -data_extracted,
    -ADM2_EN,
    -Include
  ) %>%
  rename(
    City = ADM3_EN,
    Barangay = ADM4_EN
  )

head(mm_cafe_raw)

# A tibble: 6 × 6
  name                                  lat   lng City  Barangay Group
  <chr>                               <dbl> <dbl> <chr> <chr>    <chr>
1 Lazy Latte Cafe                      14.7  121. City… Dalanda… TBC
2 NG SNACK CAFE                        14.7  121. City… Pasolo   TBC
3 But First, Coffee (BFC) - Polo Val…  14.7  121. City… Poblaci… But …
4 R2RO’s Café                          14.7  121. City… Malanday TBC
5 Kubo ni lola                         14.7  121. City… Pasolo   TBC
6 Yolk Manila                          14.7  121. City… Mabolo   TBC
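Before moving on, a quick count() by city (an extra check, not part of the original pipeline) can show how the cafes are distributed across Metro Manila:

```r
# Number of cafes per city, largest first
mm_cafe_raw %>%
  count(City, sort = TRUE)
```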
Dataset 2. Metro Manila Boundaries
The next step is to prepare the dataset containing the boundaries of the cities and barangays within Metro Manila. The dataset we have covers the whole of the Philippines, so we will first load it into an sf dataframe using st_read() from the sf package.
phil_sf <- st_read(dsn = "data/geospatial",
                   layer = "phl_admbnda_adm4_psa_namria_20231106")

Reading layer `phl_admbnda_adm4_psa_namria_20231106' from data source
  `C:\drkrodriguez\datawithderek\posts\mm-cafes-eda-1\data\geospatial'
  using driver `ESRI Shapefile'
Simple feature collection with 42048 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 114.2779 ymin: 4.587294 xmax: 126.605 ymax: 21.12189
Geodetic CRS: WGS 84
The output shows that the dataset includes 42,048 features (records), which should correspond to that many barangays across the whole country. The output also shows that there are 17 fields in the dataframe; glimpse() will report 18 columns, since it also counts the geometry column. We can again use glimpse() to show the first few elements of each column.
glimpse(phil_sf)

Rows: 42,048
Columns: 18
$ ADM4_EN <chr> "Adams (Pob.)", "Bani", "Buyon", "Cabaruan", "Cabulalaan", …
$ ADM4_PCODE <chr> "PH0102801001", "PH0102802001", "PH0102802002", "PH01028020…
$ ADM4_REF <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM3_EN <chr> "Adams", "Bacarra", "Bacarra", "Bacarra", "Bacarra", "Bacar…
$ ADM3_PCODE <chr> "PH0102801", "PH0102802", "PH0102802", "PH0102802", "PH0102…
$ ADM2_EN <chr> "Ilocos Norte", "Ilocos Norte", "Ilocos Norte", "Ilocos Nor…
$ ADM2_PCODE <chr> "PH01028", "PH01028", "PH01028", "PH01028", "PH01028", "PH0…
$ ADM1_EN <chr> "Region I (Ilocos Region)", "Region I (Ilocos Region)", "Re…
$ ADM1_PCODE <chr> "PH01", "PH01", "PH01", "PH01", "PH01", "PH01", "PH01", "PH…
$ ADM0_EN <chr> "Philippines (the)", "Philippines (the)", "Philippines (the…
$ ADM0_PCODE <chr> "PH", "PH", "PH", "PH", "PH", "PH", "PH", "PH", "PH", "PH",…
$ date <date> 2022-11-09, 2022-11-09, 2022-11-09, 2022-11-09, 2022-11-09…
$ validOn <date> 2023-11-06, 2023-11-06, 2023-11-06, 2023-11-06, 2023-11-06…
$ validTo <date> -001-11-30, -001-11-30, -001-11-30, -001-11-30, -001-11-30…
$ Shape_Leng <dbl> 0.42360392, 0.05489374, 0.08418485, 0.07183253, 0.04147813,…
$ Shape_Area <dbl> 9.506039e-03, 1.503539e-04, 3.307794e-04, 2.550465e-04, 8.6…
$ AREA_SQKM <dbl> 111.14302605, 1.75975683, 3.87215817, 2.98522045, 1.0175345…
$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((120.9207 18..., MULTIPOLYGON (…
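As a quick aside (an extra check, not part of the original flow), the number of barangays per region can be tallied by first dropping the geometry so the count runs on a plain dataframe:

```r
# Barangay counts per region; st_drop_geometry() removes the sf geometry column
phil_sf %>%
  st_drop_geometry() %>%
  count(ADM1_EN, sort = TRUE)
```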
We are only focusing on cafes in Metro Manila in this post, so we can work with just the records that correspond to that region. The region should be indicated in the column ADM1_EN, and the following code gives the unique values (region names) in the dataset.
unique(phil_sf$ADM1_EN)

 [1] "Region I (Ilocos Region)"
[2] "Region II (Cagayan Valley)"
[3] "Region III (Central Luzon)"
[4] "Region IV-A (Calabarzon)"
[5] "Region V (Bicol Region)"
[6] "Region VI (Western Visayas)"
[7] "Region VII (Central Visayas)"
[8] "Region VIII (Eastern Visayas)"
[9] "Region IX (Zamboanga Peninsula)"
[10] "Region X (Northern Mindanao)"
[11] "Region XI (Davao Region)"
[12] "Region XII (Soccsksargen)"
[13] "National Capital Region (NCR)"
[14] "Cordillera Administrative Region (CAR)"
[15] "Region XIII (Caraga)"
[16] "Mimaropa Region"
[17] "Bangsamoro Autonomous Region In Muslim Mindanao (BARMM)"
There are 17 region names, and Metro Manila should be indicated by the 13th name on the list: National Capital Region (NCR).
We can now create the sf dataframe for Metro Manila (to be named mm_sf) using the following transformations:
- Filter phil_sf on ADM1_EN and keep only the rows where the value is “National Capital Region (NCR)”.
- Keep only the required columns. Remove the ADM columns except for the barangay and city names and codes, remove the date columns, and remove the dimension columns except AREA_SQKM. This means keeping only the following columns: ADM4_EN, ADM4_PCODE, ADM3_EN, ADM3_PCODE, AREA_SQKM, and the geometry.
- Rename city and barangay columns so they are more descriptive.
The following code block executes these changes, and then uses glimpse() to show a summary of the new dataframe.
mm_sf <- phil_sf[phil_sf$ADM1_EN == "National Capital Region (NCR)", ] |>
  select(ADM4_EN, ADM4_PCODE, ADM3_EN, ADM3_PCODE, AREA_SQKM) |>
  rename(
    City = ADM3_EN,
    CityCode = ADM3_PCODE,
    Barangay = ADM4_EN,
    BarangayCode = ADM4_PCODE
  )

glimpse(mm_sf)

Rows: 1,712
Columns: 6
$ Barangay <chr> "Barangay 1", "Barangay 2", "Barangay 3", "Barangay 4", "…
$ BarangayCode <chr> "PH1303901001", "PH1303901002", "PH1303901003", "PH130390…
$ City <chr> "City of Manila", "City of Manila", "City of Manila", "Ci…
$ CityCode <chr> "PH1303901", "PH1303901", "PH1303901", "PH1303901", "PH13…
$ AREA_SQKM <dbl> 0.049528343, 0.037871337, 0.038925569, 0.040178173, 0.024…
$ geometry <MULTIPOLYGON [°]> MULTIPOLYGON (((120.9661 14..., MULTIPOLYGON…
The new dataframe is shown to have 1,712 rows across 6 columns, compared to the 42,048 rows across 18 columns of the full Philippine dataset.
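Since the cafe table and the boundary layer will later be used together, it is also worth checking (an extra step, not part of the original flow) that the city names in the two datasets line up:

```r
# Cafe City values that do not appear in the boundary layer; expect character(0)
setdiff(unique(mm_cafe_raw$City), unique(mm_sf$City))
```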
Dataset 3. Transforming the Cafe Database into an sf dataframe
To make the two datasets work together, we will convert mm_cafe_raw into an sf dataframe compatible with mm_sf.
To perform the transformation, the following code block executes the following steps:
- Run st_as_sf() on mm_cafe_raw to convert it into an sf dataframe. The parameters indicate the columns containing the coordinates (coords), the current CRS (coordinate reference system) of those coordinates (crs), and whether the coordinate columns are to be removed (remove).
- st_transform() transforms an sf dataframe into a new CRS, while st_crs() returns the CRS of an sf dataframe. The final line of code ensures that mm_cafe_sf ends up with the same CRS as mm_sf.
mm_cafe_sf <- mm_cafe_raw %>%
  st_as_sf(
    coords = c("lng", "lat"), # order is LONGITUDE, then LATITUDE
    crs = 4326,               # WGS84 (typical for lat/lng coordinates)
    remove = FALSE            # keep the original lat/lng columns
  ) %>%
  st_transform(st_crs(mm_sf)) # match the CRS of mm_sf

We can then use the tmap package to visualize these two dataframes as layers on a map, to check that they have been loaded and transformed properly. The code block below first converts any improperly encoded characters in the cafe names to prevent errors, before generating an interactive thematic map from the two layers.
mm_cafe_sf$name <- iconv(mm_cafe_sf$name, from = "", to = "UTF-8", sub = "byte")
tmap_mode("view") # interactive mode

tmap mode set to interactive viewing
tm_shape(mm_sf) +
  tm_polygons(
    col = "white",
    border.col = "grey40",
    alpha = 0.4
  ) +
  tm_shape(mm_cafe_sf) +
  tm_dots(
    col = "red",
    size = 0.05,
    popup.vars = c("Café Name" = "name") # label shown in the popup on click
  ) +
  tm_basemap("OpenStreetMap") +
  tm_layout(
    title = "Metro Manila Cafés",
    frame = FALSE
  )
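The interactive map works well on the page itself, but if a static image is needed (for sharing, for example), a minimal sketch using tmap's plot mode and tmap_save() could look like the following. The filename and dpi are illustrative choices, not part of the original workflow:

```r
# Static version of the cafe map; filename and resolution are illustrative
tmap_mode("plot")
mm_map <- tm_shape(mm_sf) +
  tm_polygons(col = "white", border.col = "grey40", alpha = 0.4) +
  tm_shape(mm_cafe_sf) +
  tm_dots(col = "red", size = 0.05)
tmap_save(mm_map, filename = "mm_cafes.png", dpi = 300)
```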