Evaluating the Ability to Use Contextual Features Derived from Multi-Scale Satellite Imagery to Map Spatial Patterns of Urban Attributes and Population Distributions

Chao, Steven; Engstrom, Ryan; Mann, Michael; Bedada, Adane

doi:10.3390/rs13193962


Department of Geography, The George Washington University, Washington, DC 20052, USA

* Author to whom correspondence should be addressed.

Remote Sens. 2021, 13(19), 3962; https://doi.org/10.3390/rs13193962

Submission received: 9 September 2021 / Revised: 28 September 2021 / Accepted: 29 September 2021 / Published: 3 October 2021

(This article belongs to the Special Issue Applications of Remote Sensing Imagery for Urban Areas)


Abstract

With an increasing global population, accurate and timely population counts are essential for urban planning and disaster management. Previous research on contextual features, which has mainly used very-high-spatial-resolution imagery (<2 m spatial resolution) at subnational to city scales, has found strong correlations with population and poverty. Contextual features can be defined as the statistical quantification of edge patterns, pixel groups, gaps, textures, and the raw spectral signatures calculated over groups of pixels or neighborhoods. Although these features correlate with population and poverty, which components of the human-modified landscape they capture has not been investigated. Additionally, previous research has focused on more costly, less frequently acquired very-high-spatial-resolution imagery. Therefore, contextual features were calculated from both very-high-spatial-resolution imagery and lower-spatial-resolution Sentinel-2 (10 m pixels) imagery in Sri Lanka, Belize, and Accra, Ghana, and those outputs were correlated with OpenStreetMap building and road metrics. These relationships were compared to determine what components of the human-modified landscape the features capture, and how spatial resolution and location impact the predictive power of these relationships. The results suggest that contextual features can map urban attributes well, with out-of-sample R² values up to 93%. Moreover, the degradation of spatial resolution did not significantly reduce model performance, and for some urban attributes, the results actually improved. Based on these results, the ability of the lower-resolution Sentinel-2 data to predict the population density of the smallest census units available was then assessed. The findings indicate that Sentinel-2 contextual features explained up to 84% of the out-of-sample variation in population density.

Keywords:

machine learning; contextual features; population; urban attributes; modeling; spatial resolution


1. Introduction

The world population is projected to reach 9.8 billion by 2050, with most of the growth in developing countries and with 68% predicted to live in urban areas [1]. Current and accurate population data are important for managing time-sensitive issues such as identifying vulnerable populations, assessing disease impact, and supporting natural disaster or emergency response, management, and evacuations [2,3,4,5,6,7,8]; administrative and legislative issues such as resource and service allocation, policymaking, planning, and boundary delineation [9,10,11,12,13]; private and social research such as selecting sites for businesses or assessing health care accessibility [13,14,15,16]; and assessing environmental impacts [17,18].

Census data, while useful, have a number of limitations: (1) countries usually conduct censuses at most once every 10 years, as recommended by the United Nations [13], which affects their relevance, as high migration and urban growth rates can make existing data quickly outdated [19]; (2) for privacy reasons, census data are usually aggregated, generalized, and not available at the local scale [20,21,22]; (3) census units do not necessarily align with human settlement boundaries [20,21,22,23]; (4) censuses are resource intensive, which makes them challenging to carry out, especially in resource-limited countries [24]; and (5) census data can be inaccurate or incomplete, omitting key groups or areas [3,24,25].

1.1. Population Estimation Techniques

Remote sensing technologies enable frequent data collection, making them effective and widely used in predicting population [5,26]. Wardrop et al. [19] and S.-S. Wu et al. [27] categorize population estimation approaches into two groups.

The first category is areal interpolation or a “top-down” approach, a technique where population data are reallocated within one areal unit, usually using some weighting mechanism [19,22,23,27]. This includes dasymetric mapping, which disaggregates census-derived population counts to spatial units smaller than the original boundaries by overlaying ancillary data [21,23,28]. Theoretically, these new areal units would group areas with similar population data [21]. Advantages of dasymetric mapping for population include the ability to depict the spatial differences in population counts within administrative boundaries [23], lower risk of succumbing to the modifiable areal unit problem [20,23], and increased accuracy [29]. Nagle et al. [30], however, cautioned that dasymetric mapping always introduces uncertainty because not all factors are accounted for in the deterministic relations. Moreover, because this approach depends on census data as inputs, the results rely on census data being accurate and at the appropriate scale [19,31].

The other category is statistical modeling or a “bottom-up” approach, which relies on socioeconomic variables and their relationship with population [27]. Wardrop et al. [19] define this as the use of microcensus surveys, which are population counts for small, defined areas that are incorporated into statistical models to estimate population at out-of-sample locations. Statistical modeling can overcome the limitations associated with censuses, including the lack of data [19,27,32,33]. Another benefit is that statistical models for estimating population density and population can be used interchangeably [27]. Population density modeling methods can utilize a remote sensing input (Lo, as cited in G. Li and Weng [34]), such as urban areas [26,35,36,37,38], land use [23,28], socioeconomic variables [20,31] including dwelling units [29,39,40,41], raw spectral values [28,34,37,42] including nighttime lights [43,44,45,46], or a combination of these inputs [22,32,47,48,49,50].

1.2. Spatial Features and Remote Sensing

Texture analysis and spatial feature extraction can identify patterns and homogeneity in spatial configurations that go beyond spectral patterns or color intensities. Approaches include Gabor filters, the gray-level co-occurrence matrix (GLCM) method, histogram of oriented gradients (HOG), local binary pattern (LBPM), and local edge pattern (LEP) [51,52].

In remote sensing, spatial feature extraction has primarily been used for land use and land cover (LULC) classification [51]. It has been reported to improve classification accuracy compared with per-pixel classification, especially for very-high-spatial-resolution (VHSR; less than 2 m spatial resolution) imagery, due to greater pixel variance [53].

The traditional, most popular approach for LULC classification with spatial features is GLCM [51,54]. Bayram et al. [51], however, found that GLCM, along with LEP and edge-oriented features, is not particularly accurate. Abeigne Ella et al. [54] sought to identify the texture feature extraction methods that produce the best results for urban settlement classification, specifically focusing on informal settlements. The authors found that LBPM was better than GLCM as it was more consistently accurate and not significantly affected by the number of sampling points.

Zhang et al. [55] found that multiple texture features were necessary to accurately classify LULC. They recommended using at least three or four; beyond that threshold, the classification accuracy improvement became negligible. Homogeneous areas required fewer texture features than spatially varying areas. They found that using the mean derived from 10 m SPOT (Satellite Pour l’Observation de la Terre) panchromatic imagery of Beijing produced the best results when only one texture feature was used to capture urban spatial patterns, while the combination of mean and GLCM produced the best results when two texture features were used.

Graesser et al. [56] introduced the technique of using various pixel windows over which to calculate spatial features and then reporting the statistical values (e.g., mean, minimum, maximum) back to neighborhoods of pixels instead of looking at individual pixels. As the scales of human settlements vary spatially, this technique captures a variety of patterns for an area while maintaining a pixel’s neighborhood context, and hence can be considered a contextual features approach. For instance, Engstrom et al. [57] and Graesser et al. [56] used contextual features to map slums or deprived areas in multiple cities. This tends to work well with VHSR imagery in urban areas because the objects of interest are generally made up of multiple pixels. Moreover, this approach reduces the number of class outliers and the computational processing needed.

While contextual features have been used in classification, other research has examined whether the features themselves are directly correlated with measures of building density, road density, building count, population, and poverty. Engstrom et al. [58] calculated contextual features from VHSR imagery and correlated them with urban attributes (building area, building count, building density, built-up area, built-up percent, road area, road length, and road density) from OpenStreetMap (OSM). Their model explained 70% to 92% of the variance in urban attributes. Engstrom, Hersh, and Newhouse [59] and Engstrom, Newhouse, et al. [60] found that contextual features calculated from VHSR imagery were also highly correlated with poverty measures. Engstrom, Newhouse, et al. [60] found that with an ordinary least squares (OLS) linear regression model, spatial features explained up to 54% of the variance in poverty within Sri Lanka. This underscores the ability of spatial features to detect human modifications to the landscape.

Many researchers have emphasized the importance of accurate and timely population counts for various purposes such as disaster management [26,28,34,35,38,40,41,42,47,52,61]. To improve population models, these researchers use either VHSR satellite imagery [29,32,35,41,47,52,58] or medium-spatial-resolution imagery such as from Landsat [23,26,28,34,35,37,38,40,42,47,50,62]. While VHSR imagery may be more accurate than lower-spatial-resolution imagery [62], it is not always accessible as it is not free. Similar to the challenges of conducting a census [24], this can introduce cost barriers for resource-limited countries. Given that Landsat has been effective at modeling demographic variables, the freely available Sentinel-2 imagery with its 10 m spatial resolution can also be beneficial and possibly more accurate. While coarser than VHSR imagery, the every-five-day global coverage of Sentinel-2 and its availability in Google Earth Engine for global processing makes this a powerful resource for working over large areas [63].

The bottom-up methodology of using contextual features in remote sensing fits into the trend of using imagery-derived textures to model population [32]. The methodologies discussed in Engstrom, Newhouse, et al. [60] and Engstrom et al. [57] are context-dependent: the correlated variables in one area of a country were not necessarily the same in another. To date, there has not been any work identifying how correlations compare across countries and sub-regions or whether there is a global indicator for estimating these attributes.

This paper examines how contextual features are related to population density and human modification to the landscape (hereinafter synonymous with urban attributes). We define contextual features as the statistical quantification of edge patterns, pixel groups, gaps, textures, and the raw spectral signatures calculated over groups of pixels or neighborhoods. Given the ability of texture analysis to identify human presence on satellite imagery and the demonstrated relationship between remotely sensed variables and population density, it was hypothesized that contextual features would be strongly correlated with human settlements and population density. We specifically address the following questions:

(1) What do contextual features derived from VHSR imagery represent in the human-modified landscape?
(2) How do these representations of the landscape change as the spatial resolution of the satellite imagery changes (from VHSR imagery to Sentinel-2 imagery)?
(3) How do contextual features derived from Sentinel-2 relate to population density based on census data?
(4) To what extent can a population density model be built based on contextual features to allow for the dasymetric mapping of population density in multiple countries?

2. Materials and Methods

To address the research questions, contextual features were analyzed in relation to urban attributes and population density (Table 1). The methodology largely followed that of Engstrom et al. [58] (Figure 1).

2.1. Study Areas

The study areas for this research include portions of Belize, Sri Lanka, and the city of Accra, Ghana. These three locations comprise a range of cities, urban and rural populations, and land cover characteristics in three different regions—Latin America and the Caribbean, South Asia, and Sub-Saharan Africa—spanning three different continents (Figure 2).

For each area, we have access to fine spatial resolution census data and urban attributes from OSM. Polygon shapefiles delineating the Enumeration Areas (EA) in Ghana were provided by Ghana Statistical Service (GSS) [64], Enumeration Districts (ED) in Belize were provided by the Statistical Institute of Belize (SIB) [65], and Gram Niladhari Divisions (GN) in Sri Lanka were provided by the Department of Census and Statistics (DCS) [66]; these enumeration units are the census units used in our analysis. There are 2403 EAs in Accra, 723 EDs in Belize, and 14,021 GNs in Sri Lanka (Table 2). Shapefiles of national-level administrative boundaries (level 0) were also obtained [67,68].

2.1.1. Accra, Ghana

The Greater Accra Metropolitan Area used in this study includes the Accra Metropolitan Assembly (AMA), La Dade-Kotopon Municipal Assembly (formerly under the AMA until 2012 [69]), and Ledzokuku-Krowor Municipal Assembly (LEKMA). The population as of 2010 in the AMA was 1,665,086 [70]; La Dade-Kotopon, 183,528 [69]; and LEKMA, 227,932 [71]. In the AMA, the 2010 census counted 149,689 houses, with an average household size of 3.7 people [70]; in La Dade-Kotopon, 19,174 houses, 3.6 people [69]; and in LEKMA, 21,366 houses, 3.6 people [71]. Most residents lived in compound houses [69,70,71]. Relative to its 2010 population, Ghana is projected to experience a two-fold increase in population by 2038 [72]. Accra has been experiencing rapid urbanization [71,73] and population growth due to natural increase and rural-urban migration [71,74,75,76], creating socio-economic, health, environmental, and institutional challenges [69,70,71,73,77,78].

2.1.2. Belize

In Belize, the 2010 census counted 323,236 people [65]. The SIB [79] counted nearly 80,000 households in Belize in 2010 with an average household size of 4.1 people, of which 39,162 households were in urban areas. The SIB [80] found that most households resided in undivided private houses. Living conditions in Belize are unique in that cities and towns have relatively low densities of residential development due to the availability of land, small city sizes, and ownership of large properties [81,82]. Belize is thus still a rural country, with 54% of its citizens living in rural areas [83]. The country, however, has seen its population double since 1980, mostly via immigration [84] and a relatively new trend of lifestyle migration [85].

2.1.3. Sri Lanka

There are four subnational administrative levels in Sri Lanka, with nine provinces, 25 Districts, 332 Divisional Secretariats, and 14,022 GNs [86]. The GN is the unit of study in this analysis. Sri Lanka’s last country-wide census was carried out in 2011 and 2012, the first since 1981 [87]. The 2012 census counted 20,359,054 people [66]. In 2012, there were nearly 6 million building units, with 5.2 million occupied housing units and 685,000 unoccupied housing units; the average household size was 3.8 people [87]. The proportion of Sri Lanka’s population living in urban areas has remained close to 18.5% due to an emphasis on rural development programs [83,88]. These official statistics may not reflect that some communities have urban characteristics but are officially classified as rural [88,89]. Sri Lanka faces numerous urban land management challenges including increasing population density, urban sprawl, rapid urban expansion, and pressure on the country’s road infrastructure [88,89].

2.2. Data Acquisition

2.2.1. Multispectral Satellite Imagery

Limited VHSR imagery was provided for parts of Sri Lanka: Colombo, Kurunegala, Negombo, and Batticaloa. WorldView-2 imagery resampled to 2-m spatial resolution for Colombo (1 January 2010) and Kurunegala (30 January 2012) was used [90,91]. GeoEye-1 imagery for Negombo (14 February 2010) and Batticaloa (16 September 2010) was also used, resampled to 1.6-m spatial resolution [92]. The VHSR imagery covered a total area of approximately 670 km².

Sentinel-2 image mosaics were created in Google Earth Engine. A cloud-free image for each country or city was extracted using the median pixel in the four 10 m bands: blue, green, red, and NIR (near-infrared). For Belize and Sri Lanka, imagery from Sentinel-2 A and B satellites between 1 January 2017, and 31 March 2018, was obtained to create the single image composites [93]. For Accra, imagery covering an area of approximately 1250 km² from 1 January 2019, to 1 January 2020, was used [94].
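The median-pixel compositing idea can be illustrated with a small, library-free sketch. It is not the Google Earth Engine workflow used in this study; the function name `median_composite`, the use of `None` to mark cloud-masked observations, and the sample reflectance values are assumptions made for illustration only.

```python
from statistics import median

def median_composite(stack):
    """Per-pixel median over a time stack of single-band images.

    stack: list of 2-D lists, one per acquisition date. None marks a
    cloud-masked observation (an assumption of this sketch).
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    composite = [[None] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            clear = [img[y][x] for img in stack if img[y][x] is not None]
            if clear:  # stays None where no clear observation exists
                composite[y][x] = median(clear)
    return composite

# Three hypothetical acquisitions of a 1 x 2 band; 0.90 mimics a bright cloud.
stack = [[[0.10, 0.20]], [[0.90, None]], [[0.12, 0.22]]]
composite = median_composite(stack)
```

Because the median ignores extreme values, the bright outlier in the first pixel does not propagate into the composite, which is the property that makes a median composite useful for suppressing residual cloud.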

2.2.2. Urban Attributes

To capture the human-modified landscape, building footprint polygons and road polylines were downloaded from OSM [95,96] via GeoFabrik (https://www.geofabrik.de, accessed on 30 July 2019). OSM is an open-source database of physical and man-made features digitized on a base map and continually updated [97]. OSM data for Sri Lanka were downloaded on 21 May 2019 [95]; for Belize, 20 August 2019 [95]; and for Accra, 25 June 2020 [96].

2.2.3. Population

Census data were provided in table format by GSS [64] for Accra in 2010, SIB [65] for Belize in 2010, and DCS [66] for Sri Lanka in 2012. These data were joined to the shapefiles for Accra at the EA level with 2318 records; Belize at the ED level, 775 records; and Sri Lanka at the GN level, 14,001 records.

2.3. Data Processing

2.3.1. Contextual Features

The contextual features are calculated by comparing each pixel or group of pixels (block) with its surrounding pixels (scale; Figure 3). The block size is also the pixel size to which the contextual feature statistics are reported [58]. Multiple scale sizes were chosen because the extent and variability of neighborhoods vary [56]. For VHSR imagery, all contextual features were created at a block size of 8 pixels and scales of 8 m, 16 m, 32 m, and 64 m, as in Engstrom et al. [58]. For Sentinel-2 imagery, most were created using a block size of 1 pixel (10 m) and scales of 30 m, 50 m, and 70 m. A few of the features, namely Fourier, line support regions (LSR), oriented FAST and rotated BRIEF (ORB), and structural feature sets (SFS), used larger scales of 310 m, 510 m, and 710 m to effectively contextualize the landscape.

The 11 contextual features calculated are a combination of features that capture edge patterns, pixel groups, gaps, textures, and the raw spectral signatures. These features are Fourier, Gabor, HOG, lacunarity, LSR, LBPM, mean, normalized difference vegetation index (NDVI), ORB, PanTex, and SFS. SpFeas (https://github.com/jgrss/spfeas, accessed on 30 July 2019), an open-source Python library developed by Graesser [98], was used to process the imagery (in the blue, green, red, and NIR bands).

Fourier Transform. Fourier transform captures the frequency of patterns across an image. Any signal can be represented as a series of sinusoidal signals [99,100]; thus, an image can be decomposed into sine and cosine waves with various amplitudes and frequencies [101]. The Fourier transform consists of magnitude and phase parts, with the former usually displayed as the output image (power spectrum). In these magnitude outputs, low-frequency features, such as water, are located closer towards the origin (center), with increasing frequency farther from the origin [99]. A radial profile can be derived from a power spectrum, within which pixel frequencies can be summarized. Fourier produces two outputs: mean and variance.
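A minimal sketch of the power spectrum and its radial profile is given below. It uses a naive discrete Fourier transform written with the standard library so that it runs on tiny test images; it is not the SpFeas implementation, and the function names, the integer-radius binning, and the 4 × 4 test images are assumptions of this example.

```python
import cmath
import math
from statistics import fmean, pvariance

def power_spectrum(img):
    """Naive 2-D DFT power spectrum with the zero frequency moved to the centre."""
    rows, cols = len(img), len(img[0])
    spec = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            total = 0j
            for y in range(rows):
                for x in range(cols):
                    angle = -2 * math.pi * (u * y / rows + v * x / cols)
                    total += img[y][x] * cmath.exp(1j * angle)
            # Shift so that frequency (0, 0) lands at (rows // 2, cols // 2).
            spec[(u + rows // 2) % rows][(v + cols // 2) % cols] = abs(total) ** 2
    return spec

def radial_profile(spec):
    """Mean power at each integer distance from the centre of the spectrum."""
    cy, cx = len(spec) // 2, len(spec[0]) // 2
    rings = {}
    for y, row in enumerate(spec):
        for x, power in enumerate(row):
            radius = round(math.hypot(y - cy, x - cx))
            rings.setdefault(radius, []).append(power)
    return [fmean(rings[r]) for r in sorted(rings)]

flat = [[1, 1, 1, 1] for _ in range(4)]      # no spatial pattern
stripes = [[0, 1, 0, 1] for _ in range(4)]   # high-frequency pattern
flat_profile = radial_profile(power_spectrum(flat))
stripe_profile = radial_profile(power_spectrum(stripes))
# Two summary outputs, mirroring the mean and variance described in the text.
summary = (fmean(stripe_profile), pvariance(stripe_profile))
```

The flat image concentrates all power at the origin, whereas the striped image also places power two pixels from the origin, which illustrates how the radial profile separates smooth surfaces from finely patterned ones.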

Gabor. Gabor is a linear filter used for edge detection [51]. Multiple filters consisting of strips are created by a sinusoidally modulated Gaussian function [102,103,104], forming the filter bank [105,106]. The size, shape, and orientation of the filters can be set, and the various orientations enable extraction of features with those associated orientations [102,104]. The output is a Gabor wavelet transformation [107]. There are 16 Gabor outputs: mean, variance, and 14 individual filters that examine different angles.

Histogram of Oriented Gradients. HOG identifies the orientation and magnitude of shades [108], distinguishing settlement and non-settlement classes [56]. Gradient magnitudes in both the x and y directions are calculated for each pixel and combined to obtain the magnitude and direction of the gradient [109]. The image is divided into subregions (cells), and within each, the gradient direction bins the pixels by angles (1°–180°) [108,109,110]. The magnitude of each pixel is distributed to its associated bin, with the magnitude value split among two bins if the gradient direction falls between two. The aggregated magnitudes in each bin form a histogram (vector) for the cell [108,111].

Next, four cells (and their four histograms) are concatenated into a block and normalized [109,110]. All block vectors are combined to form the final HOG vector [108], and statistics can be extracted. The five statistical outputs are the maximum, mean, variance, skew, and kurtosis.
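The gradient and binning steps can be sketched for a single pixel as follows. The nine 20° bins anchored at 0°, 20°, ..., 160°, the linear split between neighboring bins, and the function names are assumptions borrowed from common HOG descriptions rather than details confirmed for SpFeas.

```python
import math

def gradient(gx, gy):
    """Magnitude and unsigned orientation (0 <= angle < 180 degrees) of a gradient."""
    magnitude = math.hypot(gx, gy)
    angle = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, angle

def add_vote(hist, magnitude, angle, bin_width=20.0):
    """Split a pixel's magnitude between the two bins nearest its angle."""
    position = angle / bin_width
    lower = int(position) % len(hist)
    upper = (lower + 1) % len(hist)    # wrap around: 180 degrees equals 0 degrees
    share = position - int(position)   # fraction of the way towards the upper bin
    hist[lower] += magnitude * (1.0 - share)
    hist[upper] += magnitude * share

cell_hist = [0.0] * 9                  # one histogram (vector) per cell
mag, ang = gradient(3.0, 4.0)          # x and y gradients of one hypothetical pixel
add_vote(cell_hist, mag, ang)
```

The gradient here has an orientation of roughly 53°, so its magnitude is shared between the 40° and 60° bins, with the larger share going to the nearer 60° bin.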

Lacunarity. Lacunarity measures the homogeneity of the landscape via the spatial distribution of gap sizes. For heterogeneous images, not all gap sizes are the same; thus, the image is not translationally invariant, and lacunarity is high [112,113]. For instance, in urban areas, there are gaps between buildings; in high-density areas, there tend to be fewer gaps [56]. Variation in gap sizes is scale dependent [112,113,114].

One way to calculate lacunarity involves a moving window in which the number of holes is calculated [113,115]. First, an intensity surface, where the plane is the image and the z-axis (height) is the intensity (value) of the pixels, is created. A moving window of a set size is centered over one pixel, with a smaller gliding box placed in the upper left corner. If necessary, multiple boxes are stacked so all the pixel intensities fall within. The relative height is calculated using the minimum and maximum pixel values (or the boxes in which they fall) within the column. As the gliding box moves across the image window, all the relative heights are summed, and a formula is used to calculate lacunarity for that center pixel. The window repeats the process across the image [116]. Only one lacunarity value is calculated.
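For intuition, the sketch below implements the simpler binary gliding-box formulation, in which lacunarity is the ratio of the second moment of the box masses to the square of their first moment. This is a deliberately simplified substitute for the grayscale intensity-surface procedure described above, and the function name and test grids are assumptions of this example.

```python
def lacunarity(grid, box):
    """Binary gliding-box lacunarity: E[mass^2] / E[mass]^2 over all box positions."""
    rows, cols = len(grid), len(grid[0])
    masses = []
    for top in range(rows - box + 1):
        for left in range(cols - box + 1):
            # Box mass = number of occupied (value 1) pixels inside the box.
            mass = sum(grid[y][x]
                       for y in range(top, top + box)
                       for x in range(left, left + box))
            masses.append(mass)
    first = sum(masses) / len(masses)
    second = sum(m * m for m in masses) / len(masses)
    return second / (first * first) if first else float("nan")

uniform = [[1] * 4 for _ in range(4)]   # fully built-up, no gaps
clustered = [[1, 1, 0, 0],
             [1, 1, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 0, 0]]              # one cluster surrounded by gaps
```

A homogeneous grid yields the minimum value of 1, while the clustered grid yields a larger value because the box masses vary strongly from one position to the next.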

Line Support Regions. LSR extracts straight lines from imagery, which can determine the area and spatial configuration of settled areas [56,117,118]. Gradient orientations on an image are first calculated and used to group pixels into LSRs with similar gradient orientations. The groups that do not have enough support (pixels appropriated to a region, as described in Burns et al. [117]) are removed. For each line support region, two planes are created, both weighted by local gradient magnitude: a plane fit to the pixel intensities using least squares and a horizontal plane of average pixel intensities. A line is extracted where the two planes intersect. The line’s length, width, contrast (intensity change over the line), steepness (slope of intensity change), and straightness can subsequently be obtained [117]. LSR produces three outputs: line length, line mean, and line contrast.

Local Binary Pattern. LBPM assesses the homogeneity of an image, detecting bright and dark spots, flat areas, and edges [119]. After the radius and number of neighbors are specified, the value of a center pixel is compared with those of its surrounding neighbors. If the center pixel value is smaller than or equal to that of a neighbor, the neighbor is given a value of 1; otherwise, the value is 0 [54,119,120,121,122]. The values around the center pixel are taken sequentially (forming a binary string) and inputted into an equation to obtain the LBPM code for the center pixel [119,121,123]. Patterns with more than two 0-1 or 1-0 switches are considered non-uniform, while those with two or fewer are considered uniform [119,120,122]. A histogram is built with separate bins for each uniform pattern and one bin for all non-uniform patterns [119,122]; this is based on Ojala et al.’s [119] observation that certain uniform patterns appear more frequently in textures. Five statistical outputs of LBPM are produced: maximum, mean, variance, skew, and kurtosis.
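The thresholding, coding, and uniformity test can be sketched for one center pixel with eight neighbors. The neighbor ordering, the bit weighting (first neighbor as the least significant bit), and the sample values are assumptions of this example.

```python
def lbp_bits(center, neighbors):
    """1 where the center value is smaller than or equal to the neighbor, else 0."""
    return [1 if center <= value else 0 for value in neighbors]

def lbp_code(bits):
    """Weight each bit by a power of two and sum to obtain the pattern code."""
    return sum(bit << index for index, bit in enumerate(bits))

def transitions(bits):
    """Number of 0-1 or 1-0 switches when reading the bits as a circle."""
    return sum(bits[i] != bits[(i + 1) % len(bits)] for i in range(len(bits)))

def is_uniform(bits):
    """Uniform patterns have two or fewer switches."""
    return transitions(bits) <= 2

# Hypothetical neighborhood, neighbors listed sequentially around the center.
edge_bits = lbp_bits(5, [6, 7, 8, 2, 1, 3, 4, 5])   # bright on one side: an edge
noisy_bits = lbp_bits(5, [9, 1, 9, 1, 9, 1, 9, 1])  # alternating bright and dark
```

The edge-like neighborhood produces a single run of ones and is therefore uniform, whereas the alternating neighborhood switches eight times and would fall into the shared non-uniform bin of the histogram.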

Mean. The mean of the image is calculated using inverse distance weighting (IDW). IDW is an interpolation method where the influence of a point on an unknown point is inversely related with distance and dependent on the specified power setting, which controls the rate at which the influence of points decreases with increasing distance [124]. For SpFeas, pixels near the center of a frame are given higher weights [98]. In addition to mean, the variance of the pixels within the scale used is also calculated.

Normalized Difference Vegetation Index. NDVI assesses vegetation by incorporating a pixel’s value in the NIR and red regions. High values (towards 1) reflect a higher density of green vegetation, and low values (towards -1) reflect a lower density [99]. NDVI values are generally lower in and negatively correlated with built-up areas due to sparser vegetation [125]. Both the mean and variance of NDVI are calculated for each scale.
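As a brief illustration, NDVI is the difference between the NIR and red values divided by their sum, and the two outputs are its mean and variance over a neighborhood. The reflectance values below are hypothetical, the zero-denominator guard is a convenience of this sketch, and the use of the population variance is an assumption.

```python
from statistics import fmean, pvariance

def ndvi(nir, red):
    """Return (NIR - red) / (NIR + red); 0.0 is returned when both bands are 0."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def ndvi_summary(nir_values, red_values):
    """Mean and variance of NDVI over the pixels of one neighborhood (scale)."""
    values = [ndvi(n, r) for n, r in zip(nir_values, red_values)]
    return fmean(values), pvariance(values)

# Hypothetical reflectances: two vegetated pixels followed by two built-up pixels.
nir = [0.50, 0.45, 0.25, 0.22]
red = [0.08, 0.10, 0.20, 0.21]
mean_ndvi, var_ndvi = ndvi_summary(nir, red)
```

The vegetated pixels score well above the built-up pixels, so a neighborhood mixing both has a moderate mean and a comparatively high variance.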

ORB. A feature-based matching method introduced by Rublee et al. [126], ORB combines two approaches: Features from Accelerated Segment Test (FAST), a feature detector, and Binary Robust Independent Elementary Features (BRIEF), a feature descriptor.

The FAST algorithm is used to identify keypoints at each level in a scale pyramid of the image, and the Harris corner measure orders the keypoints and rejects edges picked up by FAST [126,127,128]. Intensity centroid is used to assign an orientation to the corner [126,129,130]. BRIEF selects a random pair of pixels around a keypoint, compares their intensity values, and assigns them binary values [126,131]. The orientation from the intensity centroid is used to steer BRIEF towards this orientation, as BRIEF is not invariant to rotation. A greedy algorithm takes all the pairs and creates a subset (usually 256) of uncorrelated pairs, forming a 256-bit feature descriptor output (rotated BRIEF or rBRIEF) [126,129,132]. Five statistical outputs from ORB are produced: maximum, mean, variance, skew, and kurtosis.

PanTex. PanTex extracts built-up areas from panchromatic imagery using the GLCM approach [133,134]. The textural contrast is calculated in all directions within a window around a pixel. The minimum value is taken, and the output with all the minimum values is the PanTex index. For urban areas, this minimum value would be consistently high. Pesaresi et al. [134] used minimum values rather than average values, reasoning that averages produce an edge effect that could overestimate built-up areas. PanTex produces one output, which is the minimum contrast.
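The minimum-contrast idea can be sketched without building an explicit co-occurrence matrix, because GLCM contrast for a given displacement equals the mean squared gray-level difference between the paired pixels. The four displacements, the window values, and the function names are assumptions of this example and do not reproduce the displacement set of the original PanTex index.

```python
def contrast(window, dy, dx):
    """GLCM contrast for one displacement: mean squared difference of pixel pairs."""
    rows, cols = len(window), len(window[0])
    diffs = [(window[y][x] - window[y + dy][x + dx]) ** 2
             for y in range(rows) for x in range(cols)
             if 0 <= y + dy < rows and 0 <= x + dx < cols]
    return sum(diffs) / len(diffs)

def pantex(window, displacements=((0, 1), (1, 0), (1, 1), (1, -1))):
    """Minimum contrast over all displacement directions."""
    return min(contrast(window, dy, dx) for dy, dx in displacements)

# Linear feature (for example, a road edge): no contrast along the stripes.
stripes = [[0, 9, 0, 9] for _ in range(4)]
# Texture that varies in every direction, as expected over buildings.
textured = [[0, 9, 3, 7],
            [8, 1, 6, 2],
            [4, 9, 0, 5],
            [7, 2, 8, 1]]
```

Taking the minimum means that a linear feature, which has high contrast in only some directions, scores low, while a window with contrast in every direction scores high.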

Structural Feature Sets. SFS extracts information on direction-lines [135]. Lines from the center pixel are created in all directions. For a direction-line, a pixel is compared with the center pixel to determine whether it is considered homogeneous. If it is, it is added to the direction-line; the line keeps extending until a pixel is not considered homogeneous based on set threshold levels or until the line reaches a set maximum length [135,136]. This is repeated for all line directions. A histogram is built from the lines, and statistics can be extracted [135]. SFS produces six outputs: maximum line length, minimum line length, mean, w-mean (weighted mean), standard deviation, and maximum ratio of orthogonal angles.
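The direction-line extension can be sketched as follows. The eight directions, the threshold and maximum length, and the function names are assumptions of this example, and only four of the six SFS outputs (maximum, minimum, mean, and standard deviation) are computed; the weighted mean and the ratio of orthogonal angles are omitted.

```python
from statistics import fmean, pstdev

# Eight (dy, dx) directions starting north and moving clockwise.
DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]

def line_length(img, cy, cx, dy, dx, threshold, max_len):
    """Extend a line from the center while pixels stay similar to the center pixel."""
    length = 0
    y, x = cy + dy, cx + dx
    while (0 <= y < len(img) and 0 <= x < len(img[0]) and length < max_len
           and abs(img[y][x] - img[cy][cx]) <= threshold):
        length += 1
        y, x = y + dy, x + dx
    return length

def sfs_stats(img, cy, cx, threshold=2, max_len=10):
    """Summary statistics of the direction-line lengths around one center pixel."""
    lengths = [line_length(img, cy, cx, dy, dx, threshold, max_len)
               for dy, dx in DIRECTIONS]
    return {"max": max(lengths), "min": min(lengths),
            "mean": fmean(lengths), "std": pstdev(lengths)}

# Hypothetical 5 x 5 image with a bright horizontal road through the middle row.
road = [[0] * 5 for _ in range(5)]
road[2] = [10] * 5
stats = sfs_stats(road, 2, 2)
```

For a pixel on the road, the lines extend only east and west, so the maximum length is large relative to the mean; this elongation is the kind of structural signal the feature is designed to capture.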

Finally, zonal statistics—mean, sum, and standard deviation—were calculated on each contextual feature output. For VHSR imagery of Sri Lanka, there were 576 contextual feature outputs in total; for Sentinel-2 imagery, there were 429 contextual feature outputs total for each census unit in all study areas. Of the 723 EDs and 14,021 GNs in Belize and Sri Lanka, respectively, 687 EDs and 13,402 GNs were completely covered with imagery and were used in the analysis.
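The zonal statistics step amounts to grouping the pixels of a contextual feature output by census unit and summarizing each group. In the sketch below, the flat lists of pixel values and unit identifiers, the function name, and the use of the population standard deviation are assumptions for illustration.

```python
from statistics import fmean, pstdev

def zonal_statistics(values, zones):
    """Mean, sum, and standard deviation of pixel values within each zone."""
    grouped = {}
    for value, zone in zip(values, zones):
        grouped.setdefault(zone, []).append(value)
    return {zone: {"mean": fmean(v), "sum": sum(v), "std": pstdev(v)}
            for zone, v in grouped.items()}

# Hypothetical feature values for five pixels falling in census units A and B.
feature = [1.0, 3.0, 2.0, 2.0, 6.0]
unit_ids = ["A", "A", "B", "B", "B"]
zonal = zonal_statistics(feature, unit_ids)
```

Repeating this for every feature output and scale yields one row of independent variables per census unit.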

2.3.2. Urban Attributes

All census units with complete and accurate road and building OSM data were identified by overlaying OSM data on top of satellite imagery [137] (Figure 4). For Accra, 314 EAs had complete OSM data; for Belize, 80 EDs; and for Sri Lanka, 333 GNs. Of the 333 GNs, 192 had VHSR imagery coverage (Colombo, Kurunegala, Batticaloa, and Negombo). In total, there were 727 census units used in this analysis.

The road and building shapefiles were clipped to each census unit. Within each unit, building area, building count, building density, built-up area, built-up percent, road area, road density, and road length were calculated in a fashion similar to that in Engstrom et al. [58]. Building area is the total area of building footprints in square meters. Building count is the number of buildings. Building density is the building count divided by the census unit area in square kilometers. Built-up area is the sum of road area and building area in square meters. Built-up percent is built-up area divided by census unit area. Road length is the aggregated length of all road segments in meters. Road density is road length in meters divided by census unit area in square kilometers.

Road area is the total area of all road segments in square meters. OSM road polylines were multiplied by estimated road widths based on their classifications and the traffic direction. Widths were determined using GIS based on satellite imagery and the OSM metadata guidance provided by Ramm [138] (Table 3).
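The eight urban attributes for a single census unit can be sketched as follows. The road widths are hypothetical placeholders rather than the values in Table 3, the adjustment for traffic direction is omitted, and expressing built-up percent on a 0 to 100 scale is an assumption of this example.

```python
def urban_attributes(building_areas_m2, roads, unit_area_km2, widths_m):
    """roads is a list of (length_m, road_class) pairs; widths_m maps class to width."""
    building_area = sum(building_areas_m2)
    building_count = len(building_areas_m2)
    road_length = sum(length for length, _ in roads)
    # Road area: each polyline length multiplied by its class-based width.
    road_area = sum(length * widths_m[road_class] for length, road_class in roads)
    built_up_area = building_area + road_area
    return {
        "building_area_m2": building_area,
        "building_count": building_count,
        "building_density_per_km2": building_count / unit_area_km2,
        "built_up_area_m2": built_up_area,
        "built_up_percent": 100.0 * built_up_area / (unit_area_km2 * 1_000_000),
        "road_area_m2": road_area,
        "road_length_m": road_length,
        "road_density_m_per_km2": road_length / unit_area_km2,
    }

widths = {"residential": 5.0, "primary": 10.0}   # hypothetical widths in meters
attrs = urban_attributes(
    building_areas_m2=[100.0, 150.0, 250.0],
    roads=[(1000.0, "residential"), (500.0, "primary")],
    unit_area_km2=0.5,
    widths_m=widths,
)
```

In this example, three buildings and 1.5 km of road in a 0.5 km² unit give a building density of 6 per km² and a built-up share of about 2.1%.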

2.3.3. Population Density

The census datasets were joined to their respective shapefiles. The area was calculated for each census unit, and the population was divided by the area to obtain the population density (people per km²) for each unit.

2.4. Data Preparation

The datasets were combined in accordance with the four main analyses (Table 1). When the population density and contextual feature datasets were combined, 2216 EAs, 687 EDs, and 13,402 GNs remained. To reduce the large number of independent variables, bivariate correlations were conducted between each independent variable and the dependent variable, as in Engstrom et al. [58] and Joseph et al. [42]. Pearson’s correlations, which characterize the strength and direction of a relationship, were calculated. The associated p-value for each correlation was obtained, and the 200 independent variables with the strongest correlation coefficients and p-values less than a significance level of 0.05 were kept. Finally, all variables were scaled and normalized.

2.5. Model Building

The processed data were split for each analysis: 80% for training and 20% for out-of-sample testing. For an individual study area (Accra, Belize, Sri Lanka), the 80%/20% split consisted of that individual study area’s dataset only; for the combined study area (Accra–Belize–Sri Lanka combined), the 80%/20% split was performed after combining all the individual study areas’ datasets. The 80% subsets were used to create elastic net regularization (ENR) and random forest models to predict urban attributes and population density. A model’s predictive power was assessed using the out-of-sample R-squared statistic (R²). This statistic was calculated—within each area for the individual study areas and across all areas for the combined study area—using the 20% of the data set aside for testing. Given the small sample sizes for some portions of the study, which can make models unstable, each analysis went through 100 trials, with random seed values set from 1 to 100. The output statistics from the 100 trials were averaged.
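The repeated-split procedure can be sketched as below, assuming scikit-learn. The out-of-sample R² is one minus the ratio of the test-set residual sum of squares to the test-set total sum of squares, so it can be negative when a model predicts worse than the test-set mean. `LinearRegression` and the synthetic data are stand-ins, and the use of the seed for the split (rather than elsewhere) is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def mean_out_of_sample_r2(make_model, X, y, n_trials=100):
    """Average test-set R^2 over repeated 80/20 splits seeded 1..n_trials."""
    scores = []
    for seed in range(1, n_trials + 1):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed
        )
        model = make_model(seed).fit(X_tr, y_tr)
        scores.append(r2_score(y_te, model.predict(X_te)))
    return float(np.mean(scores))


# Synthetic stand-in data with a strong linear signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

avg_r2 = mean_out_of_sample_r2(lambda seed: LinearRegression(), X, y)
```

Passing a factory such as `make_model` lets the same loop wrap either an ENR or a random forest estimator.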

2.5.1. Elastic Net Regularized Regression

Developed by Zou and Hastie [139], ENR is a variable selection method that combines the penalties of the least absolute shrinkage and selection operator (LASSO) and ridge regressions with ordinary least squares (OLS). Both LASSO and ridge regression resemble OLS but add a penalty on the size of the coefficients [140]. The ridge regression applies a regularization term equal to the sum of squared coefficients (the squared $l_{2}$ norm), which can shrink coefficients close to 0. The LASSO regression performs variable selection by applying a regularization term equal to the sum of absolute values of the coefficients (the $l_{1}$ norm), which can remove independent variables by forcing their coefficients to 0 [140,141]. Each regularization term is multiplied by a tuning parameter $\lambda$, and together they form the shrinkage penalty ($l_{1}$ penalty and $l_{2}$ penalty). The tuning parameter $\lambda$ controls the weight or extent of the penalties. When $\lambda$ is large, the coefficients approach 0 in ridge regression and approach or equal 0 in LASSO regression, and ENR becomes a null model. When $\lambda$ approaches or equals 0, the penalties vanish, and ENR approaches or equals OLS [140]. The mixing parameter $\alpha$ is set to control the ratio between ridge ($\alpha = 0$) and LASSO ($\alpha = 1$) [142].

ENR combines the advantages of LASSO and ridge regressions and is more accurate than solely using LASSO [139,141]. The ENR equation is written as [139,143]:

$$\hat{\beta} \equiv \arg\min_{\beta} \left( \|y - X\beta\|^{2} + \lambda_{2} \|\beta\|^{2} + \lambda_{1} \|\beta\|_{1} \right) \tag{1}$$

where:

$$\|\beta\|^{2} = \sum_{j=1}^{p} \beta_{j}^{2}, \qquad \|\beta\|_{1} = \sum_{j=1}^{p} |\beta_{j}|.$$

In (1), $\hat{\beta}$ is the elastic net estimator, $y$ is the dependent variable, $X$ is an array of independent variables, $\beta$ is a vector of estimated coefficients, $\lambda_{1}$ and $\lambda_{2}$ are the tuning parameters, $\|\beta\|_{1}$ is the $l_{1}$ norm, and $\|\beta\|^{2}$ is the squared $l_{2}$ norm. As Equation (1) shows, ENR minimizes the residual sum of squares (RSS, which is used for OLS; $\|y - X\beta\|^{2}$) subject to the added regularization terms ($\|\beta\|^{2}$ and $\|\beta\|_{1}$). ENR gives a model of best fit by using the fewest independent variables needed to explain the dependent variable, improving on OLS and reducing multicollinearity (when independent variables are correlated with each other) [144,145].
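As a small numerical illustration of Equation (1), the sketch below evaluates the penalized objective for a fixed coefficient vector using NumPy. The function name `elastic_net_objective` and the toy data are hypothetical. It only evaluates the objective and does not perform the minimization.

```python
import numpy as np


def elastic_net_objective(beta, X, y, lam1, lam2):
    """Value of the Equation (1) objective: RSS + lam2*||b||^2 + lam1*||b||_1."""
    residual = y - X @ beta
    rss = float(residual @ residual)
    l2_term = float(np.sum(beta ** 2))     # sum of squared coefficients
    l1_term = float(np.sum(np.abs(beta)))  # sum of absolute coefficients
    return rss + lam2 * l2_term + lam1 * l1_term


# Toy data where beta fits y exactly, so the RSS term is zero.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
beta = np.array([1.0, 2.0])

ols_value = elastic_net_objective(beta, X, y, lam1=0.0, lam2=0.0)
penalized_value = elastic_net_objective(beta, X, y, lam1=0.5, lam2=0.1)
```

With both tuning parameters at 0 the objective reduces to the OLS RSS. With nonzero values, the same coefficients cost more, which is what pushes the minimizer toward smaller or zero coefficients.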

Using ElasticNetCV from the scikit-learn library [142,146], select parameters were tuned via five-fold cross-validation to produce the best model (Table 4). Output variables from each trial were a list of features (independent variables) and their coefficients, out-of-sample R², out-of-sample mean square error (MSE), the l1 ratio, and alpha. In scikit-learn's naming, the l1 ratio is the mixing parameter (written $\alpha$ above) and alpha is the overall penalty strength (written $\lambda$ above).
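A hedged sketch of this tuning step with scikit-learn's `ElasticNetCV` is shown below. The `l1_ratio` grid and the synthetic data are hypothetical placeholders, since the tuned values are given in Table 4. scikit-learn minimizes `1 / (2 * n_samples) * ||y - Xw||^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||^2`, so its `alpha` and `l1_ratio` are a rescaled form of the two tuning parameters in Equation (1).

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 20 predictors, only the first three matter.
rng = np.random.default_rng(1)
X = rng.normal(size=(250, 20))
true_coef = np.zeros(20)
true_coef[:3] = [2.0, -1.5, 1.0]
y = X @ true_coef + rng.normal(scale=0.5, size=250)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

# Hypothetical l1_ratio grid; alpha values are chosen automatically along a path.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5, random_state=1)
model.fit(X_tr, y_tr)

pred = model.predict(X_te)
results = {
    "coefficients": model.coef_,           # one coefficient per feature
    "r2": r2_score(y_te, pred),            # out-of-sample R^2
    "mse": mean_squared_error(y_te, pred), # out-of-sample MSE
    "l1_ratio": model.l1_ratio_,           # selected mixing parameter
    "alpha": model.alpha_,                 # selected penalty strength
}
```

Features whose entries in `coefficients` are exactly 0 have been dropped by the LASSO part of the penalty.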

2.5.2. Random Forest Regression

Introduced by Breiman [147], a random forest is an ensemble modeling approach that consists of many decision trees. Graphically, decision trees are tree-like diagrams with numerous splits used to predict an output value given an input value [140].

To build a decision tree, the data are split into $J$ leaf nodes or distinct regions, $R_{1}, R_{2}, \ldots, R_{J}$, where each observation within a region, with its known response value $y_{i}$, is given the same prediction $\hat{y}_{R_{j}}$, which is the mean of the region’s training observation response values [140]. To split the data into $J$ regions, each split is determined by minimizing the overall $RSS$ across the resulting regions, which is [140]:

$$RSS = \sum_{j=1}^{J} \sum_{i \in R_{j}} (y_{i} - \hat{y}_{R_{j}})^{2}. \tag{2}$$
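To make Equation (2) concrete, the sketch below finds the single best split on one predictor (the case of two regions) by exhaustive search in plain Python. The helper names and toy data are hypothetical. Practical implementations repeat this search recursively and across many predictors.

```python
def region_rss(values):
    """Sum of squared deviations from the region mean (inner sum of Equation (2))."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, rss) for the split on x that minimizes the total RSS."""
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        rss = region_rss(left) + region_rss(right)
        if rss < best[1]:
            best = (threshold, rss)
    return best


# Toy data with two well-separated clusters.
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
threshold, rss = best_split(x, y)  # threshold 6.5, rss 4.0
```

Each side of the chosen split is then predicted by its own mean, here 6 and 21.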

For random forests, each tree is built from a different bootstrap sample of the training data, and each split at a node of a tree is chosen from a random subset of independent variables [50,140,147]. Using a random subset prevents one strong predictor from overpowering other variables and creating similar decision trees, ensuring that predictions from the trees are not strongly correlated and that the average of all trees is more reliable [140]. Random forests work well even when there are many predictors, including when some are correlated. Random forests are also nonparametric [50]. For regression, the output is the mean prediction of all the trees.

When building random forest models using RandomForestRegressor and GridSearchCV from the scikit-learn library [146,148,149], select parameters were tuned via five-fold cross-validation to produce the best model (Table 5 and Table 6). Output variables from each trial were a list of features and their importance values, out-of-sample R², and out-of-sample MSE.
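The tuning step can be sketched with scikit-learn's `RandomForestRegressor` and `GridSearchCV` as below. The parameter grid and synthetic data are hypothetical placeholders, since the tuned parameters and values are listed in Table 5 and Table 6.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data with a non-linear response driven by columns 0 and 1.
rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(300, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=2)

# Hypothetical grid; five-fold cross-validation picks the best combination.
param_grid = {"n_estimators": [50, 100], "max_features": [0.5, 1.0]}
search = GridSearchCV(
    RandomForestRegressor(random_state=2), param_grid, cv=5, scoring="r2"
)
search.fit(X_tr, y_tr)

best_forest = search.best_estimator_
pred = best_forest.predict(X_te)
rf_results = {
    "importances": best_forest.feature_importances_,  # one value per feature
    "r2": r2_score(y_te, pred),                        # out-of-sample R^2
    "mse": mean_squared_error(y_te, pred),             # out-of-sample MSE
    "best_params": search.best_params_,
}
```

Unlike ENR coefficients, the importance values are non-negative and sum to 1, so they rank features but do not indicate the direction of an effect.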

3. Results

3.1. Human-Modified Landscape and Very-High-Spatial-Resolution Imagery Contextual Features

First, contextual features derived from VHSR imagery of Sri Lanka (sample size of 192 GNs) were used to model urban attributes to investigate what contextual features derived from VHSR imagery represent in the human-modified landscape. Across all models, ENR results indicated that VHSR contextual features explained 43% to 85% of the out-of-sample variation in urban attributes (Table 7). Random forest results indicated that VHSR contextual features explained 51% to 83% (Table 8).

3.2. Human-Modified Landscape and Imagery Spatial Resolution

Contextual features derived from Sentinel-2 imagery of Sri Lanka were used to model urban attributes and compared to the VHSR-derived contextual features to examine how these representations of the landscape change as the spatial resolution of the satellite imagery changes (for the same 192 GNs as in the VHSR imagery analysis). Across all models, ENR and random forest results indicated that Sentinel-2 contextual features explained 46% to 80% and 47% to 84% of the out-of-sample variance in urban attributes, respectively (Table 7 and Table 8).

3.3. Human-Modified Landscape and Sentinel-2 Imagery Contextual Features

To further investigate the ability of contextual features derived from Sentinel-2 imagery to map urban attributes, an analysis was run with data from all three study areas: Accra (314 EAs), Belize (80 EDs), and Sri Lanka (333 GNs). ENR and random forest models were built for each area individually (Table 9 and Table 10) and then on all areas (Table 11). ENR results indicated that Sentinel-2 contextual features explained up to 78% of the out-of-sample variance in urban attributes in Accra, 42% to 81% in Sri Lanka, and 34% to 90% in Accra–Belize–Sri Lanka. Random forest results indicated that contextual features explained 12% to 80% in Accra, 44% to 86% in Sri Lanka, and 45% to 93% in Accra–Belize–Sri Lanka.

3.4. Population Density and Sentinel-2 Contextual Features

Finally, contextual features derived from Sentinel-2 imagery of Accra (2216 EAs), Belize (687 EDs), and Sri Lanka (13,402 GNs) were used to model population density to explore how contextual features derived from Sentinel-2 relate to population density based on census data (Table 12). ENR results indicated that Sentinel-2 contextual features explained 57% of the out-of-sample variance in population density in Accra, 73% in Belize, 65% in Sri Lanka, and 67% in Accra–Belize–Sri Lanka. Random forest results indicated that contextual features explained 74% in Accra, 78% in Belize, 77% in Sri Lanka, and 84% in Accra–Belize–Sri Lanka.

4. Discussion

4.1. Human-Modified Landscape and Very-High-Spatial-Resolution Imagery Contextual Features

Random forest and ENR approaches had similar levels of performance, with four urban attributes having higher out-of-sample R² values with ENR and the other four having higher or equal values with random forest (Table 7 and Table 8). The lower out-of-sample R² values for two of the three building variables (building area and building count) suggest that contextual features are only able to modestly capture the building attributes. The relatively high out-of-sample R² values for the road attributes indicate that contextual features represent roads well. Since the built-up area attribute consisted of the building area and road area attributes, the lower out-of-sample R² value for the built-up area attribute was likely pulled down by the low out-of-sample R² value for building area. The out-of-sample R² value for the built-up percent attribute was strong, likely due to the strong out-of-sample R² values for building density and road density.

4.2. Human-Modified Landscape and Imagery Spatial Resolution

Sentinel-2 is generally less powerful—yet still effective—at predicting urban attributes when compared to VHSR imagery. This is reflected in the out-of-sample R² decreasing when comparing the values from VHSR to those from Sentinel-2 (especially for building area, building density, and road density), although there were some increases (such as road length and road area; Table 13).

The results expand on the claim by Henebry and Kux [114] that lacunarity is scale dependent by reinforcing scale as an important component for all contextual features. When conducting classification or object identification, the homogeneity and the variance of pixel values change at differing spatial resolutions because different phenomena occur on different scales [150]. In addition to neighborhoods being on various scales [56], urban features within neighborhoods are also on various scales. This can explain why most out-of-sample R² values decreased while some did not (Table 13). The out-of-sample R² values may have increased for some road attributes because the sizes of the roads were generally smaller and closer to the spatial resolutions of Sentinel-2 and VHSR imagery, whereas the sizes of the buildings were much larger. The different sizes and compositions of buildings and roads could have then influenced their pixel variance at each spatial resolution.

4.3. Human-Modified Landscape and Sentinel-2 Imagery Contextual Features

Analysis suggests contextual features derived from Sentinel-2 imagery can effectively capture urban attributes, except for road density. Overall, random forest models were slightly more effective than ENR, indicating that the relationships may be non-linear (Table 9, Table 10, and Table 11).

The lower out-of-sample R² values for the building attributes (building area and building count) for individual study areas (especially Accra) suggest that Sentinel-2 contextual features can only somewhat capture building attributes in specific areas (Table 9 and Table 10). Although building area and building count generally had the weakest models with VHSR contextual features (Table 7 and Table 8), those attributes also mostly experienced larger drops in out-of-sample R² values compared to other attributes when degrading spatial resolution in Sri Lanka (Table 13); the lower spatial resolution of Sentinel-2 may have contributed to the lower predictive power.

Likewise, road density also had a larger drop in its out-of-sample R² value with moderate-resolution Sentinel-2 imagery (Table 13), which could explain why road density had lower out-of-sample R² values with Sentinel-2 imagery (Table 9, Table 10, and Table 11). The extremely low road density out-of-sample R² value for Accra (indicating that a null model was a better fit) was particularly surprising given the higher values for the other road attributes, which make up road density (Table 9). One simple explanation is that Accra’s road network may be more nuanced than what was captured by OSM. The remaining out-of-sample R² values for the road attributes in Accra and Sri Lanka (Table 9 and Table 10), which were generally higher, suggest that contextual features can capture roads better than buildings in both areas even with moderate-resolution imagery.

Contextual features captured built-up variables the best. The out-of-sample R² values for built-up area and built-up density were strong; built-up percent had the highest out-of-sample R² values of all the urban attributes (Table 9, Table 10, and Table 11). One unexpected observation was that aggregated data (built-up percent and the Accra–Belize–Sri Lanka study area) generally had stronger out-of-sample R² values than their individual constituent datasets (building and road area and the individual study areas, respectively; Table 9, Table 10, and Table 11). For both aggregations, with more data, outliers within constituent areas may be less influential. In the combined study area specifically, contextual features appear capable of capturing the landscape in multiple areas better than in individual areas, possibly highlighting global landscape trends, given the larger area covered.

4.4. Population Density and Sentinel-2 Contextual Features

Sentinel-2 contextual features can generally predict population density well for individual countries and when all countries are combined, highlighting that a population density model with countries from various regions might be feasible (Table 12). Random forest models appeared to be more effective. Similar to the urban attribute results, one unexpected result was that the random forest out-of-sample R² value for the combined study area was higher than those of the individual areas. Outliers in individual areas may be less influential, possibly highlighting global population density trends.

The strong performances modeling urban attributes and population density are likely related. In previous work, researchers modeled population and population density using imagery-derived characteristics such as urban areas, land use, dwelling units, and raw spectral values (Lo, as cited in G. Li and Weng [34]). With contextual features capturing the landscape well, they are likely effective proxies for many of the variables that Lo (as cited in G. Li and Weng [34]) claims are important for modeling population. Within the context of this research, built-up attributes (which include building and road attributes) captured by contextual features could be a proxy for land use and urban areas, as spatial variations in building and road data can be representative of specific human landscapes; building attributes captured by contextual features could be a proxy for dwelling units by utilizing counts and average sizes of these buildings. Likewise, NDVI and mean both utilize raw spectral values and could be capturing open spaces or other indicators of population. Contextual features might partially explain population density by picking up various urban attributes that have been shown to model population well. There may be other factors unrelated to urban attributes that can model population; building counts, for instance, have been previously used to estimate population [29], yet contextual features did not capture building counts well. Overall, this method represents a promising way of using open-source, freely available, remotely sensed data to model population, which can be especially helpful for government officials and researchers when costs are a concern.

4.5. Limitations and Future Work

One limitation is that the image collection dates did not match the dates when the census and OSM variables were collected, as our study was limited to the best data available to us. While this is a limitation, the fact that the relationships were still strong indicates that the timing of data collection may not have a large influence on these relationships. Second, a large number of independent variables were used within this study, and bivariate correlations were used to reduce the number of variables prior to their incorporation into the ENR and random forest models. While ENR and random forest models are designed to reduce issues resulting from multicollinearity, correlations among independent variables may have influenced the results presented in this study. A third limitation is the small sample size of some of the analyses relative to the number of predictors, which potentially resulted in low degrees of freedom and increased the risk of overfitting. Combined with multicollinearity, this could cause the models to be unstable [140]. To mitigate this issue, we ran 100 trials and calculated out-of-sample statistics for each analysis [140]. While overfitting may still have occurred in some of the analyses performed, the large sample size and strong out-of-sample results when using all of the datasets indicate that our results are reasonably robust. Fourth, OSM data may have errors due to the nature of how and when the data are collected. While there were likely errors in the data, they were probably minor, as the data were visually verified with satellite imagery prior to analysis.

Future research should investigate whether multicollinearity tests or dimension reduction techniques such as principal component analysis should be performed to reduce any possible impacts of multicollinearity. Future work could also expand the models to include other countries, evaluate individual contextual feature performance, conduct time-series analysis with features once Sentinel-2 has acquired enough historical data, and test if different block and scale sizes for each of the features would work better, as these are scale dependent. Finally, while outside the scope of the current study, this analysis could theoretically be done at global scales and used to predict populations in areas where there is limited to no census data.

5. Conclusions

This study analyzed the ability of contextual features to model attributes of the human-modified landscape and population density. The results suggest that contextual features can model urban attributes well at very high spatial resolutions (<2 m), with out-of-sample R² values up to 85%. At lower spatial resolutions (10 m), they were generally less powerful for the same census units yet still effective, and out-of-sample R² values reached up to 93% when the three study areas were combined. Contextual features can also model population density well in individual and multiple countries, with out-of-sample R² values up to 84%. These results are very encouraging, as the data used in the study are freely available and global in coverage.

This research fits into the broader work of using contextual features to model socio-economic variables and using remote sensing to predict population. The strong results with freely available Sentinel-2 imagery have important implications for researchers and government officials, as those with limited resources can use contextual features to model population density at a specified time and place, allowing for accurate and timely population counts utilizing both top-down and bottom-up approaches when census data are outdated or unavailable.

Author Contributions

Conceptualization, R.E.; methodology, S.C., R.E., M.M. and A.B.; software, S.C. and A.B.; validation, S.C. and M.M.; formal analysis, S.C.; investigation, S.C. and A.B.; resources, S.C., R.E., and A.B.; data curation, S.C., R.E. and A.B.; writing—original draft preparation, S.C.; writing—review and editing, S.C., R.E. and M.M.; visualization, S.C.; supervision, R.E. and M.M.; project administration, S.C. and R.E.; funding acquisition, S.C. and R.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the United States Agency for International Development, award number AID-OAA-G-15-00007 and cooperative agreement number 7200AA18CA00015, and the George Washington University Department of Geography.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2, https://datatopics.worldbank.org/world-development-indicators/, https://www.diva-gis.org, https://www.naturalearthdata.com, and https://www.openstreetmap.org (accessed on 2 July 2021). Restrictions apply to the availability of certain data. Data obtained from Sri Lanka’s Department of Census and Statistics, DigitalGlobe, Ghana Statistical Service, and Statistical Institute of Belize are available from the authors with the permission of Sri Lanka Department of Census and Statistics, DigitalGlobe, Ghana Statistical Service, and Statistical Institute of Belize, respectively. Data obtained from Esri are available at https://www.arcgis.com/home/item.html?id=10df2279f9684e4a9f6a7f08febac2a9 (accessed on 2 July 2021) in accordance with the Esri Terms of Use.

Acknowledgments

The authors would like to acknowledge the support from the YouthMappers USAID grant.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

Population Division, Department of Economic and Social Affairs, United Nations. World Urbanization Prospects; The 2018 Revision (ST/ESA/SER.A/420); United Nations: New York, NY, USA, 2019; ISBN 978-92-1-148319-2.
Curtis, K.J.; Schneider, A. Understanding the demographic implications of climate change: Estimates of localized population predictions under future scenarios of sea-level rise. Popul. Environ. 2011, 33, 28–54.
Linard, C.; Tatem, A.J. Large-scale spatial population databases in infectious disease research. Int. J. Health Geogr. 2012, 11, 7.
Marandola, E.; Hogan, D.J. Vulnerabilities and risks in population and environment studies. Popul. Environ. 2006, 28, 83–112.
National Research Council. Tools and Methods for Estimating Populations at Risk from Natural Disasters and Complex Humanitarian Crises; The National Academies Press: Washington, DC, USA, 2007; ISBN 978-0-309-10354-1.
Noji, E.K. Estimating population size in emergencies. Bull. World Health Organ. 2005, 83, 164.
Pal, A.; Graettinger, A.J.; Triche, M.H. Emergency Evacuation Modeling Based on Geographical Information System Data. In Proceedings of the Transportation Research Board 82nd Annual Meeting, Washington, DC, USA, 12–16 January 2003; pp. 1–16.
Tatem, A.J. Mapping the denominator: Spatial demography in the measurement of progress. Int. Health 2014, 6, 153–155.
Benn, H.P. Bus Route Evaluation Standards; National Academy Press: Washington, DC, USA, 1995; ISBN 978-0-309-05855-1.
Clogg, C.C.; Massagli, M.P.; Eliason, S.R. Population undercount and social science research. Soc. Indic. Res. 1989, 21, 559–598.
Guiteras, R.; Levinsohn, J.; Mobarak, A.M. Demand Estimation with Strategic Complementarities: Sanitation in Bangladesh; CEPR Discussion Paper No. DP13498; Centre for Economic Policy Research: London, UK, 2019; Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3328509 (accessed on 16 February 2020).
Schirm, A.L.; Preston, S.H. Census undercount adjustment and the quality of geographic population distributions. J. Am. Stat. Assoc. 1987, 82, 965–978.
United Nations. Principles and Recommendations for Population and Housing Censuses (Revision 3); United Nations: New York, NY, USA, 2017; ISBN 978-92-1-161597-5.
Luo, W.; Wang, F. Measures of spatial accessibility to health care in a GIS environment: Synthesis and a case study in the Chicago region. Environ. Plan. B Plan. Des. 2003, 30, 865–884.
Plane, D.A.; Rogerson, P.A. The Geographical Analysis of Population with Applications to Planning and Business; John Wiley & Sons, Inc.: New York, NY, USA, 1994; ISBN 978-047-151-014-7.
Tayman, J.; Pol, L. Retail site selection and geographic information systems. J. Appl. Bus. Res. 1995, 11, 46–54.
Carr, D.L. Proximate population factors and deforestation in tropical agricultural frontiers. Popul. Environ. 2004, 25, 585–612.
De Sherbinin, A.; Carr, D.; Cassels, S.; Jiang, L. Population and environment. Annu. Rev. Environ. Resour. 2007, 32, 345–373.
Wardrop, N.A.; Jochem, W.C.; Bird, T.J.; Chamberlain, H.R.; Clarke, D.; Kerr, D.; Bengtsson, L.; Juran, S.; Seaman, V.; Tatem, A.J. Spatially Disaggregated Population Estimates in the Absence of National Population and Housing Census Data. Proc. Natl. Acad. Sci. USA 2018, 115, 3529–3537.
Jia, P.; Qiu, Y.; Gaughan, A.E. A fine-scale spatial population distribution on the high-resolution gridded population surface and application in Alachua County, Florida. Appl. Geogr. 2014, 50, 99–107.
Mennis, J. Dasymetric mapping for estimating population in small areas. Geogr. Compass 2009, 3, 727–745.
Zandbergen, P.A.; Ignizio, D.A. Comparison of dasymetric mapping techniques for small-area population estimates. Cartogr. Geogr. Inf. Sci. 2010, 37, 199–214.
Holt, J.B.; Lo, C.P.; Hodler, T.W. Dasymetric estimation of population density and areal interpolation of census data. Cartogr. Geogr. Inf. Sci. 2004, 31, 103–121.
Tatem, A.J.; Noor, A.M.; von Hagen, C.; Di Gregorio, A.; Hay, S.I. High resolution population maps for low income nations: Combining land cover and census in East Africa. PLoS ONE 2007, 2, e1298.
Ye, Y.; Wamukoya, M.; Ezeh, A.; Emina, J.B.O.; Sankoh, O. Health and demographic surveillance systems: A step towards full civil registration and vital statistics system in sub-Sahara Africa? BMC Public Health 2012, 12, 741.
Zhu, H.; Li, Y.; Liu, Z.; Fu, B. Estimating the population distribution in a county area in China based on impervious surfaces. Photogramm. Eng. Remote Sens. 2015, 81, 155–163.
Wu, S.-S.; Qiu, X.; Wang, L. Population estimation methods in GIS and remote sensing: A review. GIsci. Remote Sens. 2005, 42, 80–96.
Li, G.; Weng, Q. Fine-scale population estimation: How Landsat ETM+ imagery can improve population distribution mapping. Can. J. Remote Sens. 2010, 36, 155–165.
Karume, K.; Schmidt, C.; Kundert, K.; Bagula, M.E.; Safina, B.F.; Schomacker, R.; Ganza, D.; Azanga, O.; Nfundiko, C.; Karume, N.; et al. Use of remote sensing for population number determination. Open Access J. Sci. Technol. 2017, 5, 101227.
Nagle, N.N.; Buttenfield, B.P.; Leyk, S.; Spielman, S. Dasymetric modeling and uncertainty. Ann. Assoc. Am. Geogr. 2014, 104, 80–95.
Liu, L.; Peng, Z.; Wu, H.; Jiao, H.; Yu, Y. Exploring urban spatial feature with dasymetric mapping based on mobile phone data and LUR-2SFCAe method. Sustainability 2018, 10, 2432.
Engstrom, R.; Newhouse, D.; Soundararajan, V. Estimating small-area population density in Sri Lanka using surveys and geo-spatial data. PLoS ONE 2020, 15, e0237063.
Hersh, J.; Engstrom, R.; Mann, M. Open data for algorithms: Mapping poverty in Belize using open satellite derived features and machine learning. Inf. Technol. Dev. 2020, 27, 263–292.
Li, G.; Weng, Q. Using Landsat ETM+ imagery to measure population density in Indianapolis, Indiana, USA. Photogramm. Eng. Remote Sens. 2005, 71, 947–958.
Azar, D.; Graesser, J.; Engstrom, R.; Comenetz, J.; Leddy, R.M.; Schechtman, N.G.; Andrews, T. Spatial refinement of census population distribution using remotely sensed estimates of impervious surfaces in Haiti. Int. J. Remote Sens. 2010, 31, 5635–5655.
Freire, S.; Kemper, T.; Pesaresi, M.; Florczyk, A.; Syrris, V. Combining GHSL and GPW to Improve Global Population Mapping. In Proceedings of the 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Milan, Italy, 26–31 July 2015; pp. 2541–2543.
Li, L.; Lu, D. Mapping population density distribution at multiple scales in Zhejiang Province using Landsat Thematic Mapper and census data. Int. J. Remote Sens. 2016, 37, 4243–4260.
Wu, C.; Murray, A.T. Population estimation using Landsat Enhanced Thematic Mapper imagery. Geogr. Anal. 2007, 39, 26–43.
Calka, B.; Bielecka, E.; Zdunkiewicz, K. Redistribution population data across a regular spatial grid according to buildings characteristics. Geod. Cartogr. 2016, 65, 149–162.
Silván-Cárdenas, J.L.; Wang, L.; Rogerson, P.; Wu, C.; Feng, T.; Kamphaus, B.D. Assessing fine-spatial-resolution remote sensing for small-area population estimation. Int. J. Remote Sens. 2010, 31, 5605–5634.
Tomás, L.; Fonseca, L.; Almeida, C.; Leonardi, F.; Pereira, M. Urban population estimation based on residential buildings volume using IKONOS-2 images and LIDAR data. Int. J. Remote Sens. 2016, 37, 1–28.
Joseph, M.; Wang, L.; Wang, F. Using Landsat imagery and census data for urban population density modeling in Port-au-Prince, Haiti. GIsci. Remote Sens. 2012, 49, 228–250.
Anderson, S.J.; Tuttle, B.T.; Powell, R.L.; Sutton, P.C. Characterizing relationships between population density and nighttime imagery for Denver, Colorado: Issues of scale and representation. Int. J. Remote Sens. 2010, 31, 5733–5746.
Chen, X.; Nordhaus, W. A test of the new VIIRS lights data set: Population and economic output in Africa. Remote Sens. 2015, 7, 4937–4947.
Li, K.; Chen, Y.; Li, Y. The random forest-based method of fine-resolution population spatialization by using the International Space Station nighttime photography and social sensing data. Remote Sens. 2018, 10, 1650.
Lo, C.P. Modeling the population of China using DMSP Operational Linescan System nighttime data. Photogramm. Eng. Remote Sens. 2001, 67, 1037–1047.
Azar, D.; Engstrom, R.; Graesser, J.; Comenetz, J. Generation of fine-scale population layers using multi-resolution satellite imagery and geospatial data. Remote Sens. Environ. 2013, 130, 219–232.
Bagan, H.; Yamagata, Y. Analysis of urban growth and estimating population density using satellite images of nighttime lights and land-use and population data. GIsci. Remote Sens. 2015, 52, 765–780.
Li, X.; Zhou, W. Dasymetric mapping of urban population in China based on radiance corrected DMSP-OLS nighttime light and land cover data. Sci. Total Environ. 2018, 643, 1248–1256.
Stevens, F.R.; Gaughan, A.E.; Linard, C.; Tatem, A.J. Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data. PLoS ONE 2015, 10, e0107042.
Bayram, U.; Can, G.; Duzgun, S.; Yalabik, N. Evaluation of Textural Features for Multispectral Images. In Image and Signal Processing for Remote Sensing XVII, Proceedings of the SPIE Remote Sensing, Prague, Czech Republic, 26 October 2011; SPIE: Prague, Czech Republic, 2011; Volume 8180, pp. 81800I-1–81800I-14.
Sandborn, A.; Engstrom, R.N. Determining the relationship between census data and spatial features derived from high-resolution imagery in Accra, Ghana. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 1970–1977.
Puissant, A.; Hirsch, J.; Weber, C. The utility of texture analysis to improve per-pixel classification for high to very high spatial resolution imagery. Int. J. Remote Sens. 2005, 26, 733–745.
Abeigne Ella, L.P.; van den Bergh, F.; van Wyk, B.J.; van Wyk, M.A. A Comparison of Texture Feature Algorithms for Urban Settlement Classification. In Proceedings of the IGARSS 2008—2008 IEEE International Geoscience and Remote Sensing Symposium, Boston, MA, USA, 7–11 July 2008; pp. III-1308–III-1311.
Zhang, Q.; Wang, J.; Gong, P.; Shi, P. Study of urban spatial patterns from SPOT panchromatic imagery using textural analysis. Int. J. Remote Sens. 2003, 24, 4137–4160.
Graesser, J.; Cheriyadat, A.; Vatsavai, R.R.; Chandola, V.; Long, J.; Bright, E. Image based characterization of formal and informal neighborhoods in an urban landscape. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 1164–1176.
Engstrom, R.; Sandborn, A.; Yu, Q.; Burgdorfer, J.; Stow, D.; Weeks, J.; Graesser, J. Mapping Slums using Spatial Features in Accra, Ghana. In Proceedings of the 2015 Joint Urban Remote Sensing Event (JURSE), Lausanne, Switzerland, 30 March–1 April 2015; pp. 1–4.
Engstrom, R.; Harrison, R.; Mann, M.; Fletcher, A. Evaluating the Relationship between Contextual Features Derived from Very High Spatial Resolution Imagery and Urban Attributes: A Case Study in Sri Lanka. In Proceedings of the 2019 Joint Urban Remote Sensing Event (JURSE), Vannes, France, 22–24 May 2019; pp. 1–4.
Engstrom, R.; Hersh, J.; Newhouse, D. Poverty from Space: Using High-Resolution Satellite Imagery for Estimating Economic Well-Being (English); (Working Paper No. WPS8284); World Bank Group: Washington, DC, USA, 2017; Available online: http://documents.worldbank.org/curated/en/610771513691888412/pdf/WPS8284.pdf (accessed on 14 April 2019).
Engstrom, R.; Newhouse, D.; Haldavanekar, V.; Copenhaver, A.; Hersh, J. The Relationship between Spatial and Spectral Features Derived from High Spatial Resolution Satellite Data and Urban Poverty in Colombo, Sri Lanka. In Proceedings of the 2017 Joint Urban Remote Sensing Event (JURSE), Dubai, United Arab Emirates, 6–8 March 2017; pp. 1–4.
Wang, L.; Wu, C. Population estimation using remote sensing and GIS technologies. Int. J. Remote Sens. 2010, 31, 5569–5570.
Tiecke, T.G.; Liu, X.; Zhang, A.; Gros, A.; Li, N.; Yetman, G.; Kilic, T.; Murray, S.; Blankespoor, B.; Prydz, E.B.; et al. Mapping the world population one building at a time. arXiv 2017, arXiv:1712.05839.
Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27.
Ghana Statistical Service. [dataset] Ghana Census and Shapefile. 2010. Available online: https://statsghana.gov.gh (accessed on 5 January 2020).
Statistical Institute of Belize. [dataset] Belize Census and Shapefile. 2010. Available online: http://sib.org.bz (accessed on 5 January 2020).
Department of Census and Statistics. [dataset] Sri Lanka Census and Shapefile. 2012. Available online: http://www.map.statistics.gov.lk (accessed on 5 January 2020).
DIVA-GIS. [dataset] (n.d.) Country Administrative Areas Shapefiles. Available online: https://diva-gis.org/gdata (accessed on 5 January 2020).
Natural Earth. [dataset] Admin 0—Countries. 2018. Available online: https://www.naturalearthdata.com/downloads/10m-cultural-vectors/10m-admin-0-countries/ (accessed on 5 January 2020).
Ghana Statistical Service. 2010 Population & Housing Census—District Analytical Report: La Dade-Kotopon Municipality; Ghana Statistical Service: Accra, Ghana, 2014. Available online: https://www2.statsghana.gov.gh/docfiles/2010_District_Report/Greater%20Accra/LA%20DADEkotopon.pdf (accessed on 20 February 2021).
Ghana Statistical Service. 2010 Population & Housing Census—District Analytical Report: Accra Metropolitan; Ghana Statistical Service: Accra, Ghana, 2014. Available online: https://new-ndpc-static1.s3.amazonaws.com/CACHES/PUBLICATIONS/2016/06/06/AMA.pdf (accessed on 20 February 2021).
Ghana Statistical Service. 2010 Population & Housing Census—District Analytical Report: Ledzokuku-Krowor Municipality; Ghana Statistical Service: Accra, Ghana, 2014. Available online: https://www2.statsghana.gov.gh/docfiles/2010_District_Report/Greater%20Accra/LEKMA.pdf (accessed on 20 February 2021).
Kwankye, S.O.; Cofie, E. Ghana’s population policy implementation: Past, present and future. Etude Popul. Afr. 2015, 29, 1734–1748.
Owusu, G.; Afutu-Kotey, R.L. Poor urban communities and municipal interface in Ghana: A case study of Accra and Sekondi-Takoradi Metropolis. Afr. Stud. Q. 2010, 12, 1–16.
Akubia, J.E.K.; Bruns, A. Unravelling the frontiers of urban growth: Spatio-temporal dynamics of land-use change and urban expansion in Greater Accra Metropolitan Area, Ghana. Land 2019, 8, 131.
Armah, F.A.; Odoi, J.O.; Yengoh, G.T.; Obiri, S.; Yawson, D.O.; Afrifa, E.K.A. Food security and climate change in drought-sensitive savanna zones of Ghana. Mitig. Adapt. Strateg. Glob. Chang. 2011, 16, 291–306.
Owusu, G. Coping with Urban Sprawl: A Critical Discussion of the Urban Containment Strategy in a Developing Country City, Accra. In Proceedings of the Cities to Be Tamed? Standards and Alternatives in the Transformation of the Urban South, Milan, Italy, 15–17 November 2012; Planum: Rome, Italy, 2013; Volume 1, pp. 1–17.
Stow, D.A.; Weeks, J.R.; Toure, S.; Coulter, L.L.; Lippitt, C.D.; Ashcroft, E. Urban vegetation cover and vegetation change in Accra, Ghana: Connection to housing quality. Prof. Geogr. 2013, 65, 451–465.
Teye, J. Urbanization and Migration in Africa; Population Division, Department of Economic and Social Affairs, United Nations: New York, NY, USA, 2018; Available online: https://www.un.org/en/development/desa/population/events/pdf/expert/28/EGM_Joseph_Teye.pdf (accessed on 2 February 2021).
Statistical Institute of Belize. Abstract of Statistics 2013; Statistical Institute of Belize: Belmopan, Belize, 2013. Available online: https://sib.org.bz/wp-content/uploads/2017/05/2013_Abstract_of_Statistics.pdf (accessed on 27 January 2020).
Statistical Institute of Belize. Compendium of Statistics—2015; Statistical Institute of Belize: Belmopan, Belize, 2015. Available online: https://sib.org.bz/wp-content/uploads/2015_Abstract_of_Statistics.pdf (accessed on 27 January 2020).
Day, M.J. Landscape and environment in Belize: An introduction. Caribb. Geogr. 2013, 13, 3–13.
World Bank Group. Belize Housing Policy: Diagnosis and Guidelines for Action; Report No. 62906—BZ; World Bank Group: Washington, DC, USA, 2011; Available online: https://collaboration.worldbank.org/content/usergenerated/asi/cloud/attachments/sites/collaboration-for-development/en/groups/affordable-housing-ksb-c4d/documents/jcr:content/content/primary/blog/belize_housing_polic-jAIQ/Belize%20Housing%20Policy%20Diagnosis%20and%20guidelines%20for%20action.pdf (accessed on 28 January 2020).
World Bank Group. [dataset] World Development Indicators. 2019. Available online: https://databank.worldbank.org/source/world-development-indicators (accessed on 18 March 2020).
Munoz, I.; Gibson, D.V. Belize: Decoding the Census; IC2 Institute: Austin, TX, USA, 2015; Available online: https://repositories.lib.utexas.edu/bitstream/handle/2152/47365/munoz-2015-belize-decoding-the-census.pdf (accessed on 11 November 2019).
Jackiewicz, E.L.; Govdyak, O. Diversity of lifestyle: A view from Belize. Yearb. Assoc. Pac. Coast Geogr. 2015, 77, 18–39.
Department of Census and Statistics. Census of Population and Housing: Sri Lanka 2012; Department of Census and Statistics: Battaramulla, Sri Lanka, 2012. Available online: http://www.statistics.gov.lk/PopHouSat/CPH2011/index.php?fileName=SriLanka&gp=Activities&tpl=3 (accessed on 2 February 2020).
Department of Census and Statistics. Housing Tables; Department of Census and Statistics: Battaramulla, Sri Lanka, 2015. Available online: http://www.statistics.gov.lk/PopHouSat/CPH2011/Pages/Activities/Reports/Finalhousing.pdf (accessed on 2 February 2020).
Ministry of Housing and Construction. Housing and Sustainable Urban Development in Sri Lanka—National Report for the Third United Nations Conference on Human Settlements Habitat III; Ministry of Housing and Construction of the Government of Democratic Socialist Republic of Sri Lanka: Battaramulla, Sri Lanka, 2016. Available online: https://uploads.habitat3.org/hb3/Sri-Lanka-(Final-in-English).pdf (accessed on 14 January 2020).
World Bank Group. Leveraging Urbanization in Sri Lanka. Available online: https://www.worldbank.org/en/country/srilanka/brief/leveraging-urbanization-sri-lanka (accessed on 4 February 2020).
DigitalGlobe. [dataset] WorldView-2 Imagery of Colombo. 2010. Available online: https://discover.digitalglobe.com (accessed on 4 February 2020).
DigitalGlobe. [dataset] WorldView-2 Imagery of Kurunegala. 2012. Available online: https://discover.digitalglobe.com (accessed on 4 February 2020).
DigitalGlobe. [dataset] GeoEye-1 Imagery of Batticaloa and Negombo. 2010. Available online: https://discover.digitalglobe.com (accessed on 4 February 2020).
European Space Agency. [dataset] Sentinel-2 Imagery of Belize and Sri Lanka. 2018. Available online: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2 (accessed on 27 August 2018).
European Space Agency. [dataset] Sentinel-2 Imagery of Accra. 2020. Available online: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2 (accessed on 2 July 2021).
OpenStreetMap. [dataset] Buildings and Roads Shapefile. 2019. Available online: https://www.openstreetmap.org (accessed on 20 August 2019).
OpenStreetMap. [dataset] Buildings and Roads Shapefile. 2020. Available online: https://www.openstreetmap.org (accessed on 25 June 2020).
OpenStreetMap. About OpenStreetMap. Available online: https://wiki.openstreetmap.org/wiki/About_OpenStreetMap (accessed on 3 May 2021).
Graesser, J. SpFeas. Available online: https://github.com/jgrss/spfeas (accessed on 5 March 2019).
Jensen, J.R. Introductory Digital Image Processing: A Remote Sensing Perspective, 2nd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 1996; ISBN 978-013-205-840-7.
King, M. Fourier transform. In Statistics for Process Control Engineers: A Practical Approach; John Wiley & Sons Ltd.: Hoboken, NJ, USA, 2017; pp. 305–313. ISBN 978-111-938-350-5.
Couteron, P. Quantifying change in patterned semi-arid vegetation by Fourier analysis of digitized aerial photographs. Int. J. Remote Sens. 2002, 23, 3407–3425.
Chen, C.; Zhou, L.; Guo, J.; Li, W.; Su, H.; Guo, F. Gabor-Filtering-Based Completed Local Binary Patterns for Land-Use Scene Classification. In Proceedings of the 2015 IEEE International Conference on Multimedia Big Data, Beijing, China, 20–22 April 2015; pp. 324–329.
Li, W.; Du, Q. Gabor-filtering-based nearest regularized subspace for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 1012–1022.
Yang, K.; Li, M.; Liu, Y.; Cheng, L.; Huang, Q.; Chen, Y. River detection in remotely sensed imagery using Gabor filtering and path opening. Remote Sens. 2015, 7, 8779–8802.
Bianconi, F.; Fernández, A. Evaluation of the effects of Gabor filter parameters on texture classification. Pattern Recognit. 2007, 40, 3325–3335.
Rajadell, O.; García-Sevilla, P.; Pla, F. Spectral-spatial pixel characterization using Gabor filters for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2013, 10, 860–864.
Risojević, V.; Momić, S.; Babić, Z. Gabor descriptors for aerial image classification. In Adaptive and Natural Computing Algorithms; Dobnikar, A., Lotrič, U., Šter, B., Eds.; Springer: Heidelberg, Germany, 2011; Volume 6594, pp. 51–60. ISBN 978-3-642-20266-7.
Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 1, pp. 886–893.
Torrione, P.A.; Morton, K.D.; Sakaguchi, R.; Collins, L.M. Histograms of oriented gradients for landmine detection in ground-penetrating RADAR data. IEEE Trans. Geosci. Remote Sens. 2014, 52, 1539–1550.
Lee, K.L.; Mokji, M.M. Automatic Target Detection in GPR Images Using Histogram of Oriented Gradients (HOG). In Proceedings of the 2014 2nd International Conference on Electronic Design (ICED), Penang, Malaysia, 19–21 August 2014; pp. 181–186.
Lei, Z.; Fang, T.; Li, D. Histogram of oriented gradient detector with color-invariant gradients in Gaussian color space. Opt. Eng. 2010, 49, 109701.
Myint, S.W.; Mesev, V.; Lam, N. Urban textural analysis from remote sensor data: Lacunarity measurements based on the differential box counting method. Geogr. Anal. 2006, 38, 371–390.
Quan, Y.; Xu, Y.; Sun, Y.; Luo, Y. Lacunarity Analysis on Image Patterns for Texture Classification. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 160–167.
Henebry, G.M.; Kux, H.J.H. Lacunarity as a texture measure for SAR imagery. Int. J. Remote Sens. 1995, 16, 565–571.
Plotnick, R.E.; Gardner, R.H.; Hargrove, W.W.; Prestegaard, K.; Perlmutter, M. Lacunarity analysis: A general technique for the analysis of spatial patterns. Phys. Rev. E 1996, 53, 5461–5468.
Dong, P. Test of a new lacunarity estimation method for image texture analysis. Int. J. Remote Sens. 2000, 21, 3369–3373.
Burns, J.B.; Hanson, A.R.; Riseman, E.M. Extracting straight lines. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 425–455.
Ünsalan, C. Gradient-magnitude-based support regions in structural land use classification. IEEE Geosci. Remote Sens. Lett. 2006, 3, 546–550.
Ojala, T.; Pietikäinen, M.; Mäenpää, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987.
Guo, Z.; Zhang, L.; Zhang, D. A completed modeling of local binary pattern operator for texture classification. IEEE Trans. Image Process. 2010, 19, 1657–1663.
Huang, D.; Shan, C.; Ardabilian, M.; Wang, Y.; Chen, L. Local binary patterns and its application to facial image analysis: A survey. IEEE Trans. Syst. Man Cybern. C Appl. Rev. 2011, 41, 765–781.
Zhao, G.; Ahonen, T.; Matas, J.; Pietikäinen, M. Rotation-invariant image and video description with local binary pattern features. IEEE Trans. Image Process. 2012, 21, 1465–1477.
Li, W.; Chen, C.; Su, H.; Du, Q. Local binary patterns and extreme learning machine for hyperspectral imagery classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 3681–3693.
Bolstad, P. GIS Fundamentals: A First Text on Geographic Information Systems, 5th ed.; XanEdu: Acton, MA, USA, 2016; ISBN 978-150-669-587-7.
Fung, T.; Siu, W. Environmental quality and its changes, an analysis using NDVI. Int. J. Remote Sens. 2000, 21, 1011–1024.
Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An Efficient Alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2564–2571.
Kulkarni, A.V.; Jagtap, J.S.; Harpale, V.K. Object recognition with ORB and its implementation on FPGA. Int. J. Adv. Comput. Res. 2013, 3, 156–162.
Wu, S.; Fan, Y.; Zheng, S.; Yang, H. Object Tracking Based on ORB and Temporal-Spacial Constraint. In Proceedings of the 2012 IEEE Fifth International Conference on Advanced Computational Intelligence (ICACI), Nanjing, China, 18–20 October 2012; pp. 597–600.
Lei, Y.; Yu, Z.; Gong, Y. An improved ORB algorithm of extracting and matching features. IJSIP 2015, 8, 117–126.
Wang, R.; Zhang, W.; Shi, Y.; Wang, X.; Cao, W. GA-ORB: A new efficient feature extraction algorithm for multispectral images based on geometric algebra. IEEE Access 2019, 7, 71235–71244.
Xu, J.; Chang, H.-W.; Yang, S.; Wang, M. Fast feature-based video stabilization without accumulative global motion estimation. IEEE Trans. Consum. Electron. 2012, 58, 993–999.
Pham, T.H.; Tran, P.; Lam, S.-K. High-throughput and area-optimized architecture for rBRIEF feature extraction. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2019, 27, 747–756.
Pesaresi, M.; Gerhardinger, A. Improved textural built-up presence index for automatic recognition of human settlements in arid regions with scattered vegetation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2011, 4, 16–26.
Pesaresi, M.; Gerhardinger, A.; Kayitakire, F. A robust built-up area presence index by anisotropic rotation-invariant textural measure. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2008, 1, 180–192.
Huang, X.; Zhang, L.; Li, P. Classification and extraction of spatial features in urban areas using high-resolution multispectral imagery. IEEE Geosci. Remote Sens. Lett. 2007, 4, 260–264.
Sghaier, M.O.; Foucher, S.; Lepage, R. River extraction from high-resolution SAR images combining a structural feature set and mathematical morphology. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 1025–1038.
Esri. [dataset] World Imagery Basemap. 2021. Available online: https://www.arcgis.com/home/item.html?id=10df2279f9684e4a9f6a7f08febac2a9 (accessed on 27 April 2021).
Ramm, F. OpenStreetMap Data in Layered GIS Format; Geofabrik: Karlsruhe, Germany, 2019; Available online: https://download.geofabrik.de/osm-data-in-gis-formats-free.pdf (accessed on 30 July 2019).
Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning; Springer: New York, NY, USA, 2013; Volume 103, ISBN 978-1-4614-7137-0.
Hans, C. Elastic net regression modeling with the orthant normal prior. J. Am. Stat. Assoc. 2011, 106, 1383–1393.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. sklearn.linear_model.ElasticNetCV. Available online: https://scikit-learn.org/0.21/modules/generated/sklearn.linear_model.ElasticNetCV.html (accessed on 2 April 2019).
Zou, H.; Hastie, T. Regularization and Variable Selection via the Elastic Net. Available online: https://web.stanford.edu/~hastie/TALKS/enet_talk.pdf (accessed on 4 March 2020).
Schreiber-Gregory, D.N. Regulation Techniques for Multicollinearity: Lasso, Ridge, and Elastic Nets. In Proceedings of the Western Users of SAS Software 2018, Sacramento, CA, USA, 5–7 September 2018; pp. 1–23.
Schreiber-Gregory, D.N. Ridge regression and multicollinearity: An in-depth review. Model Assist. Stat. Appl. 2018, 13, 359–365.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. sklearn.ensemble.RandomForestRegressor. Available online: https://scikit-learn.org/0.21/modules/generated/sklearn.ensemble.RandomForestRegressor.html (accessed on 15 July 2019).
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. sklearn.model_selection.GridSearchCV. Available online: https://scikit-learn.org/0.21/modules/generated/sklearn.model_selection.GridSearchCV.html (accessed on 15 July 2019).
Woodcock, C.E.; Strahler, A.H. The factor of scale in remote sensing. Remote Sens. Environ. 1987, 21, 311–332.

Figure 1. Project methodology flowchart.

Figure 2. Study areas consisting of the countries of Belize and Sri Lanka and the city of Accra, Ghana. Sources: [67,68].

Figure 3. Scale and block for calculating contextual features. For very-high-spatial-resolution imagery (a), the scale sizes were set to 8 m × 8 m, 16 m × 16 m, 32 m × 32 m, and 64 m × 64 m, and the block size was set to 8 pixels. For Sentinel-2 imagery (b), most scale sizes were set to 30 m × 30 m, 50 m × 50 m, and 70 m × 70 m, and the block size was set to one pixel. Sources: [90,94].

Figure 4. Example census units with complete OpenStreetMap data. All Enumeration Areas, Enumeration Districts, and Grama Niladhari Divisions with complete and accurate road and building OpenStreetMap data were identified. Sources: [67,95,96,137]. Basemaps [137] reprinted in accordance with Terms of Use from Esri (2021). Copyright 2021 Esri.

Table 1. Contextual feature relationships to urban attributes and population density analyzed in this study.

Question	Independent Variables	Dependent Variable	Area(s)
1	Contextual features (very-high spatial resolution)	Urban attributes (OpenStreetMap)	Sri Lanka (4 cities)
2	Contextual features (Sentinel-2)	Urban attributes (OpenStreetMap)	Sri Lanka (4 cities); Sri Lanka (country level); Accra; Accra–Belize–Sri Lanka
3 & 4	Contextual features (Sentinel-2)	Population density (census)	Sri Lanka (country level); Accra; Belize; Accra–Belize–Sri Lanka

Table 2. Study area census unit counts and spatial area statistics.

Study Area	Census Units	Minimum Area	Mean Area	Maximum Area
Accra ¹	2403	0.0019 km²	0.09 km²	6.75 km²
Belize ²	723	0.01 km²	52.70 km²	5345.56 km²
Sri Lanka ³	14,021	0.04 km²	4.69 km²	562.64 km²

¹ Data from Ghana Statistical Service [64]. ² Data from Statistical Institute of Belize [65]. ³ Data from Department of Census and Statistics [66].

Table 3. Road classes and widths to calculate road area in each study area.

Area	OpenStreetMap Road Class ¹	Two-Way Road ¹	One-Way Road ¹
Accra, Ghana	trunk	20.00 m	10.00 m
	trunk link	10.00 m	5.00 m
	primary	10.00 m	8.00 m
	primary link, unclassified	8.00 m	5.00 m
	residential	7.00 m	7.00 m
	secondary	8.00 m	8.00 m
	tertiary	10.00 m	5.00 m
	cycleway, track, secondary link, tertiary link, service	5.00 m	5.00 m
	path, track grade3	3.00 m	3.00 m
	footway	4.00 m	4.00 m
	pedestrian	3.50 m	3.50 m
	(other)	0 m	0 m
Belize	primary, primary link	13.00 m	6.50 m
	secondary, secondary link	10.00 m	5.00 m
	tertiary, tertiary link	7.50 m	3.75 m
	living street, residential, service, track, track grade1, track grade2, track grade3, track grade4, track grade5, unclassified	5.00 m	5.00 m
	cycleway, footway, path, pedestrian	4.00 m	4.00 m
	(other)	0 m	0 m
Sri Lanka	motorway, motorway link, trunk, trunk link, primary, primary link	15.00 m	7.50 m
	secondary, secondary link	10.50 m	5.25 m
	tertiary, tertiary link, cycleway, footway, living street, path, pedestrian, residential, service, track, track grade3, track grade5, unclassified, unknown	4.25 m	4.25 m
	(other)	0 m	0 m

¹ Road classes and directions from OpenStreetMap [95,96] and Ramm [138].
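
Table 3 suggests that road area is obtained by combining each road segment's length with the width assigned to its class and direction. The following is a minimal Python sketch of that arithmetic, under the assumption (not stated in the table) that area is simply length × width. Only the Sri Lanka rows are transcribed, and the names `SRI_LANKA_WIDTHS_M`, `road_width_m`, and `road_area_m2` are hypothetical.

```python
# Widths in metres transcribed from the Sri Lanka rows of Table 3,
# stored as (two-way width, one-way width). All names are illustrative.
_MAJOR = ("motorway", "motorway link", "trunk", "trunk link",
          "primary", "primary link")
_SECONDARY = ("secondary", "secondary link")
_MINOR = ("tertiary", "tertiary link", "cycleway", "footway",
          "living street", "path", "pedestrian", "residential", "service",
          "track", "track grade3", "track grade5", "unclassified", "unknown")

SRI_LANKA_WIDTHS_M = {}
for _cls in _MAJOR:
    SRI_LANKA_WIDTHS_M[_cls] = (15.00, 7.50)
for _cls in _SECONDARY:
    SRI_LANKA_WIDTHS_M[_cls] = (10.50, 5.25)
for _cls in _MINOR:
    SRI_LANKA_WIDTHS_M[_cls] = (4.25, 4.25)


def road_width_m(road_class, one_way):
    """Return the tabulated width; classes not listed fall under '(other)' = 0 m."""
    two_way_w, one_way_w = SRI_LANKA_WIDTHS_M.get(road_class, (0.0, 0.0))
    return one_way_w if one_way else two_way_w


def road_area_m2(segments):
    """Sum length x width over (road_class, length_m, one_way) tuples.

    Assumption: area = length * width; the source does not spell this out.
    """
    return sum(length_m * road_width_m(road_class, one_way)
               for road_class, length_m, one_way in segments)


example_segments = [
    ("primary", 100.0, False),     # 100 m * 15.00 m
    ("secondary", 200.0, True),    # 200 m * 5.25 m
    ("residential", 50.0, False),  # 50 m * 4.25 m
    ("bridleway", 80.0, False),    # unlisted class, width 0 m
]
total_area = road_area_m2(example_segments)
```

A lookup keyed by class keeps the "(other)" row as the natural default, so unlisted classes contribute no area.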

Table 4. User-defined parameters for elastic net regularization using ElasticNetCV from the scikit-learn library. The remaining parameters were default.

Parameter ^a	Description of Purpose ^a	Value(s)
max_iter	maximum iterations	1e8
alphas	constraint	0.0005, 0.001, 0.01, 0.03, 0.05, 0.1
l1_ratio	the ratio between $l_{1}$ and $l_{2}$ penalties	0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1
verbose	verbosity	False
cv	cross-validation splitting strategy	5
selection	random coefficient updated each iteration	random
fit_intercept	calculate intercept if data not centered	False

^a Parameters and descriptions from Pedregosa et al. [142,146].
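
The settings in Table 4 can be written as an `ElasticNetCV` constructor call. This is a sketch rather than the authors' script: `build_elastic_net` is a hypothetical helper, `random_state` is an added assumption so that `selection="random"` is repeatable, and `max_iter` is cast to an integer because recent scikit-learn releases reject a float such as `1e8`.

```python
from sklearn.linear_model import ElasticNetCV


def build_elastic_net(random_state=0):
    """Return an ElasticNetCV configured with the Table 4 values.

    random_state is an assumption added for repeatability; it is not in Table 4.
    """
    return ElasticNetCV(
        max_iter=int(1e8),                       # Table 4 lists 1e8
        alphas=[0.0005, 0.001, 0.01, 0.03, 0.05, 0.1],
        l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1],
        verbose=False,
        cv=5,                                    # 5-fold cross-validation
        selection="random",                      # update a random coefficient each iteration
        fit_intercept=False,                     # no intercept is estimated
        random_state=random_state,
    )
```

After fitting, the selected penalty strength and mixing ratio are available as `alpha_` and `l1_ratio_` on the returned estimator.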

Table 5. User-defined parameters for random forest using RandomForestRegressor from the scikit-learn library. The remaining parameters were default; some parameters were not used until GridSearchCV (Table 6).

Parameter ¹	Description of Purpose ¹	Value(s)
n_estimators	number of trees in forest	[see Table 6]
min_samples_leaf	minimum number of samples at leaf node	[see Table 6]
max_features	maximum number of features to be considered during split	[see Table 6]

¹ Parameters and descriptions from Pedregosa et al. [146,148].

Table 6. User-defined parameters for cross-validation using GridSearchCV from the scikit-learn library. The param_grid parameters were taken from the RandomForestRegressor output. The remaining parameters were default.

Parameter ¹	Description of Purpose ¹	Value(s)
param_grid	parameters used for cross-validation	n_estimators: 200, 300, 500, 700, 900, 1000; min_samples_leaf: 1, 2, 5, 10, 25; max_features: auto, sqrt, log2, 0.33, 0.20, 0.10, None
cv	cross-validation splitting strategy	5
scoring	method to evaluate predictions against test set	neg_mean_squared_error

¹ Parameters and descriptions from Pedregosa et al. [146,149].
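
Tables 5 and 6 together describe a grid search over a random forest regressor. A hedged sketch of how that setup might look follows; `PARAM_GRID` and `build_search` are hypothetical names and `random_state` is an added assumption. The `auto` option is left out of the grid because newer scikit-learn releases no longer accept it; in the 0.21 series cited here it considered all features for regressors, which `None` also does.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Grid transcribed from Table 6, minus "auto" (see the note above).
PARAM_GRID = {
    "n_estimators": [200, 300, 500, 700, 900, 1000],
    "min_samples_leaf": [1, 2, 5, 10, 25],
    "max_features": ["sqrt", "log2", 0.33, 0.20, 0.10, None],
}


def build_search(param_grid=None, random_state=0):
    """Return an unfitted GridSearchCV using the Table 6 settings.

    random_state is an assumption added for repeatability.
    """
    grid = PARAM_GRID if param_grid is None else param_grid
    return GridSearchCV(
        estimator=RandomForestRegressor(random_state=random_state),
        param_grid=grid,
        cv=5,                               # 5-fold cross-validation
        scoring="neg_mean_squared_error",   # higher (closer to 0) is better
    )
```

The full grid has 180 parameter combinations, each fitted five times, so a smaller `param_grid` is worth passing in while experimenting.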

Table 7. R² and mean square error values for urban attributes using elastic net regularization models at different spatial resolutions for Sri Lanka (192 Grama Niladhari Divisions).

Urban Attribute	Very-High-Spatial-Resolution Imagery			Sentinel-2 Imagery
	In-Sample R²	Out-of-Sample R²	Mean Square Error	In-Sample R²	Out-of-Sample R²	Mean Square Error
building area	0.82	0.43	0.50	0.85	0.60	0.35
building count	0.77	0.51	0.46	0.69	0.46	0.50
building density	0.94	0.85	0.14	0.78	0.59	0.39
road area	0.93	0.77	0.22	0.83	0.76	0.23
road length	0.94	0.78	0.22	0.87	0.80	0.19
road density	0.86	0.75	0.24	0.71	0.62	0.37
built-up area	0.95	0.75	0.23	0.86	0.69	0.28
built-up percent	0.91	0.83	0.16	0.85	0.77	0.22

Table 8. R² and mean square error values for urban attributes using random forest models at different spatial resolutions for Sri Lanka (192 Grama Niladhari Divisions).

Urban Attribute	Very-High-Spatial-Resolution Imagery			Sentinel-2 Imagery
	In-Sample R²	Out-of-Sample R²	Mean Square Error	In-Sample R²	Out-of-Sample R²	Mean Square Error
building area	0.90	0.63	0.39	0.86	0.52	0.50
building count	0.91	0.51	0.48	0.83	0.47	0.52
building density	0.98	0.82	0.16	0.97	0.74	0.24
road area	0.97	0.81	0.18	0.97	0.82	0.17
road length	0.97	0.83	0.17	0.96	0.84	0.15
road density	0.96	0.73	0.26	0.93	0.64	0.35
built-up area	0.93	0.70	0.33	0.90	0.66	0.35
built-up percent	0.97	0.80	0.18	0.97	0.78	0.20

Table 9. R² and mean square error values for urban attributes using elastic net regularization and random forest models with Sentinel-2 imagery for Accra (314 Enumeration Areas).

Urban Attribute	Elastic Net Regularization			Random Forest
	In-Sample R²	Out-of-Sample R²	Mean Square Error	In-Sample R²	Out-of-Sample R²	Mean Square Error
building area	0.91	0.41	0.17	0.92	0.49	0.47
building count	0.44	−0.06	0.95	0.81	0.35	0.62
building density	0.50	0.36	0.63	0.92	0.48	0.51
road area	0.98	0.70	0.11	0.91	0.59	0.53
road length	0.98	0.68	0.14	0.91	0.59	0.54
road density	0.12	0.02	0.93	0.83	0.12	0.84
built-up area	0.97	0.59	0.08	0.91	0.65	0.49
built-up percent	0.84	0.78	0.22	0.97	0.80	0.20

Table 10. R² and mean square error values for urban attributes using elastic net regularization and random forest models with Sentinel-2 imagery for Sri Lanka (333 Grama Niladhari Divisions).

Urban Attribute	Elastic Net Regularization			Random Forest
	In-Sample R²	Out-of-Sample R²	Mean Square Error	In-Sample R²	Out-of-Sample R²	Mean Square Error
building area	0.72	0.59	0.41	0.92	0.56	0.46
building count	0.57	0.44	0.56	0.91	0.44	0.58
building density	0.83	0.71	0.28	0.97	0.81	0.18
road area	0.75	0.51	0.45	0.95	0.69	0.30
road length	0.77	0.53	0.44	0.96	0.71	0.27
road density	0.77	0.69	0.29	0.96	0.76	0.23
built-up area	0.73	0.42	0.55	0.95	0.67	0.34
built-up percent	0.88	0.81	0.18	0.98	0.86	0.13

Table 11. R² and mean square error values for urban attributes using elastic net regularization and random forest models with Sentinel-2 imagery for Accra (314 Enumeration Areas), Belize (80 Enumeration Districts), and Sri Lanka (333 Grama Niladhari Divisions) combined.

Urban Attribute	Elastic Net Regularization			Random Forest
	In-Sample R²	Out-of-Sample R²	Mean Square Error	In-Sample R²	Out-of-Sample R²	Mean Square Error
building area	0.75	0.62	0.39	0.94	0.74	0.28
building count	0.74	0.55	0.45	0.95	0.75	0.26
building density	0.75	0.71	0.30	0.97	0.78	0.22
road area	0.82	0.53	0.46	0.95	0.78	0.23
road length	0.83	0.60	0.40	0.97	0.82	0.19
road density	0.42	0.34	0.66	0.90	0.45	0.55
built-up area	0.83	0.62	0.37	0.97	0.81	0.20
built-up percent	0.93	0.90	0.10	0.99	0.93	0.07

Table 12. R² and mean square error values for population density using elastic net regularization and random forest models with Sentinel-2 imagery for Accra (2216 Enumeration Areas), Belize (687 Enumeration Districts), and Sri Lanka (13,402 Grama Niladhari Divisions).

Study Area	Elastic Net Regularization			Random Forest
	In-Sample R²	Out-of-Sample R²	Mean Square Error	In-Sample R²	Out-of-Sample R²	Mean Square Error
Accra	0.61	0.57	0.43	0.95	0.74	0.26
Belize	0.81	0.73	0.28	0.94	0.78	0.24
Sri Lanka	0.68	0.65	0.35	0.96	0.77	0.23
Accra–Belize–Sri Lanka	0.69	0.67	0.34	0.97	0.84	0.16

Table 13. Impacts of degrading spatial resolution from very-high-spatial-resolution (VHSR) to Sentinel-2 data on urban attribute model performance. The differences in out-of-sample R² between the VHSR and Sentinel-2 models were calculated. A negative value indicates the out-of-sample R² decreased (a decrease in predictive power) from the VHSR model to Sentinel-2 model; a positive value indicates the out-of-sample R² increased (an increase in predictive power) from VHSR to Sentinel-2. Differences may not correspond to actual R² values due to rounding. Abbreviation: ENR, elastic net regularization.

Urban Attribute	ENR R² Difference (VHSR to Sentinel-2)	Random Forest R² Difference (VHSR to Sentinel-2)
building area	0.16	−0.12
building count	−0.05	−0.04
building density	−0.26	−0.08
road area	−0.02	0.01
road length	0.02	0.01
road density	−0.13	−0.08
built-up area	−0.06	−0.03
built-up percent	−0.06	−0.02
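
The differences in Table 13 are the Sentinel-2 out-of-sample R² minus the VHSR out-of-sample R². The short sketch below recomputes the elastic net column from the rounded values printed in Table 7; `r2_difference` is a hypothetical helper. Because the inputs are already rounded, some results differ from Table 13 by 0.01 (for example, building area comes out as 0.17 rather than 0.16), which is consistent with the rounding caveat in the caption.

```python
# Out-of-sample R² for elastic net regularization, transcribed from Table 7.
VHSR_ENR = {
    "building area": 0.43, "building count": 0.51, "building density": 0.85,
    "road area": 0.77, "road length": 0.78, "road density": 0.75,
    "built-up area": 0.75, "built-up percent": 0.83,
}
S2_ENR = {
    "building area": 0.60, "building count": 0.46, "building density": 0.59,
    "road area": 0.76, "road length": 0.80, "road density": 0.62,
    "built-up area": 0.69, "built-up percent": 0.77,
}


def r2_difference(vhsr, sentinel2):
    """Sentinel-2 minus VHSR; negative means predictive power dropped."""
    return {name: round(sentinel2[name] - vhsr[name], 2) for name in vhsr}


enr_diff = r2_difference(VHSR_ENR, S2_ENR)
```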

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

