A Downsampling Method Addressing the Modifiable Areal Unit Problem in Remote Sensing

Mîrț, Andrei; Reiche, Johannes; Verbesselt, Jan; Herold, Martin

doi:10.3390/rs14215538

Open AccessArticle

A Downsampling Method Addressing the Modifiable Areal Unit Problem in Remote Sensing

Laboratory of Geo-Information Science and Remote Sensing, Wageningen University & Research, Droevendaalsesteg 3, 6708 PB Wageningen, The Netherlands

^*

Author to whom correspondence should be addressed.

^†

Current address: Helmholtz GFZ German Research Centre for Geosciences, Section 1.4 Remote Sensing and Geoinformatics, Telegrafenberg, 14473 Potsdam, Germany.

Remote Sens. 2022, 14(21), 5538; https://doi.org/10.3390/rs14215538

Submission received: 9 August 2022 / Revised: 10 October 2022 / Accepted: 25 October 2022 / Published: 3 November 2022

(This article belongs to the Section Remote Sensing Image Processing)

Download

Browse Figures

Versions Notes

Abstract

:

Handling multiple scales efficiently is one avenue for processing big remote sensing imagery data. Unfortunately, imagery is also affected by the infamous modifiable areal unit problem, which creates unpredictable errors at different scales. We developed a downsampling method that attempts to keep the data distribution in a downsampled image constant, reducing the modifiable areal unit problem. We tested our method against classic downsampling methods (mean, central pixel selection, random) under a range of typical remote sensing scenarios. Under our experimental conditions, our downsampling method consistently outperformed the classical downsampling methods within a 95% confidence level. The downsampling method can be used in most typical situations where downsampling is needed, but it is likely to shine when used as a pyramid building policy in geocomputing platforms, such as Google Earth Engine.

Keywords:

modifiable areal unit problem; remote sensing; downsampling; big data; multiscale

1. Introduction and Background

Most of the recent developments in remote sensing big data focus on increasing processing capacity, e.g., parallelization [1,2], cloud computing [3,4,5,6], machine learning, etc. More processing increases the costs and energy consumption, which is fine as long as there is a net positive societal value [7]. For example, monitoring the forest at a global scale is a critical tool for mitigating climate change [8]. Landsat’s 30 m imagery is often used for monitoring forests [9,10,11], yet processing each pixel might be redundant for global scale maps. As such, computing resources could be prioritized to process only a sample of pixels to obtain similar-quality maps. Global remote sensing monitoring solutions must be able to make use of the fine resolution available to quickly detect small events. At the same time, land cover change at a global scale is rare, with changes in critical resources, such as forests making no exception [12]. In technical terms, the resolution is critical but the sampling distance is not.

If we consider the example of forest monitoring, and a global goal to quantify deforestation, a sampling strategy could be devised to minimize processing work and guarantee a certain accuracy within the confidence level. For the most part, the goal of such a remote sensing analysis is to also obtain a map of the change in order to assess how much (and where) change is happening. Quantifying change is a global property of the result, while the location is a local property. To minimize computational needs in remote sensing through downsampling, the global and local aspects of output maps must be balanced, preferably before employing intensive processing algorithms on each pixel at native resolution.

In digital image processing literature, we can find recommendations on image resampling with respect to the human visual system [13,14], frequency response [15,16], or specific post-processing goals [14,17,18]. In remote sensing applications, aliasing can be an issue, so the frequency response of a downsampling method is relevant [19]. The Nyquist–Shannon theorem provides a perfect sampling solution to avoid aliasing but it requires for the size of detected objects to be known [20], which is not always the case in remote sensing applications, such as monitoring deforestation, land cover classifications, and monitoring draughts.

Remote sensing borrows downsampling methods from digital image processing, but renames the general technique to upscaling, or pixel aggregation, to better reflect the goals specific to the field. In remote sensing literature, the focus is more on comparing existing downsampling methods for different purposes [21,22,23,24,25], and less on improving, or developing novel pixel aggregation methods. Raj et al., 2013, analyse the effects of downsampling remote sensing imagery, by mean and nearest neighbor methods, on a range of classification algorithms [22]. They conclude that the downsampling method impacts classifications differently depending on the thematic objective of the classification, e.g., in their study, mean downsampling was performed poorly on forest monitoring studies at the regional or national level. Xu et al., 2019, analyzed the effects of upscaling with respect to land cover classification [23]. They compared classification results based on downsampled images by typical algorithms, such as the nearest neighbor, bilinear, and cubic convolution. They obtained significantly different results and concluded that great care must be taken when performing multi-scale classifications. Ultimately, Xu et al. urged the research community to find effective downsampling solutions that reduce problems caused by spatial heterogeneity.

In this paper, we follow Xu et al.’s advice and present a new method for downsampling, also known as aggregation, or upscaling, which addresses spatial heterogeneity, and more widely, the modifiable areal unit problem. The modifiable areal unit problem is a common source of statistical bias when working with aggregates over regions. The problem, as better described by Openshaw [26], is that “the areal units (zonal objects) used in many geographical studies are arbitrary, modifiable, and subject to the whims and fancies of whoever is doing, or did, the aggregating”. When it comes to raster data (the focus of our paper), the shapes are never changed by definition, but the size of the areal unit is changed when downsampling. In the next section (Section 2), we will present our solution to this problem.

The objective of our paper is:

Demonstrate that our downsampling method consistently provides improved results over the other tested downsampling methods, within the limitations of the tested scenarios.

Further, we set up an experiment that tests our solution against other downsampling methods using two remote sensing algorithms (Section 3). We conclude the paper with the presentation of results (Section 4) and a discussion (Section 5).

2. Methodology

We propose a new downsampling method, whose goals are to minimize the accuracy loss due to scaling, and maximizing spatial accuracy within constraints, essentially minimizing the modifiable areal unit problem.

We attempt to reach these goals by selecting a single pixel to represent an aggregated areal unit, depending on the histogram-based classification of the image at native resolution. The method relies on the assumption that closer values are more likely to yield the same result in any classification algorithm; therefore, it is less important which value is sampled among similar values. By choosing pixels from inside aggregated areas with respect to the global proportionality of histogram-based classes, both local and global properties are maintained as much as possible in the downsampled image. We postulate that compared to other downsampling methods, the novel approach is more likely to yield similar classification results to non-downsampled classification results, regardless of the classification algorithm.

The downsampling method’s parameters are a raster image, a scale factor, and (optionally) other parameters to be passed to the histogram calculation. We present (see Figure 1) and describing the main parts of the method. The downsampling method is also presented in more detail in the Appendix A flowchart, split into Figure A1 and Figure A2.

2.1. Histogram-Based Classification

First, the histogram of the input raster is calculated. The histogram breaks are used as thresholds to calculate the histogram-based classified raster, while the count of pixel values in each bin are used to estimate the count of areal units belonging to each histogram class after downsampling. These per-class areal unit counts are later used as caps for selecting values belonging to each class.

2.2. Histogram Class Frequency per Areal Unit

The histogram-based raster classification is split into a stack of binary rasters per class, and then each raster is sum aggregated at the required downsampling scale. Each areal unit will contain a value between zero and the square of the scale factor, representing the count of values from the input image, belonging to the respective class and areal unit. This count represents the “power” of the class in the areal unit. For brevity, we will call this raster stack the power stack, similarly to how it is named in our R implementation.

2.3. Ranked Areal Units per Class

Existing power values are independently consecutively ranked per class. The highest power per class is assigned the highest rank, 1, while the lowest rank is a higher value equal to the number of existing powers inside its respective class. The power stack is then reclassified according to its rank. Ranking powers are useful because they normalize the power values, e.g., very dispersed classes, may only have low powers, but when ranked, each class will have top ranking (1) areal units.

2.4. Iterative Class Search for Areal Units

The goal in this step is to assign a class to each areal unit, giving preference to classes with a lower total count of areal units. In our R implementation, we start with an empty template based on one of the ranking rasters, which is then filled iteratively to obtain a histogram class-based downsampled raster.

In each iteration, we attempt to find the most suitable areal units to which to assign the current class, up to the cap calculated according to the histogram in the first step (see Section 2.1).

The search follows two rules:

2.4.1. Top Ranking per Class Rule

All top-ranking areal units up to the cap per current class are considered eligible. Since there are usually more areal units with the same ranking, it may be that with the last eligible rank, there may be more suitable areal units than the cap. In this situation, we dismiss, at this step in the algorithm, the last eligible rank, and we admit all other higher ranking areal units as suitable. In our R implementation, we call this last eligible rank the edge rank. If there is no edge rank, perhaps because the cumulative count of areal units for all eligible ranks perfectly matches the cap, then we fill the template with the found areal units and move to the next class.

2.4.2. Bottom Global Rank per Subsequent Classes in the Current Class Edge Rank

If there is an edge rank, it means we did not reach the cap of needed areal units, and we consider eligible all areal units within the edge rank. The goal now is to find the areal units within the edge rank of the current class, which are least relevant for the subsequent classes. To do so, for each eligible areal unit, we calculate the top rank among all subsequent classes, called a global rank. We then consider suitable for the current class, as many areal units as needed, in descending order of the calculated global rank. After we fill these areal units as well, there are as many areal units corresponding to the current class, as dictated by the histogram of the native image, and we can move to the next class.

2.5. Closest to Mean

Ultimately, we need to obtain a downsampled raster with values from the input raster. For this, we replace class values in the histogram class-based downsampled raster with a corresponding pixel value, which is closest to the mean of all pixel values in the respective class and areal unit. This raster is returned as the output of the proposed novel downsampling method.

2.6. Multiple Bands Stack Voting

The downsampling method explained so far only applies to a single-band raster. Yet, more often than not, input rasters come in the form of multi-band rasters. Or, sometimes, multiple rasters with the same extent are used as input. If we apply the downsampling individually to each raster band, and we feed these downsampled rasters to a processing algorithm, we would most likely break the reasonable assumption that location-wise corresponding pixels are perfectly spatially overlaying. To solve this multi-band raster issue, we propose a band voting mechanism shown in Figure 2. First, each input is downsampled accordingly. For each band, the position that is selected to represent the areal unit is recorded. Finally, the most selected positions are used for pixel selection in all bands. If there are ties within an areal unit, it does not matter which value is chosen, but in our implementation, the first in R’s indexing order is chosen.

3. Validation Case Study

3.1. Is the Novel Downsampling Better?

To assess the performance of the novel downsampling method, we compare it to traditional downsampling methods, i.e., mean, and central pixel selection. Moreover, we include it in the comparison of random downsampling. By mean downsampling, we understand that the value representing an areal unit is calculated by averaging the pixel values. By the central pixel selection, we understand that the value representing an areal unit is the value of the pixel in the center of the areal unit. In the case of evenly sized areal units, we select the top left pixel of the four pixels touching the center of the areal unit. By random downsampling, we understand that the value representing an areal unit is randomly selected from the pixels covering the aggregated areal unit. We implemented random downsampling ourselves because it is not a standard downsampling method. We included it as a lower benchmark than traditional downsampling methods. A lower benchmark might reveal some insight into the performances of traditional downsampling methods as well.

Based on the theory on which the novel downsampling is built, we hypothesize that, if downsampling of a raster dataset is needed, and there is no particular knowledge on the interactions between the downsampling method, the dataset, the downsampling level, and the processing algorithm that uses the downsampled raster as input, then the output of the processing algorithm is more likely to be improved by choosing the novel downsampling method.

3.2. Experimental Design

We propose testing the hypothesis by using the four downsampling methods in conditions similar to those used by a remote sensing user. We first downsampled a dataset, then processed the downsampled version to obtain a map. The workflow can be seen in Figure 3.

The output map is compared to its respective non-downsampled version as further detailed. Non-downsampled output maps, which are the same as downsampling with a downsampling factor of 1, are the control results of the experiment.

The final comparison was done on locality, further defined. Locality aims to answer how well data are preserved at the same location. It is defined as the percentage of the area of the output raster, which is identical in value, by position, to the control output.

The experiment has one independent variable, three nuisance variables, and one dependent variable, as described in Table 1. This experimental design is more properly described as two concomitantly run sub-experiments, for each remote sensing application.

Table 1. Experimental design.

Variable	Type	Cardinality	Levels	Sampling Strategy
Downsampling method choice	Independent	4	Novel mean central random	N/A
Remote sensing application	Nuisance	2	bfast random forest	Judgment
Raster datasets	Nuisance	$2 \times 10$	Figure 4	Judgment sampling
Downsampling factors	Nuisance	9	2 to 10	Judgment sampling
Locality	Dependent	Continuous	N/A	N/A

Figure 4. Locations of datasets used as input for bfastmonitor (orange) and random forest (purple).

3.3. Nuisance Variables

When downsampling, a change in the downsampling factor, the dataset, or the remote sensing algorithm is likely to change the locality of the output maps. We did not consider these variables to be independent, but nuisances. We did not try to observe their effects, but just the effect of the downsampling method choice. By varying the downsampling factor, the dataset, and the remote sensing algorithm, we attempted to increase the variation of the dependent variables in order to increase the robustness of the experiment.

3.3.1. Downsampling Factors

We speculate that most users are interested in downsampling as little as possible or necessary, so we tested the downsampling methods at the nine different consecutive downsampling factors closest to no downsampling, further detailed. The limits of the nine downsampling factors were arbitrary and made out of convenience to limit the size of the experiment. For brevity, we further reference the 9 downsampling factors (of 2 to 10) as the 10 downsampling factors of 1 to 10. In the framework of this experiment, a downsampling factor of 1 is the same as not downsampling, resulting in control output maps.

3.3.2. Datasets

We ran each remote sensing processing algorithm sub-experiment on 10 datasets each (Figure 4). Similarly, the decision to limit ourselves to 10 datasets each was made out of convenience and to limit the size of the experiment.

The size of each dataset was chosen so that at the highest downsampling factor of 10, there would still be 100 by 100 pixels left in the dataset. The locations of these datasets were chosen only considering criteria that would make sense for non-trivial results, with respect to the chosen processing algorithm. More specifically, for bfastmonitor, we excluded all primary forest and non-forested areas, in line with a typical bfastmonitor analysis. For the random forest-based land cover classification analysis, we dismissed obviously homogeneous locations, such as deserts. The sources of all datasets were Landsat 8 scenes and Copernicus Global Land Cover Layers (Collection 2) [27] as labels.

The choice of data source had« to be usable for our chosen processing algorithms. As bfastmonitor is typically run on a time series of indexed data, such as Normalized Difference Moisture Index (NDMI), we had to choose a data source with the necessary bands and typical usage time frequency. The two most popular data sources with these specifications were Landsat and Sentinel 2. We chose Landsat 8 because its surface reflectance products are always immediately available on Google’s Earth Engine Data catalog, which was only partially true for Sentinel 2 data at the time of processing. Of course, the downsampling itself is agnostic to the differences between these two data sources, such as specific wavelengths, or how the imagery has been rectified. The downsampling method is even agnostic to the difference in resolution between Landsat 8 and Sentinel 2. From the downsampling method point of view, a factor of 1 means native scale, regardless of whether that means 10 or 30 m on the ground. The difference in resolution between data sources must be considered a change in sampling. Since we already tested across different scales, and downsampling is agnostic to any other difference, we only used Landsat 8 as the data source, as it completely covers all of our research goals.

3.3.3. Processing Algorithms

We limited the factors of the remote sensing processing algorithm to only two levels, i.e., bfastmonitor [28] and random forest [29] land cover classification, to limit the size of the experiment. In the following paragraphs, we will describe bfastmonitor for deforestation monitoring and random forest for land cover classification. Out of all possible processing algorithms, we sampled these two processing algorithms by convenience sampling. The factors we took into consideration for convenience sampling were the ease of use for the researchers performing the experiment, availability, and perceived popularity. These algorithms are typically run on disjunct datasets; we also split our experiment into two (identically set up, but technically separate) sub-experiments.

Bfastmonitor is a popular remote sensing algorithm, often used for forest monitoring. Bfastmonitor works on a time series split into a historical and a monitoring period. The historical data were used to determine a stable signal against which abrupt changes were detected in the monitoring period. We used the Python implementation [2] with default parameters, except for k (harmonics) and the

b a c k e n d

. The only deviation from default bfastmonitor parameters is that we used

k = 1

and

b a c k e n d = O p e n C L

because we appreciated that it was the more used (and faster) scenario, yet we would not expect different results from this experiment regardless of the choice of bfastmonitor parameters. From our experience, bfastmonitor parameters are incrementally tuned to fit the particularities of the dataset it is run on. The datasets used as input for bfastmonitor were the NDMI time series spanning 10 years (from 2010 to 2020). The most recent layers of these time series are presented in Figure 5.

Random forest is a classification method that outputs the most popular class selected by an ensemble of decision trees. Random forest is a popular algorithm used in the remote sensing field for land cover classification. We used the Python implementation [30] with default parameters, except for

r a n d o m_s t a t e

. We fixed

r a n d o m_s t a t e = 42

for reproducibility (and because it is a popular scenario). We used the Copernicus Global Land Cover Layers (Collection 2) [27] to train the random forest model. For each area, we randomly selected 100 points to train the model on Landsat 8 imagery (Figure 6) and Copernicus Global Land Cover Layers, as labels (Figure 7). The model was not trained on a time series of Landsat 8 imagery, but on all surface reflectance bands of a single recent Landsat 8 image, for each area, which met the following criteria, which we considered typical for the tested scenario. We chose the first cloud-free (maximum 1% cloudiness) summertime Landsat 8 scene. For summertime, we selected the months January, February, and March for the southern hemisphere, and July, August, and September for the northern hemisphere.

Both bfastmonitor and random forest had parameters that were typically fine-tuned to the target dataset before the final run. Although our intention was to test the downsampling methods as they would be used in an end-to-end production processing chain, we skipped this parameter-tuning step for two reasons. Firstly, the scale of the experiment would increase significantly. Secondly, we were not interested in the absolute accuracy of the results. Therefore, we ran the algorithms using the same parameters on each location under the assumption that the variation of our dependent variable did not depend on this fine-tuning step.

3.4. Experiment Results Analysis

We used the R stat implementation of the Wilcoxon paired signed-ranks [31] test to assess whether the localities were significantly different.

We had no knowledge about the distribution of the calculated metrics. Aliasing, or other specific arrangements, may skew the distributions of these metrics. This lack of knowledge about the distribution of the dependent variable reduces the robustness of relying on averages of scores as a means to quantify and understand the differences between the scores obtained by using each downsampling method. For the same reasons, the popular t-test would not be as reliable in the lack of normality.

The Wilcoxon paired signed-ranks test brings evidence (or disproves) that a downsampling method is more likely to yield a better result when it comes to our dependent variables. Considering that remote sensing users usually have to pick a downsampling method without much knowledge about interactions between their remote sensing algorithm, their dataset, and the downsampling factor, or method, we find this test relevant in assessing the results of this experiment. We used the traditional significance level

α = 0.05

[32], due to its ease of understanding, but also because we believe remote sensing users are not ‘interested’ in being more than 95% sure about which downsampling method to choose.

4. Results

The p values obtained by using the Wilcoxon paired signed-ranks test (see Table 2), on the novel downsampling method localities, against each alternative-tested downsampling method locality, show that the end-user is more likely to be better off when it comes to locality by choosing the novel downsampling method.

In Table 3, we can see the extra percentage of an area that is identical to the non-downsampled output maps, when using the novel downsampled method, compared to the other downsampling methods. Caution is advised when interpreting mean differences of localities (Table 3). Unlike the main results in Table 2, which we expect to stand even outside the limits of our experiment, we expect some variations for mean differences of localities, given a different setup. We still expect mean differences of localities to favor the novel method, but we expect the magnitude to vary depending on the specific setup of the experiment.

5. Discussion

Geocomputing platforms, such as Google Earth Engine [5], use downsampling as part of image pyramid building to handle large volumes of data on the fly (and at appropriate scales). Google Earth Engine follows conventional wisdom and uses mean and first value (translated central pixel selection) downsampling as default pyramiding policies for continuous and discrete-valued images, respectively. Mode, min, and max are also available as options for user-uploaded datasets. A scenario we often encounter is estimating discrete classes as fractions at a lower scale. Since it is a discrete-valued dataset, either sampling the first value, mode, min, or max would be used. The obvious solution is to calculate the fractions based on the native resolution imagery, but that is likely to be computationally expensive and it does not leverage the image’s pyramid, which is the core strength of using a geocomputing platform. Alternatively, fractions could be estimated from an image midway in the pyramid between the native resolution and the targeted scale. The midway strategy would lead to biased results under min, max, and mode towards the min, the max, and the most frequent class, respectively. The default first value sampling pyramiding policy would also be biased towards the most frequent class, but it would also have an extra error depending on aliasing. This indicates that specific use cases cannot be effectively handled with conventional downsampling methods, such as pyramiding policy. The presented novel downsampling method would allow the usage of the midway strategy with a minimal error proportional to the distance between the native and the chosen midway level in the pyramid. The novel downsampling method is also likely to improve the results for many other use cases as well, with the only minor disadvantage of additional computation costs at the data ingestion moment, which should be easily offset by all the computations done on the image dataset.

The novel downsampling method is more expensive in terms of processing. Of course, downsampling an input should only be performed when downsampling costs are marginal compared to actual processing costs. Therefore, if the user knows before processing that they are not interested in the output data distribution, then the central method downsampling might be better due to lower processing costs. If the user is unsure about which characteristics of the output they are interested in, the novel downsampling method provides similar or better scores for the locality as central, while presumably maintaining output data distribution.

The proposed novel downsampling method is just one possible solution to tackle the modifiable areal unit problem. Our method is based on a few basic principles, such as spatial autocorrelation and people’s tendencies to classify objects based on how much the basic components are alike, e.g., ‘more trees’ is a forest, and ‘more corn’ is a crop. Our work follows the suggestions from Xu et al. and provides an important step in tackling the modifiable areal unit problem, as it affects remote sensing, but we did not find an optimal solution. Since solving the modifiable areal unit problem is the same as solving representativity in representative democracies, we believe an optimal solution may not exist. Nevertheless, not all suboptimal solutions are equal and potential other avenues exist to address the modifiable areal unit problem. We envision that future research could focus on application specific downsampling.

We hope the remote sensing research community will be interested in testing our downsampling method, or other downsampling methods, in various scenarios applicable to their workflow. We recommend not focusing on how downsampling affects different data sources alone, e.g., comparing the effects of downsampling on similar products obtained from MODIS, LANDSAT, and Sentinel 2. Such an experiment would first and foremost describe the differences in resolution between the three datasets. If the goal of the experiment is to compare downsampling on datasets at different scales, we would recommend choosing a single dataset to avoid misattributing scale effects to wrong variables, such as the data provider. Instead, we suggest an experiment that compares the effects of using our downsampling method in combination with many other processing algorithms.

Further, once our downsampling method is independently validated, we suggest integrating our downsampling method with other remote sensing big data solutions, such as moving processing into the cloud [4,5]. While downsampling is already implemented in pyramid policies, a better downsampling, such as ours, would improve the quality of pyramid-based processing. We believe that global scale monitoring applications, such as deforestation monitoring [9,10,11], would improve by using our downsampling method. Our most prominent current issue (for which we desire future work to focus on) is our ‘sloppy’ research, such as implementation that is to never be fixed by the engineering community.

6. Conclusions

The novel downsampling method provides better results among tested downsampling methods at a processing cost that should be offset by processing a downsampled dataset. Considering the very low p-values, we conclude that it is extremely unlikely for the tested downsampling methods to perform better than the novel downsampling method when it comes to the percentage of unchanged areas in downsampled processed results.

Author Contributions

Conceptualization, methodology, software, validation, formal analysis, investigation, writing—original draft preparation, A.M.; writing—review and editing, A.M. and J.R.; supervision, J.R., J.V. and M.H.; project administration, J.R.; funding acquisition, J.V. and J.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Netherlands Organization for Scientific Research (NWO) grant number 15839. The APC was also funded by NWO. The work was partly funded through the Open-Earth-Monitor Cyberinfratructure project. The Open-Earth-Monitor Cyberinfratructure has received funding from the European Union’s Horizon Europe research an innovation programme under grant agreement No. 101059548.

Data Availability Statement

We invite the remote sensing community to test the novel downsampling method published on https://gitlab.com/big-eo-analytics/mirt001/novel-downsampling-paper (accessed on 24 October 2022).

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

NDMI	Normalized Difference Moisture Index

Appendix A. Novel Downsampling Method Flowchart

Figure A1. Flowchart of the novel downsampling method part 1.

Figure A2. Flowchart of the novel downsampling method part 2.

References

von Mehren, M.; Gieseke, F.; Verbesselt, J.; Rosca, S.; Horion, S.; Zeileis, A. Massively-parallel break detection for satellite data. In Proceedings of the 30th International Conference on Scientific and Statistical Database Management, Bozen-Bolzano, Italy, 9–11 July 2018; ACM: New York, NY, USA, 2018; pp. 1–10. [Google Scholar] [CrossRef] [Green Version]
Gieseke, F.; Rosca, S.; Henriksen, T.; Verbesselt, J.; Oancea, C.E. Massively-Parallel Change Detection for Satellite Time Series Data with Missing Values. In Proceedings of the 2020 IEEE 36th International Conference on Data Engineering (ICDE), Dallas, TX, USA, 20–24 April 2020; pp. 385–396. [Google Scholar] [CrossRef]
Pebesma, E.; Wagner, W.; Soille, P.; Kadunc, M.; Gorelick, N.; Schramm, M.; Verbesselt, J.; Reiche, J.; Appel, M.; Dries, J.; et al. OpenEO: An open API for cloud-based big Earth Observation processing platforms. In Proceedings of the EGU General Assembly Conference Abstracts, Vienna, Austria, 8–13 April 2018; p. 4957. [Google Scholar]
Schramm, M.; Pebesma, E.; Milenković, M.; Foresta, L.; Dries, J.; Jacob, A.; Wagner, W.; Mohr, M.; Neteler, M.; Kadunc, M.; et al. The openEO API–Harmonising the Use of Earth Observation Cloud Services Using Virtual Data Cube Functionalities. Remote Sens. 2021, 13, 1125. [Google Scholar] [CrossRef]
Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27. [Google Scholar] [CrossRef]
Hamunyela, E.; Rosca, S.; Mirt, A.; Engle, E.; Herold, M.; Gieseke, F.; Verbesselt, J. Implementation of BFASTmonitor Algorithm on Google Earth Engine to Support Large-Area and Sub-Annual Change Monitoring Using Earth Observation Data. Remote Sens. 2020, 12, 2953. [Google Scholar] [CrossRef]
Lucivero, F. Big Data, Big Waste? A Reflection on the Environmental Sustainability of Big Data Initiatives. Sci. Eng. Ethics 2020, 26, 1009–1030. [Google Scholar] [CrossRef] [PubMed] [Green Version]
Achard, F.; Hansen, M.C. Global Forest Monitoring from Earth Observation; Taylor & Francis: Abingdon, UK, 2012. [Google Scholar] [CrossRef]
Townshend, J.R.; Masek, J.G.; Huang, C.; Vermote, E.F.; Gao, F.; Channan, S.; Sexton, J.O.; Feng, M.; Narasimhan, R.; Kim, D.; et al. Global characterization and monitoring of forest cover using Landsat data: Opportunities and challenges. Int. J. Digit. Earth 2012, 5, 373–397. [Google Scholar] [CrossRef] [Green Version]
Banskota, A.; Kayastha, N.; Falkowski, M.J.; Wulder, M.A.; Froese, R.E.; White, J.C. Forest Monitoring Using Landsat Time Series Data: A Review. Can. J. Remote Sens. 2014, 40, 362–384. [Google Scholar] [CrossRef]
Hermosilla, T.; Wulder, M.A.; White, J.C.; Coops, N.C.; Hobart, G.W.; Campbell, L.B. Mass data processing of time series Landsat imagery: Pixels to data products for forest monitoring. Int. J. Digit. Earth 2016, 9, 1035–1054. [Google Scholar] [CrossRef]
Keenan, R.J.; Reams, G.A.; Achard, F.; de Freitas, J.V.; Grainger, A.; Lindquist, E. Dynamics of global forest area: Results from the FAO Global Forest Resources Assessment 2015. For. Ecol. Manag. 2015, 352, 9–20. [Google Scholar] [CrossRef]
Lin, W.; Li, D. Adaptive downsampling to improve image compression at low bit rates. IEEE Trans. Image Process. 2006, 15, 2513–2521. [Google Scholar] [CrossRef]
Zhang, Y.; Zhao, D.; Zhang, J.; Xiong, R.; Gao, W. Interpolation-Dependent Image Downsampling. IEEE Trans. Image Process. 2011, 20, 3291–3296. [Google Scholar] [CrossRef]
Youssef, A. Analysis and comparison of various image downsampling and upsampling methods. In Proceedings of the DCC ’98 Data Compression Conference (cat. No.98TB100225), Snowbird, UT, USA, 30 March–1 April 1998; p. 583. [Google Scholar] [CrossRef]
Dumitrescu, D.; Boiangiu, C.A. A Study of Image Upsampling and Downsampling Filters. Computers 2019, 8, 30. [Google Scholar] [CrossRef] [Green Version]
Merhav, N.; Bhaskaran, V. Fast algorithms for DCT-domain image downsampling and for inverse motion compensation. IEEE Trans. Circuits Syst. Video Technol. 1997, 7, 468–476. [Google Scholar] [CrossRef]
Xu, Y.; Jin, Z. Down-Sampling Face Images and Low-Resolution Face Recognition. In Proceedings of the 2008 3rd International Conference on Innovative Computing Information and Control, Dalian, China, 18–20 June 2008; IEEE: Piscataway, NJ, USA, 2008; p. 392. [Google Scholar] [CrossRef]
Vitulli, R.; Del Bello, U.; Armbruster, P.; Baronti, S.; Santurti, L. Aliasing effects mitigation by optimised sampling grids and impact on image acquisition chains. In Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, Toronto, ON, Canada, 24–28 June 2002; IEEE: Piscataway, NJ, USA, 2002; Volume 2, pp. 979–981. [Google Scholar] [CrossRef]
Schowengerdt, R.A. Aliasing in remote sensing imagery. In Proceedings of the Visual Information Processing XI, Orlando, FL, USA, 30 July 2002; Rahman, Z.U., Schowengerdt, R.A., Reichenbach, S.E., Eds.; SPIE: Bellingham, WA, USA, 2002; Volume 4736, pp. 92–98. [Google Scholar] [CrossRef]
Weigel, S. Scale, Resolution and Resampling: Representation and Analysis of Remotely Sensed Landscapes Across Scale in Geographic Information Systems. Ph.D. Thesis, Louisiana State University and Agricultural & Mechanical College, Baton Rouge, LA, USA, 1996. [Google Scholar]
Raj, R.; Hamm, N.A.; Kant, Y. Analysing the effect of different aggregation approaches on remotely sensed data. Int. J. Remote Sens. 2013, 34, 4900–4916. [Google Scholar] [CrossRef]
Xu, K.; Tian, Q.; Yang, Y.; Yue, J.; Tang, S. How up-scaling of remote-sensing images affects land-cover classification by comparison with multiscale satellite images. Int. J. Remote Sens. 2019, 40, 2784–2810. [Google Scholar] [CrossRef]
Kitron, U.; Clennon, J.A.; Cecere, M.C.; Gürtler, R.E.; King, C.H.; Vazquez-Prokopec, G. Upscale or downscale: Applications of fine scale remotely sensed data to Chagas disease in Argentina and schistosomiasis in Kenya. Geospat. Health 2006, 1, 49–58. [Google Scholar] [CrossRef] [Green Version]
Wu, H.; Li, Z.L. Scale Issues in Remote Sensing: A Review on Analysis, Processing and Modeling. Sensors 2009, 9, 1768–1793. [Google Scholar] [CrossRef] [PubMed]
Openshaw, S. The Modifiable Areal Unit Problem; Geo Books: Norwick, UK, 1983. [Google Scholar]
Buchhorn, M.; Lesiv, M.; Tsendbazar, N.E.; Herold, M.; Bertels, L.; Smets, B. Copernicus Global Land Cover Layers—Collection 2. Remote Sens. 2020, 12, 1044. [Google Scholar] [CrossRef] [Green Version]
Verbesselt, J.; Zeileis, A.; Herold, M. Near real-time disturbance detection using satellite image time series. Remote Sens. Environ. 2012, 123, 98–108. [Google Scholar] [CrossRef]
Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282. [Google Scholar] [CrossRef]
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-Learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
R Core Team. R: A Language and Environment for Statistical Computing. Manual; R Foundation for Statistical Computing: Vienna, Austria, 2022. [Google Scholar]
Fisher, R.A. The arrangement of field experiments. J. Minist. Agric. 1926, 33, 503–515. [Google Scholar] [CrossRef]

Figure 1. Overview of the novel downsampling method. The steps are further detailed in the text. A more detailed version of this chart is presented in Appendix A flowchart, split into Figure A1 and Figure A2.

Figure 2. From each band on the left, a position is selected (star). The most voted position (two stars) (center) is the one chosen to represent each band (right).

Figure 3. The treatment of the experiment, which resembles a typical remote sensing analysis workflow that requires downsampling.

Figure 5. The most recent layer of time series datasets used as input for bfastmonitor processing algorithm. Titles are toponyms.

Figure 6. Target datasets used as input for the random forest land cover processing algorithm. One hundred points were sampled on each dataset. Titles are toponyms.

Figure 7. Copernicus land cover used as training labels for the random forest land cover processing algorithm. One hundred points were sampled on each dataset. Titles are toponyms.

Table 2. Wilcoxon paired signed-ranks test p values lower than the significance level

α = 0.05

show that the novel downsampling method is consistently better than the alternatives. The p-values marked as 0 are approximations up to the ninth decimal.

Table 2. Wilcoxon paired signed-ranks test p values lower than the significance level

α = 0.05

show that the novel downsampling method is consistently better than the alternatives. The p-values marked as 0 are approximations up to the ninth decimal.

Novel vs.	Mean	Central	Random
bfastmonitor	0	0	0
random forest	0.007	0	0

Table 3. Results of mean differences between localities when using the novel downsampling method versus the other downsampling methods. Positive values show the advantage of the novel downsampling method.

Novel vs.	Mean (%)	Central (%)	Random (%)
bfastmonitor	8.0	1.4	4.6
random forest	0.8	1.4	5.7

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Mîrț, A.; Reiche, J.; Verbesselt, J.; Herold, M. A Downsampling Method Addressing the Modifiable Areal Unit Problem in Remote Sensing. Remote Sens. 2022, 14, 5538. https://doi.org/10.3390/rs14215538

AMA Style

Mîrț A, Reiche J, Verbesselt J, Herold M. A Downsampling Method Addressing the Modifiable Areal Unit Problem in Remote Sensing. Remote Sensing. 2022; 14(21):5538. https://doi.org/10.3390/rs14215538

Chicago/Turabian Style

Mîrț, Andrei, Johannes Reiche, Jan Verbesselt, and Martin Herold. 2022. "A Downsampling Method Addressing the Modifiable Areal Unit Problem in Remote Sensing" Remote Sensing 14, no. 21: 5538. https://doi.org/10.3390/rs14215538

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

A Downsampling Method Addressing the Modifiable Areal Unit Problem in Remote Sensing

Abstract

1. Introduction and Background

2. Methodology

2.1. Histogram-Based Classification

2.2. Histogram Class Frequency per Areal Unit

2.3. Ranked Areal Units per Class

2.4. Iterative Class Search for Areal Units

2.4.1. Top Ranking per Class Rule

2.4.2. Bottom Global Rank per Subsequent Classes in the Current Class Edge Rank

2.5. Closest to Mean

2.6. Multiple Bands Stack Voting

3. Validation Case Study

3.1. Is the Novel Downsampling Better?

3.2. Experimental Design

3.3. Nuisance Variables

3.3.1. Downsampling Factors

3.3.2. Datasets

3.3.3. Processing Algorithms

3.4. Experiment Results Analysis

4. Results

5. Discussion

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

Abbreviations

Appendix A. Novel Downsampling Method Flowchart

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI