In pixel-based classification, each image pixel is classified on the basis of the spectral information it contains (Richards, 1993). This is the traditional approach to classification: the pixel is the fundamental (spatial) unit of a satellite image, so the approach is intuitive and usually easy to implement. Several decision rules are in use in pixel-based classification; three common ones are maximum likelihood, minimum distance to mean, and minimum Mahalanobis distance. All three assume that the classes have already been characterized statistically, typically from training samples, so that, for instance, their means and variances are known. Each rule uses some notion of “distance” from a pixel to a class mean to decide which class the pixel is assigned to.
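To make the three decision rules concrete, here is a minimal NumPy sketch. It assumes that each class is summarized by a mean vector and a covariance matrix estimated from training pixels, that for maximum likelihood the classes are Gaussian with equal prior probabilities, and it uses hypothetical two-band training data. The function names are illustrative and do not come from any particular software package.

```python
import numpy as np

def class_stats(samples):
    """Estimate (mean vector, covariance matrix) for each class.

    samples: dict mapping class name -> array of shape (n_pixels, n_bands)
    """
    return {c: (x.mean(axis=0), np.cov(x, rowvar=False)) for c, x in samples.items()}

def min_distance_to_mean(pixel, stats):
    # Assign the class whose mean is nearest in plain Euclidean distance.
    return min(stats, key=lambda c: np.linalg.norm(pixel - stats[c][0]))

def _mahalanobis_sq(pixel, mean, cov):
    # Squared Mahalanobis distance: (x - mu)^T Sigma^-1 (x - mu)
    diff = pixel - mean
    return float(diff @ np.linalg.solve(cov, diff))

def min_mahalanobis(pixel, stats):
    # Like minimum distance, but differences are scaled by each class covariance.
    return min(stats, key=lambda c: _mahalanobis_sq(pixel, *stats[c]))

def max_likelihood(pixel, stats):
    # Gaussian discriminant, equal priors: -0.5 * (ln|Sigma| + Mahalanobis^2).
    def score(c):
        mean, cov = stats[c]
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (logdet + _mahalanobis_sq(pixel, mean, cov))
    return max(stats, key=score)

# Hypothetical two-band training data (e.g. red and near-infrared reflectance).
rng = np.random.default_rng(0)
training = {
    "water": rng.normal([0.05, 0.02], 0.01, size=(100, 2)),
    "vegetation": rng.normal([0.08, 0.45], 0.03, size=(100, 2)),
}
stats = class_stats(training)

pixel = np.array([0.06, 0.40])
for rule in (min_distance_to_mean, min_mahalanobis, max_likelihood):
    print(rule.__name__, rule(pixel, stats))
```

Minimum distance to mean ignores how spread out each class is; the Mahalanobis rule scales the differences by each class's covariance; maximum likelihood additionally includes a log-determinant term, which penalizes classes with a large spread.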
Ideally, pixel-based classification works with classes that are well defined and spectrally well separated, but reality does not always provide these. In land cover studies, some classes allow stable, consistent definitions (water bodies, roads and rooftops come to mind), but problems arise in the details. Rooftops, car parks and roads can all be made of asphalt, so separating them as distinct classes on spectral grounds alone is difficult. Vegetation cover, and annual crop cover in particular, brings further problems: it changes over time and is much harder to characterize without bringing in the time dimension.
A fundamental limitation of pixel-based classification is that information from surrounding pixels, which may help in correctly identifying the target pixel’s class, is not used. As a consequence, the pixels of a class that displays high spectral heterogeneity may be assigned to several different classes. This “salt-and-pepper” effect is clearly unwanted. For example, when farm fields exhibit high within-field spectral variability owing to variations in soil fertility, soil moisture conditions, pests and diseases, or erratic farm practices (De Wit and Clevers, 2004; Forkuor et al., 2014; Peña-Barragán et al., 2011), this approach can lead to a high rate of misclassification, in which parts of a field belonging to one class are classified as another. Such misclassifications are even more pronounced when the approach is applied to highly heterogeneous landscapes, such as those typical of smallholder agriculture (Tran et al., 2014; Xiao et al., 2002). At the same time, the high spatial resolution imagery required to adequately capture small agricultural fields leads to increased errors when per-pixel approaches are adopted (Myint, 2006; Myint et al., 2011). Where possible, object-based approaches, which classify groups of spatially contiguous pixels rather than single pixels, are used to overcome these challenges.
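A toy illustration of the salt-and-pepper effect follows. It uses made-up NDVI values for a single maize field, in which a few lower values stand in for patches of poor fertility or pest damage, and hypothetical single-band class means for maize and sorghum. Each pixel is first labelled on its own by minimum distance to mean; then, as a deliberately simplified stand-in for an object-based approach, the whole field is labelled by its mean value. Real object-based methods first segment the image into objects and typically use richer object features than a single mean.

```python
import numpy as np

# Hypothetical NDVI values for one maize field (4 x 4 pixels).
field = np.array([
    [0.62, 0.60, 0.54, 0.61],
    [0.59, 0.53, 0.63, 0.60],
    [0.61, 0.62, 0.52, 0.58],
    [0.60, 0.54, 0.61, 0.62],
])

# Hypothetical single-band class means.
class_means = {"maize": 0.60, "sorghum": 0.50}
names = list(class_means)
means = np.array([class_means[n] for n in names])

def classify(values):
    """Minimum-distance-to-mean labelling of an array of single-band values."""
    values = np.asarray(values, dtype=float)
    idx = np.abs(values[..., None] - means).argmin(axis=-1)
    return np.array(names)[idx]

# Per-pixel: every pixel is labelled independently, giving a speckled map.
pixel_labels = classify(field)
# Simplified object-level labelling: one label for the whole field from its mean.
field_label = classify(field.mean())

print(pixel_labels)
print((pixel_labels == "sorghum").sum(), "of", field.size, "pixels labelled sorghum")
print("field-level label:", field_label)
```

With these values, the four low-NDVI pixels are scattered through the field as "sorghum", while the field-level label remains "maize".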
Another limitation of pixel-based approaches is the problem of mixed pixels. High temporal resolution images (e.g., MODIS, AVHRR) are often required for improved crop discrimination because they can capture the distinct temporal profiles, or phenological cycles, of crops (Conrad et al., 2011; Wardlow et al., 2007, 2006). However, the spatial resolution of such images (e.g., 250 m for MODIS) is too coarse for regions dominated by smallholder agriculture, where typical farm sizes may not even exceed the footprint of a single pixel. In these cases, we face the problem of mixed pixels, in which the signals from multiple land covers are present in a single pixel (Smith et al., 2003; Jung et al., 2006; Husak et al., 2008). One of the approaches developed to overcome this challenge is sub-pixel classification, which estimates the proportion of each land cover within a pixel.
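Sub-pixel classification can be carried out in several ways; one common approach is linear spectral unmixing, which models a pixel's spectrum as a weighted sum of "pure" class spectra (endmembers), the weights being the cover fractions. The sketch below assumes the endmember spectra are already known and uses hypothetical four-band values. It enforces the sum-to-one constraint only approximately, through a heavily weighted extra equation, and does not enforce non-negativity (fully constrained variants do).

```python
import numpy as np

def unmix(pixel, endmembers, weight=1e3):
    """Estimate land-cover fractions in one pixel by linear spectral unmixing.

    pixel:      reflectance vector, shape (n_bands,)
    endmembers: pure-class spectra as columns, shape (n_bands, n_classes)
    weight:     how strongly the fractions are pushed to sum to one
    Non-negativity of the fractions is NOT enforced in this sketch.
    """
    n_classes = endmembers.shape[1]
    # Append a weighted row of ones so the least-squares solution favours sum(f) == 1.
    A = np.vstack([endmembers, weight * np.ones(n_classes)])
    b = np.append(pixel, weight)
    fractions, *_ = np.linalg.lstsq(A, b, rcond=None)
    return fractions

# Hypothetical 4-band endmember spectra (columns): crop, bare soil, water.
E = np.array([
    [0.04, 0.10, 0.06],
    [0.08, 0.15, 0.05],
    [0.05, 0.20, 0.03],
    [0.45, 0.28, 0.01],
])
true_fractions = np.array([0.5, 0.3, 0.2])
mixed_pixel = E @ true_fractions  # noise-free mixture, for illustration only

print(unmix(mixed_pixel, E).round(3))
```

Because this mixed pixel is an exact, noise-free combination of the endmembers, the recovered fractions match the ones used to build it; with real imagery, noise and imperfect endmembers make the estimates approximate.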
These challenges notwithstanding, per-pixel classification approaches remain the most widely used in the remote sensing community.