Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.

Messages - Pankaj Dey

Pages: [1] 2 3 ... 33
Catchment-scale hydrological models are widely used to represent and improve our understanding of hydrological processes, and to support operational water resources management. Conceptual models, where catchment dynamics are approximated using relatively simple storage and routing elements, offer an attractive compromise in terms of predictive accuracy, computational demands and amenability to interpretation. This paper introduces SuperflexPy, an open-source Python framework implementing the SUPERFLEX principles (Fenicia et al., 2011) for building conceptual hydrological models from generic components, with a high degree of control over all aspects of model specification. SuperflexPy can be used to build models of a wide range of spatial complexity, ranging from simple lumped models (e.g. a reservoir) to spatially distributed configurations (e.g. nested sub-catchments), with the ability to customize all individual model elements. SuperflexPy is a Python package, enabling modelers to exploit the full potential of the framework without the need for separate software installations, and making it easier to use and interface with existing Python code for model deployment. This paper presents the general architecture of SuperflexPy, and illustrates its usage to build conceptual models of varying degrees of complexity. The illustration includes the usage of existing SuperflexPy model elements, as well as their extension to implement new functionality. SuperflexPy is available as open-source code, and can be used by the hydrological community to investigate improved process representations, for model comparison, and for operational work. Comprehensive documentation is available online and is provided as supplementary material to this paper.
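The kind of element SuperflexPy composes can be illustrated with a minimal standalone sketch (plain Python, not SuperflexPy's actual API) of a single linear reservoir, the simplest conceptual storage-and-routing component:

```python
def simulate_linear_reservoir(precip, k=0.5, s0=0.0, dt=1.0):
    """Single linear reservoir: dS/dt = P - k*S, with outflow Q = k*S.
    k is a hypothetical outflow coefficient (1/time)."""
    storage, outflow = s0, []
    for p in precip:
        storage += (p - k * storage) * dt  # explicit Euler storage update
        outflow.append(k * storage)
    return outflow

# a pulse of rain followed by dry steps produces a classic recession curve
q = simulate_linear_reservoir([10.0, 0.0, 0.0, 0.0, 0.0], k=0.5)
```

In frameworks like SuperflexPy, elements of this kind are defined once and then connected into units, nodes, and networks to build lumped or distributed model structures.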

Performance criteria are essential for hydrological model identification and parameter estimation. The Kling-Gupta efficiency (KGE), which combines the three components of the Nash-Sutcliffe efficiency (NSE) decomposition of model errors (correlation, bias, and the ratio of variances or coefficients of variation) in a more balanced way, has been widely used for calibrating and evaluating hydrological models in recent years. However, the KGE does not take a reference forecast or simulation into account, and it still underestimates the variability of flow time series when its value is optimized during model calibration. In this study, we propose an alternative performance criterion obtained by reformulating the three NSE components. Moreover, the distribution function of the new criterion was derived to analyze its uncertainties, which originate from the distinction between the theoretical (population) statistic and its corresponding sampling properties. The proposed criterion was tested by calibrating the "abcd" and XAJ hydrological models at monthly and hourly time scales, respectively, for two different case study basins. The case study results clearly demonstrate overall better or comparable model performance with the proposed criterion. The analysis of the uncertainties of the new criterion, based on its probability distribution function, suggests a rational approach to distinguishing between the probabilistic properties and behavior of the theoretical statistics and the rather different sampling properties of estimators of those statistics when computed from data.
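For reference, the standard KGE (Gupta et al., 2009) that this work builds on combines the three error components directly; a generic sketch (not the authors' code):

```python
import numpy as np

def kge(sim, obs):
    """Kling-Gupta efficiency: KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2)."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation component
    alpha = sim.std() / obs.std()     # variability ratio component
    beta = sim.mean() / obs.mean()    # bias ratio component
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

perfect = kge([1, 2, 3, 4], [1, 2, 3, 4])  # identical series give KGE close to 1
```

A simulation that doubles every observed value keeps r = 1 but inflates both the variability and bias ratios, pulling the KGE well below 1 — which is the kind of imbalance the proposed reformulation targets.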

Interesting information / Unanswered questions on the Budyko framework
« on: November 05, 2020, 09:17:04 AM »
Landscapes and their hydrology are complex and sui generis. As a result, few theories exist that (without calibration) usefully describe or predict catchment-scale hydrological behavior [Beven, 2000; Sivapalan, 2005]. The Budyko hypothesis [Budyko, 1951; 1974] is a rare exception: its simple parameterization of how aridity (the ratio of long-term mean potential evapotranspiration to long-term mean precipitation) controls the long-term mean partitioning of precipitation into streamflow and evapotranspiration captures the behavior of many catchments around the world. In recent years, the Budyko framework has increasingly been used to interpret and predict (often non-stationary) water balances. While uses of the framework have become diverse and widespread, they are typically founded in common principles that rely on largely untested assumptions and strongly relate to questions for which no clear answers exist. Therefore, we believe that answering several basic questions around the Budyko framework can strengthen (or invalidate) many old, recent, and future applications. We realize that similar questions have been contemplated by others, but we hope that presenting them in the following manner may prove useful.
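The original Budyko (1974) curve referred to above is a single parameter-free formula; it can be evaluated as follows (a standard formulation, not tied to any particular study in this thread):

```python
import math

def budyko_evaporative_fraction(phi):
    """Budyko (1974) curve: E/P = sqrt(phi * tanh(1/phi) * (1 - exp(-phi))),
    where phi = PET/P is the aridity index, E long-term evapotranspiration,
    and P long-term precipitation."""
    return math.sqrt(phi * math.tanh(1.0 / phi) * (1.0 - math.exp(-phi)))

# the evaporative fraction rises with aridity but stays below the water limit (1)
fractions = [budyko_evaporative_fraction(phi) for phi in (0.5, 1.0, 2.0, 4.0)]
```

The curve respects both limits that anchor the framework: as phi → 0 (energy-limited) E/P → 0, and as phi → ∞ (water-limited) E/P → 1.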

Soil moisture observations are of broad scientific interest and practical value for a wide range of applications. The scientific community has made significant progress in estimating soil moisture from satellite-based Earth observation data, particularly in operationalizing coarse-resolution (25-50 km) soil moisture products. This review summarizes existing applications of satellite-derived soil moisture products and identifies gaps between the characteristics of currently available soil moisture products and the application requirements from various disciplines. We discuss the efforts devoted to the generation of high-resolution soil moisture products from satellite Synthetic Aperture Radar (SAR) data such as Sentinel-1 C-band backscatter observations and/or through downscaling of existing coarse-resolution microwave soil moisture products. Open issues and future opportunities of satellite-derived soil moisture are discussed, providing guidance for further development of operational soil moisture products and bridging the gap between the soil moisture user and supplier communities.

Runoff prediction in ungauged and scarcely gauged catchments is a key research field in surface water hydrology. There have been numerous studies before and since the launch of the predictions in ungauged basins (PUB) initiative by the International Association of Hydrological Sciences in 2003. This study critically reviews and assesses the decadal progress in the regionalization of hydrological modeling, the major tool for PUB, from 2000 to 2019. We found that journal publications on PUB have noticeably increased over the past 7 years, and that the range of countries conducting this research has expanded dramatically since 2013. Regionalization methods are grouped into three categories: similarity-based, regression-based, and hydrological-signature-based. Research on each regionalization method has become more detailed and interdisciplinary; in particular, tremendous efforts and many improvements have been made in the parameterization domain during the post-PUB period. However, there is still plenty of room to improve predictive capability in data-sparse regions (e.g., further verification of multi-model approaches and better description of uncertainties). This paper also discusses possible future research directions, including PUB in a changing environment and better utilization of multi-source remote-sensing information.
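As an illustration of the first category, similarity-based regionalization transfers calibrated parameters from the most physically similar gauged (donor) catchment to the ungauged target. A minimal sketch, with hypothetical catchment attributes and parameter names:

```python
def most_similar_donor(target_attrs, donors):
    """Pick the donor catchment minimizing Euclidean distance over
    (ideally normalized) catchment attributes such as area and aridity."""
    def distance(a, b):
        return sum((a[k] - b[k]) ** 2 for k in a) ** 0.5
    return min(donors, key=lambda d: distance(target_attrs, d["attrs"]))

# hypothetical gauged donor catchments with calibrated model parameters
donors = [
    {"name": "A", "attrs": {"area": 120.0, "aridity": 1.4}, "params": {"k": 0.08}},
    {"name": "B", "attrs": {"area": 450.0, "aridity": 0.6}, "params": {"k": 0.02}},
]
# the ungauged target receives the parameter set of its closest donor
best = most_similar_donor({"area": 150.0, "aridity": 1.2}, donors)
```

Regression-based methods would instead fit a relationship between attributes and parameters across all gauged catchments, and signature-based methods would calibrate the ungauged model to regionalized hydrological signatures.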

IMDLIB is a Python package to download and handle binary gridded data from the India Meteorological Department (IMD). For more information about the IMD datasets, see the link below:
Link to tutorial:

This is the source code repository for the Framework for Understanding Structural Errors (FUSE). FUSE is a modular modelling framework which enables the generation of a myriad of conceptual hydrological models by recombining elements from commonly used models. Running a hydrological model means making a wide range of decisions, which influence the simulations in different ways and to different extents. Our goal with FUSE is to enable users to be in charge of these decisions, so that they can understand their effects and thereby develop and use better models.
FUSE was built from scratch to be modular: it offers several options for each important modelling decision and enables the addition of new modules. In contrast, most traditional hydrological models rely on a single model structure (most processes are simulated by a single set of equations). FUSE's modularity makes it easier to (i) understand differences between models, (ii) run a large ensemble of models, (iii) capture the spatial variability of hydrological processes, and (iv) develop and improve hydrological models in a coordinated fashion across the community.
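The modular idea can be sketched in Python (FUSE itself is Fortran; the process equations below are hypothetical stand-ins, not FUSE's actual modules):

```python
# two interchangeable formulations of the same process (hypothetical examples)
def baseflow_linear(storage, k=0.05):
    return k * storage

def baseflow_nonlinear(storage, k=0.01, n=2.0):
    return k * storage ** n

def run_model(precip, baseflow_fn, s0=10.0):
    """A toy bucket model whose baseflow equation is a pluggable decision."""
    storage, flows = s0, []
    for p in precip:
        q = baseflow_fn(storage)  # the chosen module decides the flux
        storage += p - q
        flows.append(q)
    return flows

# swapping one modelling decision yields a small ensemble of model structures
ensemble = {name: run_model([1.0] * 5, fn)
            for name, fn in [("linear", baseflow_linear),
                             ("nonlinear", baseflow_nonlinear)]}
```

Each combination of such decisions is one member of the "myriad of conceptual models" FUSE can generate, which is what makes coordinated structural comparison possible.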
New features

The initial FUSE implementation (FUSE1) is described in Clark et al. (WRR, 2008). The implementation provided here (which will become FUSE2) was created with users in mind and significantly increases the usability and range of applicability of the original version. In particular, it adds five main features:
  • an interface enabling the use of the different FUSE modes (default, calibration, regionalisation),
  • a distributed mode enabling FUSE to run on a grid whilst efficiently managing memory,
  • all the input, output and parameter files are now NetCDF files to improve reproducibility,
  • a calibration mode based on the shuffled complex evolution algorithm (Duan et al., WRR, 1992),
  • a snow module described in Henn et al. (WRR, 2015).
Manual

Instructions to compile the code provided in this repository and to run FUSE are given in the FUSE manual.
License

FUSE is distributed under the GNU General Public License Version 3. For details, see the file LICENSE in the FUSE root directory or visit the online version.

Abstract. Over the past decades, many global land-cover products have been released; however, a global land-cover map offering both a fine classification system and fine spatial resolution is still lacking. In this study, a novel global 30-m land-cover classification with a fine classification system for the year 2015 (GLC_FCS30-2015) was produced by combining time series of Landsat imagery and high-quality training data from the GSPECLib (Global Spatial Temporal Spectra Library) on the Google Earth Engine computing platform. First, the global training data from the GSPECLib were developed by applying a series of rigorous filters to the MCD43A4 NBAR and CCI_LC land-cover products. Secondly, a local adaptive random forest model was built for each 5° × 5° geographical tile using the multi-temporal Landsat spectral and texture features of the corresponding training data, and the GLC_FCS30-2015 land-cover product containing 30 land-cover types was generated for each tile. Lastly, the GLC_FCS30-2015 was validated against three different validation systems (containing different land-cover details) using 44,043 validation samples. The validation results indicated that GLC_FCS30-2015 achieved an overall accuracy of 82.5% and a kappa coefficient of 0.784 for the level-0 validation system (9 basic land-cover types), an overall accuracy of 71.4% and a kappa coefficient of 0.686 for the UN-LCCS (United Nations Land Cover Classification System) level-1 system (16 LCCS land-cover types), and an overall accuracy of 68.7% and a kappa coefficient of 0.662 for the UN-LCCS level-2 system (24 fine land-cover types).
The comparisons against other land-cover products (CCI_LC, MCD12Q1, FROM_GLC and GlobeLand30) indicated that GLC_FCS30-2015 provides more spatial detail than CCI_LC-2015 and MCD12Q1-2015 and a greater diversity of land-cover types than FROM_GLC-2015 and GlobeLand30-2010, and that GLC_FCS30-2015 achieved the best overall accuracy of 82.5%, against 59.1% for FROM_GLC-2015 and 75.9% for GlobeLand30-2010. Therefore, it is concluded that the GLC_FCS30-2015 product is the first global land-cover dataset that provides a fine classification system with high classification accuracy at 30 m. The GLC_FCS30-2015 global land-cover product generated in this paper is available at (Liu et al., 2020).
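The overall accuracy and kappa coefficient used throughout the validation can be computed from a confusion matrix as follows (a generic sketch with made-up counts, not the paper's validation data):

```python
import numpy as np

def overall_accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a confusion matrix
    (rows = reference classes, columns = mapped classes)."""
    c = np.asarray(confusion, float)
    n = c.sum()
    po = np.trace(c) / n                                # observed agreement
    pe = (c.sum(axis=0) * c.sum(axis=1)).sum() / n**2   # chance agreement
    return po, (po - pe) / (1 - pe)

# hypothetical two-class validation: 50+35 correct out of 100 samples
acc, kappa = overall_accuracy_and_kappa([[50, 10], [5, 35]])
```

Kappa discounts the agreement expected by chance, which is why it is always lower than the raw overall accuracy (e.g., 0.784 versus 82.5% at level-0 in this study).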

Interesting information / Effective Computing
« on: September 16, 2020, 01:01:10 PM »
Overwhelmed by the world of computing tools you could be using for your research? Mired in messy code that won't evolve with your ideas? This course is for you. Designed for grad students across the College of the Environment, it is a broad, practical introduction to the most important computer things you need to know to keep your research flowing smoothly.

Interesting information / Introducing the R Package “biascorrection”
« on: September 15, 2020, 05:40:34 PM »
For a variety of reasons, we need hydrological models for our short- and long-term predictions and planning. However, it is no secret that these models always suffer from some degree of bias. This bias can stem from many different and often interacting sources. Some examples are biases in underlying model assumptions, missing processes, model parameters, calibration parameters, and imperfections in input data (Beven and Binley, 1992).
The question of how to use models, given all these uncertainties, has been an active area of research for at least 50 years and will probably remain so for the foreseeable future, but going through that is not the focus of this blog post.
In this post, I explain a technique called bias correction that is frequently used in an attempt to improve model predictions. I also introduce an R package for bias correction that I recently developed, called "biascorrection." Although most of the examples in this post are about hydrological models, the arguments and the R package might be useful in other disciplines, for example with atmospheric models, which have been one of the hotspots of bias correction applications (for example, here, here, and here). The reason is that the algorithm follows a series of simple mathematical procedures that can be applied to other questions and research areas.
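One widely used bias-correction algorithm is empirical quantile mapping; the sketch below is a generic Python illustration of the idea and may differ from what the "biascorrection" R package actually implements:

```python
import numpy as np

def quantile_map(sim_hist, obs_hist, sim_new):
    """Empirical quantile mapping: find each new simulated value's quantile
    within the historical simulation, then return the observed value at
    that same quantile of the calibration period."""
    sim_sorted = np.sort(sim_hist)
    probs = np.linspace(0.0, 1.0, len(sim_sorted))
    q = np.interp(sim_new, sim_sorted, probs)  # empirical quantiles of new values
    return np.quantile(np.sort(obs_hist), q)

obs = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
sim = obs * 1.5                 # a model that overestimates flows by 50%
corrected = quantile_map(sim, obs, np.array([9.0]))
```

Because the mapping is built from the joint distribution of historical simulations and observations, it corrects systematic distributional bias but cannot fix errors in timing or in the model's response to unseen conditions.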

Link to the blog post:

We develop a Bayesian Land Surface Phenology (LSP) model and examine its performance using Enhanced Vegetation Index (EVI) observations derived from the Harmonized Landsat Sentinel-2 (HLS) dataset. Building on previous work, we propose a double logistic function that, once couched within a Bayesian model, yields posterior distributions for all LSP parameters. We assess the efficacy of the Normal, Truncated Normal, and Beta likelihoods to deliver robust LSP parameter estimates. Two case studies are presented and used to explore aspects of the proposed model. The first, conducted over forested pixels within a HLS tile, explores choice of likelihood and space-time varying HLS data availability for long-term average LSP parameter point and uncertainty estimation. The second, conducted on a small area of interest within the HLS tile on an annual time-step, further examines the impact of sample size and choice of likelihood on LSP parameter estimates. Results indicate that while the Truncated Normal and Beta likelihoods are theoretically preferable when the vegetation index is bounded, all three likelihoods performed similarly when the number of index observations is sufficiently large and values are not near the index bounds. Both case studies demonstrate how pixel-level LSP parameter posterior distributions can be used to propagate uncertainty through subsequent analysis. As a companion to this article, we provide an open-source R package rsBayes and supplementary data and code used to reproduce the analysis results. The proposed model specification and software implementation delivers computationally efficient, statistically robust, and inferentially rich LSP parameter posterior distributions at the pixel-level across massive raster time series datasets.
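A double logistic phenology curve of the kind fitted in the paper can be sketched as follows (Python rather than R, with hypothetical parameter names and values; the paper's exact parameterization may differ):

```python
import math

def double_logistic(t, vi_min, vi_max, t_green, r_green, t_senes, r_senes):
    """Double logistic land-surface-phenology curve: a spring green-up
    sigmoid minus an autumn senescence sigmoid, scaled between the
    background (vi_min) and peak (vi_max) vegetation index values."""
    rise = 1.0 / (1.0 + math.exp(-r_green * (t - t_green)))
    fall = 1.0 / (1.0 + math.exp(-r_senes * (t - t_senes)))
    return vi_min + (vi_max - vi_min) * (rise - fall)

# hypothetical parameters: green-up around day 120, senescence around day 280
winter_evi = double_logistic(1, 0.1, 0.8, 120, 0.1, 280, 0.1)
summer_evi = double_logistic(180, 0.1, 0.8, 120, 0.1, 280, 0.1)
```

In the Bayesian setting each of these six parameters receives a posterior distribution per pixel, which is what allows uncertainty to be propagated into downstream analyses.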
Link to R Package manual :

Interesting information / Workshop : Uncertainties in data analysis
« on: September 04, 2020, 10:42:45 AM »
The 3-day interdisciplinary workshop on "Uncertainties in data analysis" will be held at the Potsdam Institute for Climate Impact Research (PIK), Germany, from 30 September to 2 October 2020. The workshop will consist of 5 lectures and hands-on tutorials conducted by experts from different applied fields that deal with uncertainties. The sessions are primarily intended for postgraduate, doctoral, and post-doctoral researchers. Attendees are invited to contribute to the workshop with presentations of their own research, covering uncertainties or challenging investigations in (palaeo-)climate research. The tutorials will cover general and some specific topics, such as applied Bayesian statistics, palaeoclimate age uncertainties, ice core uncertainties, nonlinear time series analysis, and how uncertainties in data can be modeled in a theoretical or applied sense.

Interesting information / varrank: a variable selection approach
« on: August 25, 2020, 11:52:55 AM »
A common challenge encountered when working with high-dimensional datasets is variable selection. All relevant confounders must be taken into account to allow unbiased estimation of model parameters, while balancing this against the need for parsimony and interpretable models. This is known to be one of the most controversial and difficult tasks in epidemiological analysis.
Variable selection approaches can be categorized into three broad classes: filter-based, wrapper-based, and embedded methods. They differ in how the selection step is combined with model inference. An appealing filter approach is the minimum redundancy maximum relevance (mRMRe) algorithm. This heuristic approach selects the most relevant variables from a set by penalizing them according to the amount of redundancy they share with previously selected variables. In epidemiology, the most frequently used approaches to variable selection are based on goodness-of-fit metrics. The paradigm is that variables important for modeling are causally connected, and that predictive power is a proxy for causal links. The mRMRe algorithm, by contrast, measures the importance of variables through a relevance measure penalized by redundancy, which makes it appealing for epidemiological modeling.
varrank provides a flexible implementation of the mRMRe algorithm which performs variable ranking based on mutual information. The package is particularly suitable for exploring multivariate datasets requiring a holistic analysis. The two main problems that can be addressed by this package are the selection of the most representative variables for modeling a collection of variables of interest (i.e., dimension reduction) and variable ranking with respect to a set of variables of interest.
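The greedy mRMR selection that varrank implements can be sketched as follows (Python rather than R, with absolute Pearson correlation standing in for the mutual information varrank actually uses):

```python
import numpy as np

def mrmr_rank(X, y, n_select):
    """Greedy minimum-redundancy-maximum-relevance ranking. At each step,
    pick the feature maximizing (relevance to y) - (mean redundancy with
    already-selected features)."""
    def score(a, b):
        # stand-in association measure; varrank uses mutual information
        return abs(np.corrcoef(a, b)[0, 1])
    remaining = list(range(X.shape[1]))
    selected = []
    while remaining and len(selected) < n_select:
        def mrmr(j):
            relevance = score(X[:, j], y)
            redundancy = (np.mean([score(X[:, j], X[:, k]) for k in selected])
                          if selected else 0.0)
            return relevance - redundancy
        best = max(remaining, key=mrmr)
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1,                                  # relevant
                     x1 + 0.01 * rng.normal(size=200),    # relevant but redundant
                     rng.normal(size=200)])               # irrelevant
y = x1 + 0.1 * rng.normal(size=200)
order = mrmr_rank(X, y, 2)
```

The redundancy penalty is what distinguishes mRMR from pure goodness-of-fit ranking: the near-duplicate second feature is heavily penalized once its twin has been selected.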


Global future land use (LU) is an important input to Earth system models for projecting Earth system dynamics and is critical for many modeling studies on future global change. Here we generated a new global gridded LU dataset using the Global Change Analysis Model (GCAM) and a land-use spatial downscaling model named Demeter, under the five Shared Socioeconomic Pathways (SSPs) and four Representative Concentration Pathways (RCPs) scenarios. Compared to existing similar datasets, the presented dataset has a higher spatial resolution (0.05° × 0.05°), spans a more comprehensive set of SSP-RCP scenarios (15 in total), and considers uncertainties from the forcing climates. We compared our dataset with the Land Use Harmonization version 2 (LUH2) dataset and found our results are in general spatially consistent with LUH2. The presented dataset will be useful for global Earth system modeling studies, especially for analyzing the impacts of land-use and land-cover change and socioeconomics, and for characterizing the uncertainties associated with these impacts.
Link to dataset:

A perception has emerged, based on several studies, that satellite-based reflectances are limited in terms of their ability to predict gross primary production (GPP) globally at weekly temporal scales. The basis for this inference is in part that reflectances, particularly expressed in the form of vegetation indices (VIs), convey information about potential rather than actual photosynthesis, and they are sensitive to non-green substances (e.g., soil, woody branches, and snow) as well as to chlorophyll. Previous work has suggested that processing and quality control of satellite-based reflectance data play an important role in their interpretation. In this study, we use high-quality reflectance data from the MODerate-resolution Imaging Spectroradiometer (MODIS) to train neural networks that are used to upscale GPP estimated from eddy covariance flux tower measurements globally. We quantify the ability of the machine learning approaches to capture GPP variability at daily to interannual time scales. Our results show that MODIS reflectances, when paired only with potential short-wave radiation, are able to capture a large fraction of GPP variability (approximately 77%) at daily to weekly time scales. Additional meteorological information (temperature, water vapor deficit, soil water content, ET, and incident radiation) captures only a few more percent of the GPP variability. The meteorological information is used most effectively when information about plant functional type and climate classification is included. We show that machine learning can be a useful tool for estimating GPP uncertainties as well as GPP itself from upscaling methods. Our estimated global annual mean GPP for 2007 is 142.5 ± 7.7 Pg C yr⁻¹, which is higher than some other satellite-based estimates but within the range of other reported observation-, model-, and hybrid-based values.
