Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.

Messages - Pankaj Dey

Pages: [1] 2 3 ... 33
IMDLIB is a Python package to download and handle binary gridded data from the India Meteorological Department (IMD). For more information about the IMD datasets, follow the link below:
Link to tutorial:

This is a source code repository for the Framework for Understanding Structural Errors, or FUSE. FUSE is a modular modelling framework which enables the generation of a myriad of conceptual hydrological models by recombining elements from commonly used models. Running a hydrological model means making a wide range of decisions, which influence the simulations in different ways and to different extents. Our goal with FUSE is to enable users to be in charge of these decisions, so that they can understand their effects and thereby develop and use better models.
FUSE was built from scratch to be modular: it offers several options for each important modelling decision and enables the addition of new modules. In contrast, most traditional hydrological models rely on a single model structure (most processes are simulated by a single set of equations). FUSE's modularity makes it easier to i) understand differences between models, ii) run a large ensemble of models, iii) capture the spatial variability of hydrological processes and iv) develop and improve hydrological models in a coordinated fashion across the community.
New features
FUSE's initial implementation (FUSE1) is described in Clark et al. (WRR, 2008). The implementation provided here (which will become FUSE2) was created with users in mind and significantly increases the usability and range of applicability of the original version. In particular, it involves five main additional features:
  • an interface enabling the use of the different FUSE modes (default, calibration, regionalisation),
  • a distributed mode enabling FUSE to run on a grid whilst efficiently managing memory,
  • all the input, output and parameter files are now NetCDF files to improve reproducibility,
  • a calibration mode based on the shuffled complex evolution algorithm (Duan et al., WRR, 1992),
  • a snow module described in Henn et al. (WRR, 2015).
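FUSE itself is written in Fortran, but its central idea — treating each modelling decision as an interchangeable component — can be illustrated with a small hypothetical Python sketch (the function and option names below are invented for illustration, not part of FUSE):

```python
# Hypothetical sketch of modular model construction (not FUSE's actual
# code): each modelling decision is an interchangeable function, and a
# model structure is one chosen option per decision.

def upper_soil_single_bucket(storage, precip, capacity):
    """One option for the upper-soil decision: a simple overflowing bucket."""
    storage = storage + precip
    overflow = max(0.0, storage - capacity)
    return storage - overflow, overflow

def upper_soil_partial_spill(storage, precip, capacity):
    """An alternative option: only half of any excess spills immediately."""
    storage = storage + precip
    overflow = max(0.0, storage - capacity) * 0.5
    return storage - overflow, overflow

def run_model(structure, precip_series, capacity=10.0):
    """Run one model structure (a dict of chosen options) over a series."""
    storage, runoff = 0.0, []
    for p in precip_series:
        storage, q = structure["upper_soil"](storage, p, capacity)
        runoff.append(q)
    return runoff

# Two model structures that differ in a single modelling decision:
m1 = {"upper_soil": upper_soil_single_bucket}
m2 = {"upper_soil": upper_soil_partial_spill}
```

Comparing `run_model(m1, ...)` with `run_model(m2, ...)` isolates the effect of that one decision on the simulated runoff, which is the kind of controlled comparison FUSE is designed to support at scale.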
Manual
Instructions to compile the code provided in this repository and to run FUSE are provided in the FUSE manual.
License
FUSE is distributed under the GNU General Public License Version 3. For details, see the file LICENSE in the FUSE root directory or visit the online version.

Abstract. Over the past decades, many global land-cover products have been released; however, there is still no global land-cover map that combines a fine classification system with fine spatial resolution. In this study, a novel global 30-m land-cover classification with a fine classification system for the year 2015 (GLC_FCS30-2015) was produced by combining time series of Landsat imagery and high-quality training data from the GSPECLib (Global Spatial Temporal Spectra Library) on the Google Earth Engine computing platform. First, the global training data from the GSPECLib were developed by applying a series of rigorous filters to the MCD43A4 NBAR and CCI_LC land-cover products. Secondly, a local adaptive random forest model was built for each 5° × 5° geographical tile using the multi-temporal Landsat spectral and texture features of the corresponding training data, and the GLC_FCS30-2015 land-cover product containing 30 land-cover types was generated for each tile. Lastly, the GLC_FCS30-2015 was validated against three different validation systems (containing different land-cover details) using 44 043 validation samples. The validation results indicated that GLC_FCS30-2015 achieved an overall accuracy of 82.5 % and a kappa coefficient of 0.784 for the level-0 validation system (9 basic land-cover types), an overall accuracy of 71.4 % and a kappa coefficient of 0.686 for the UN-LCCS (United Nations Land Cover Classification System) level-1 system (16 LCCS land-cover types), and an overall accuracy of 68.7 % and a kappa coefficient of 0.662 for the UN-LCCS level-2 system (24 fine land-cover types).
The comparisons against other land-cover products (CCI_LC, MCD12Q1, FROM_GLC and GlobeLand30) indicated that GLC_FCS30-2015 provides more spatial detail than CCI_LC-2015 and MCD12Q1-2015 and a greater diversity of land-cover types than FROM_GLC-2015 and GlobeLand30-2010, and that GLC_FCS30-2015 achieved the highest overall accuracy (82.5 %), compared with 59.1 % for FROM_GLC-2015 and 75.9 % for GlobeLand30-2010. Therefore, it is concluded that the GLC_FCS30-2015 product is the first global land-cover dataset that provides a fine classification system with high classification accuracy at 30 m. The GLC_FCS30-2015 global land-cover product generated in this paper is available at (Liu et al., 2020).

Interesting information / Effective Computing
« on: September 16, 2020, 01:01:10 PM »
Overwhelmed by the world of computing tools you could be using for your research? Mired in messy code that won't evolve with your ideas? This course is for you. Designed for grad students across the College of the Environment, it is a broad, practical introduction to the most important computer things you need to know to keep your research flowing smoothly.

Interesting information / Introducing the R Package “biascorrection”
« on: September 15, 2020, 05:40:34 PM »
For a variety of reasons, we need hydrological models for our short- and long-term predictions and planning. However, it is no secret that these models always suffer from some degree of bias. This bias can stem from many different and often interacting sources. Some examples are biases in underlying model assumptions, missing processes, model parameters, calibration parameters, and imperfections in input data (Beven and Binley, 1992).
The question of how to use models, given all these uncertainties, has been an active area of research for at least 50 years and will probably remain so for the foreseeable future, but going through that is not the focus of this blog post.
In this post, I explain a technique called bias correction that is frequently used in an attempt to improve model predictions. I also introduce an R package for bias correction that I recently developed; the package is called “biascorrection.” Although most of the examples in this post are about hydrological models, the arguments and the R package might be useful in other disciplines, for example with atmospheric models, which have been one of the hotspots of bias-correction applications (for example, here, here and here). The reason is that the algorithm follows a series of simple mathematical procedures that can be applied to other questions and research areas.
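The “biascorrection” package's exact interface is not reproduced here, but one of the most common bias-correction procedures it belongs to the family of, empirical quantile mapping, is easy to sketch. The Python below (an illustration, not the package's implementation) maps each new model value to the observed value at the same quantile of the historical model distribution:

```python
import numpy as np

def quantile_map(model_hist, obs_hist, model_new):
    """Empirical quantile mapping: find the quantile of each new model
    value within the historical model distribution, then read off the
    observed distribution at that same quantile."""
    model_hist = np.sort(np.asarray(model_hist))
    obs_hist = np.sort(np.asarray(obs_hist))
    # Quantile of each new value in the historical model distribution
    q = np.searchsorted(model_hist, model_new) / len(model_hist)
    q = np.clip(q, 0.0, 1.0)
    # Corrected values: observed distribution evaluated at those quantiles
    return np.quantile(obs_hist, q)
```

If the model is systematically 2 units too wet in the historical period, a new model value of 52 is corrected to roughly 50, because quantile mapping removes the offset between the two historical distributions.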

Link to the blog post:

We develop a Bayesian Land Surface Phenology (LSP) model and examine its performance using Enhanced Vegetation Index (EVI) observations derived from the Harmonized Landsat Sentinel-2 (HLS) dataset. Building on previous work, we propose a double logistic function that, once couched within a Bayesian model, yields posterior distributions for all LSP parameters. We assess the efficacy of the Normal, Truncated Normal, and Beta likelihoods to deliver robust LSP parameter estimates. Two case studies are presented and used to explore aspects of the proposed model. The first, conducted over forested pixels within a HLS tile, explores choice of likelihood and space-time varying HLS data availability for long-term average LSP parameter point and uncertainty estimation. The second, conducted on a small area of interest within the HLS tile on an annual time-step, further examines the impact of sample size and choice of likelihood on LSP parameter estimates. Results indicate that while the Truncated Normal and Beta likelihoods are theoretically preferable when the vegetation index is bounded, all three likelihoods performed similarly when the number of index observations is sufficiently large and values are not near the index bounds. Both case studies demonstrate how pixel-level LSP parameter posterior distributions can be used to propagate uncertainty through subsequent analysis. As a companion to this article, we provide an open-source R package rsBayes and supplementary data and code used to reproduce the analysis results. The proposed model specification and software implementation delivers computationally efficient, statistically robust, and inferentially rich LSP parameter posterior distributions at the pixel-level across massive raster time series datasets.
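A commonly used form of the double logistic phenology curve (one of several parameterizations in the literature, not necessarily the exact form fitted by rsBayes) can be sketched as follows — a baseline index value, an amplitude, and two logistic transitions for green-up and senescence:

```python
import numpy as np

def double_logistic(t, vi_min, vi_max, t_green, r_green, t_senes, r_senes):
    """Double-logistic phenology curve: baseline vi_min rising to vi_max,
    with green-up centred at day t_green (rate r_green) and senescence
    centred at day t_senes (rate r_senes). One common parameterization;
    parameter names here are illustrative."""
    greenup = 1.0 / (1.0 + np.exp(-r_green * (t - t_green)))
    senescence = 1.0 / (1.0 + np.exp(-r_senes * (t - t_senes)))
    return vi_min + (vi_max - vi_min) * (greenup - senescence)
```

In a Bayesian setting, each of the six parameters receives a prior and the fitted posterior distributions then quantify, per pixel, the uncertainty in quantities such as the green-up date `t_green`.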
Link to R package manual:

Interesting information / Workshop : Uncertainties in data analysis
« on: September 04, 2020, 10:42:45 AM »
The 3-day interdisciplinary workshop on "Uncertainties in data analysis" will be held at the Potsdam Institute for Climate Impact Research (PIK), Germany, from 30 September to 2 October 2020. The workshop will consist of 5 lectures and hands-on tutorials conducted by experts from different applied fields that deal with uncertainties. The sessions are primarily intended for postgraduate, doctoral and post-doctoral researchers. Attendees are invited to contribute to the workshop with presentations of their own research, covering uncertainties or challenging investigations in (palaeo-)climate research. The tutorials will cover more general and some specific topics, such as applied Bayesian statistics, palaeoclimate age uncertainties, ice core uncertainties, nonlinear time series analysis, and how uncertainties in data can be modeled in a theoretical or applied sense.

Interesting information / varrank: a variable selection approach
« on: August 25, 2020, 11:52:55 AM »
A common challenge encountered when working with high-dimensional datasets is that of variable selection. All relevant confounders must be taken into account to allow for unbiased estimation of model parameters, while balancing the need for parsimony and interpretable models. This task is known to be one of the most controversial and difficult in epidemiological analysis.
Variable selection approaches can be categorized into three broad classes: filter-based methods, wrapper-based methods, and embedded methods. They differ in how the selection step and the model inference are combined. An appealing filter approach is the minimum redundancy maximum relevance (mRMRe) algorithm. The purpose of this heuristic approach is to select the most relevant variables from a set by penalising according to the amount of redundancy variables share with previously selected variables. In epidemiology, the most frequently used approaches to tackle variable selection based on modeling use goodness-of-fit metrics. The paradigm is that important variables for modeling are variables that are causally connected, and predictive power is a proxy for causal links. On the other hand, the mRMRe algorithm aims to measure the importance of variables based on a relevance measure penalized by redundancy, which makes it appealing for epidemiological modeling.
varrank has a flexible implementation of the mRMRe algorithm, which performs variable ranking based on mutual information. The package is particularly suitable for exploring multivariate datasets requiring a holistic analysis. The two main problems that can be addressed by this package are the selection of the most representative variables for modeling a collection of variables of interest, i.e., dimension reduction, and variable ranking with respect to a set of variables of interest.
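varrank itself is an R package, but the greedy relevance-minus-redundancy idea it implements is compact enough to sketch in Python. In this illustration, absolute Pearson correlation stands in for the mutual information scores varrank actually uses:

```python
import numpy as np

def mrmr_rank(X, y, n_select):
    """Greedy mRMR-style ranking. X: (n_samples, n_features), y: target.
    Absolute Pearson correlation is used here as a simple stand-in for
    mutual information (varrank itself uses mutual information)."""
    n_feat = X.shape[1]
    # Relevance: association of each feature with the target
    relevance = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_feat)]
    )
    selected = [int(np.argmax(relevance))]
    while len(selected) < n_select:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            # Redundancy: mean association with already-selected features
            redundancy = np.mean(
                [abs(np.corrcoef(X[:, j], X[:, s])[0, 1]) for s in selected]
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

With two nearly duplicated relevant features and one independent weaker feature, the ranking picks one of the duplicates first and then skips its near-copy in favour of the non-redundant feature — exactly the behaviour that makes mRMR attractive for confounder-heavy epidemiological data.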


Global future land use (LU) is an important input for Earth system models for projecting Earth system dynamics and is critical for many modeling studies on future global change. Here we generated a new global gridded LU dataset using the Global Change Analysis Model (GCAM) and a land use spatial downscaling model, named Demeter, under the five Shared Socioeconomic Pathways (SSPs) and four Representative Concentration Pathways (RCPs) scenarios. Compared to existing similar datasets, the presented dataset has a higher spatial resolution (0.05° × 0.05°), covers a more comprehensive set of SSP-RCP scenarios (15 in total), and considers uncertainties from the forcing climates. We compared our dataset with the Land Use Harmonization version 2 (LUH2) dataset and found our results are in general spatially consistent with LUH2. The presented dataset will be useful for global Earth system modeling studies, especially for analysis of the impacts of land use and land cover change and socioeconomics, as well as for characterizing the uncertainties associated with these impacts.
Link to dataset:

A perception has emerged, based on several studies, that satellite-based reflectances are limited in terms of their ability to predict gross primary production (GPP) globally at weekly temporal scales. The basis for this inference is in part that reflectances, particularly expressed in the form of vegetation indices (VIs), convey information about potential rather than actual photosynthesis, and they are sensitive to non-green substances (e.g., soil, woody branches, and snow) as well as to chlorophyll. Previous works have suggested that processing and quality control of satellite-based reflectance data play an important role in their interpretation. In this study, we use high-quality reflectance data from the MODerate-resolution Imaging Spectroradiometer (MODIS) to train neural networks that are used to upscale GPP estimated from eddy covariance flux tower measurements globally. We quantify the ability of the machine learning approaches to capture GPP variability at daily to interannual time scales. Our results show that MODIS reflectances, when paired only with potential short-wave radiation, are able to capture a large fraction of GPP variability (approximately 77%) at daily to weekly time scales. Additional meteorological information (temperature, water vapor deficit, soil water content, ET, and incident radiation) captures only a few more percent of the GPP variability. The meteorological information is used most effectively when information about plant functional type and climate classification is included. We show that machine learning can be a useful tool for estimating GPP uncertainties as well as GPP itself from upscaling methods. Our estimated global annual mean GPP for 2007 is 142.5 ± 7.7 Pg C y⁻¹, which is higher than some other satellite-based estimates but within the range of other reported observation-, model-, and hybrid-based values.
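The upscaling workflow described above — fit a statistical model at flux tower sites, then apply it wherever reflectances exist — can be sketched with ordinary least squares standing in for the neural network. All variable names and the synthetic relationship below are illustrative assumptions, not the study's data:

```python
import numpy as np

# Synthetic "flux tower" training set: 2 MODIS-like reflectance bands
# plus potential radiation, with a known linear GPP dependence + noise.
rng = np.random.default_rng(42)
n_towers = 500
reflectance = rng.uniform(0.0, 0.6, size=(n_towers, 2))
pot_radiation = rng.uniform(100.0, 400.0, size=(n_towers, 1))
features = np.hstack([reflectance, pot_radiation])
gpp = (5.0 * reflectance[:, 0] - 2.0 * reflectance[:, 1]
       + 0.01 * pot_radiation[:, 0] + rng.normal(0.0, 0.1, n_towers))

# Fit at the towers (least squares as a stand-in for the neural net)...
X = np.hstack([features, np.ones((n_towers, 1))])
coef, *_ = np.linalg.lstsq(X, gpp, rcond=None)

# ...then "upscale": predict GPP at a location without a tower.
new_X = np.array([[0.3, 0.2, 250.0, 1.0]])
pred = new_X @ coef
```

The real study replaces the linear model with neural networks and validates against held-out towers, but the structure — train locally, predict globally — is the same.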

An unprecedented spectroscopic data stream will soon become available with forthcoming Earth-observing satellite missions equipped with imaging spectroradiometers. This data stream will open up a vast array of opportunities to quantify a diversity of biochemical and structural vegetation properties. The processing requirements for such large data streams require reliable retrieval techniques enabling the spatiotemporally explicit quantification of biophysical variables. With the aim of preparing for this new era of Earth observation, this review summarizes the state-of-the-art retrieval methods that have been applied in experimental imaging spectroscopy studies inferring all kinds of vegetation biophysical variables. Identified retrieval methods are categorized into: (1) parametric regression, including vegetation indices, shape indices and spectral transformations; (2) nonparametric regression, including linear and nonlinear machine learning regression algorithms; (3) physically based, including inversion of radiative transfer models (RTMs) using numerical optimization and look-up table approaches; and (4) hybrid regression methods, which combine RTM simulations with machine learning regression methods. For each of these categories, an overview of widely applied methods with application to mapping vegetation properties is given. In view of processing imaging spectroscopy data, a critical aspect involves the challenge of dealing with spectral multicollinearity. The ability to provide robust estimates, retrieval uncertainties and acceptable retrieval processing speed are other important aspects in view of operational processing. Recommendations towards new-generation spectroscopy-based processing chains for operational production of biophysical variables are given.
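The look-up-table (LUT) inversion strategy mentioned in category (3) can be illustrated with a deliberately toy forward model (a real retrieval chain would use an RTM such as PROSAIL; the band equations below are invented for illustration):

```python
import numpy as np

def toy_rtm(lai):
    """Stand-in forward model mapping leaf area index (LAI) to a
    two-band reflectance. Invented equations, not a real RTM."""
    red = 0.30 * np.exp(-0.6 * lai)                 # red darkens with canopy
    nir = 0.15 + 0.35 * (1.0 - np.exp(-0.5 * lai))  # NIR brightens with canopy
    return np.array([red, nir])

# Step 1: build the LUT by simulating spectra over a parameter grid.
lai_grid = np.linspace(0.0, 6.0, 301)
lut = np.array([toy_rtm(l) for l in lai_grid])

def invert(observed):
    """Step 2: retrieve the parameter whose simulated spectrum best
    matches the observation (here, minimum RMSE over bands)."""
    cost = np.sqrt(((lut - observed) ** 2).mean(axis=1))
    return lai_grid[int(np.argmin(cost))]
```

Real LUT inversions add noise terms, multiple best-matching entries averaged together, and regularization, but the grid-simulate-match structure is the same.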

Janus is an open source Python package for agent-based modeling (ABM) of land use and land cover change (LULCC). Many ABMs of LULCC have been created across platforms, some of which are not ideal for large scale, high resolution scenarios. This model provides a simple object-oriented framework for creating ABMs specific to LULCC. The organizational philosophy of the modeling framework is to create software objects (agents) that are associated with specific and contextual attributes which are isolated from where those agents exist in the spatial setting of the model, while still providing clear linkages between the agent, their environment, and other agents in the simulation. In this way, the framework allows for assembly of LULCC ABMs with low (programmatic) overhead, making the models extensible and providing clear mechanisms for integrating them with process-oriented biophysical models. Provided with Janus is a suite of geospatial data preprocessing tools that can use arbitrary land cover products as an input. Crop choice decisions are based on potential crop prices; these can be created synthetically or drawn from integrated human-Earth systems models such as the Global Change Assessment Model. Janus is publicly accessible through GitHub and provides an example dataset for testing.
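The design described — agents carrying contextual attributes, kept separate from their grid locations, with price-driven crop choice — can be sketched in a few lines. This is a hypothetical illustration of the pattern, not Janus's actual classes or API:

```python
# Hypothetical sketch of the agent design Janus describes (not its
# actual API): agent state is separate from spatial placement, and
# crop choice responds to prices.

class FarmerAgent:
    def __init__(self, agent_id, risk_aversion):
        self.agent_id = agent_id
        self.risk_aversion = risk_aversion  # contextual attribute
        self.crop = None

    def choose_crop(self, prices):
        """Pick the highest-utility crop: its price, discounted by
        risk aversion if it means switching away from the current crop."""
        def utility(crop):
            switch_cost = 0.0 if crop == self.crop else self.risk_aversion
            return prices[crop] - switch_cost
        self.crop = max(prices, key=utility)
        return self.crop

# Locations kept in a separate mapping, decoupled from agent state:
locations = {1: (10, 42), 2: (11, 42)}
agents = [FarmerAgent(1, risk_aversion=0.5), FarmerAgent(2, risk_aversion=0.0)]
```

Because location lives in a separate mapping, the same agent logic can run on any grid, which is the decoupling the framework's philosophy emphasizes; in Janus the prices would come synthetically or from a model such as GCAM rather than being hard-coded.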
Code repository:
Link to paper:

Hydrological sciences / GRDC Station Catalogue
« on: June 27, 2020, 10:55:40 PM »
Download global #discharge (#river flow) data with a few mouse clicks. A great effort by the Global Runoff Data Centre (GRDC), Koblenz (Germany).

The “Ecohydrologic separation” hypothesis challenged assumptions of translatory flow through the rooting zone. However, studies claiming to test ecohydrologic separation have largely diverged from testing how water infiltrates and recharges the rooting zone, towards identifying isotopic differences between stream water and plant water. We suggest that differences should exist among the isotopic compositions of water in plants, streams, and other subsurface pools in most scenarios, and that ecohydrologic separation is not solely about observing fractionated isotope ratios in plant water. The discussion of ecohydrologic separation should refocus on how heterogeneous infiltration and root uptake processes lead to such differences. More generally, we propose that research objectives should involve interpreting isotope data in the context of processes, rather than settling on describing data patterns that have confounded interpretations (i.e., that plant and stream water isotopically differ). Consequently, we outline areas where plant and soil water stable isotope data can move us towards improved understanding and representation of soil-water transport and plant-water recharge.
  Key points 
  • Isotope ratios of plant water should differ from water flowing in soils to streams and so we need to move beyond confirming this difference
  • To move beyond identifying ecohydrologic separation towards understanding it, we provide a framework for assessing soil water flow processes
  • By focusing on dynamics of how water infiltrates into the subsurface and becomes available to plants we can better interpret past findings

While agricultural expansion and management practices are critical for increasing global food production, there is limited understanding of how they impact fluxes of carbon, water, and energy from the land surface to the atmosphere. Global land models are useful for understanding these possible climate impacts, yet few global land models explicitly represent crops or crop management given the complexity of interactions between human decisions, crop phenology, and land processes at global scales. Our analysis illustrates that representing specific crop types, as well as irrigation and fertilization, in the Community Land Model (CLM) increases the amount of carbon that plants draw out of the atmosphere and also changes patterns of evapotranspiration. Additionally, crop yield estimates from CLM compare well to observed crop yields until ~1990, when modeled crop yields level off. This occurs because CLM does not represent management practices associated with modern agricultural intensification. Overall, our results illustrate the impact that crop expansion and management may have on climate and highlight that global models should represent specific crop types and crop management to accurately capture carbon, water, and energy fluxes from the land surface.
