Part 1: Processing data in GRASS

Author

Verónica Andreo

Published

July 1, 2024

In this notebook we’ll go through the processing of MODIS LST daily time series data to derive relevant predictor variables for modeling the distribution of Aedes albopictus in Northern Italy. Furthermore, we’ll show how to obtain and process occurrence data and background points.

Let’s first go through some temporal concepts within GRASS GIS…

The TGRASS framework

GRASS GIS was the first FOSS GIS that incorporated capabilities to manage, analyze, process and visualize spatio-temporal data, as well as the temporal relationships among time series.

  • TGRASS is fully based on metadata and does not duplicate any dataset
  • Snapshot approach, i.e., adds time stamps to maps
  • A collection of time stamped maps (snapshots) of the same variable is called a space-time dataset or STDS
  • Maps in a STDS can have different spatial and temporal extents
  • Space-time datasets can be composed of raster, raster 3D or vector maps, and so we call them:
    • Space time raster datasets (STRDS)
    • Space time 3D raster datasets (STR3DS)
    • Space time vector datasets (STVDS)

Temporal tools

GRASS temporal tools are named and organized following the GRASS core naming scheme. In this way, we have:

  • t.*: general tools to handle STDS of all types
  • t.rast.*: tools that deal with STRDS
  • t.rast3d.*: tools that deal with STR3DS
  • t.vect.*: tools that deal with STVDS

Other TGRASS notions

  • Time can be defined as intervals (start and end time) or instances (only start time)
  • Time can be absolute (e.g., 2017-04-06 22:39:49) or relative (e.g., 4 years, 90 days)
  • Granularity is the greatest common divisor of the temporal extents (and possible gaps) of all maps in the space-time cube

  • Topology refers to temporal relations between time intervals in a STDS.
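
To make the notion of granularity more concrete, here is a minimal plain-Python sketch. It assumes that all map extents and gaps are expressed in whole days; the function name and the example durations are made up for illustration and are not part of GRASS.

```python
from functools import reduce
from math import gcd


def granularity(durations_in_days):
    """Greatest common divisor of map extents and gaps, all given in whole days."""
    return reduce(gcd, durations_in_days)


# Hypothetical STRDS: three maps lasting 2, 4 and 6 days, plus one 8-day gap
extents_and_gaps = [2, 4, 6, 8]
print(granularity(extents_and_gaps))  # 2, i.e. a granularity of "2 days"

# A daily series without gaps has a granularity of 1 day
print(granularity([1, 1, 1, 1]))  # 1
```

The daily LST series we use later is the second case: every map covers one day, so the granularity is 1 day.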

TGRASS framework and workflow

GRASS + Python

In this part of the studio we’ll work with GRASS and Python, so let’s first see/recall the very basics.

Python package grass.script

The grass.script or GRASS GIS Python Scripting Library provides functions for calling GRASS tools within Python scripts. The most commonly used functions include:

  • run_command: used when the tool outputs a raster or vector map and no text output is expected
  • read_command: used when the output of the tools is of text type
  • parse_command: used with tools whose output can be converted to key=value pairs
  • write_command: used with tools that expect text input, either in the form of a file or from standard input
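
To illustrate the kind of output that parse_command is meant for, here is a minimal plain-Python sketch that turns key=value text into a dictionary. The helper parse_key_value is hypothetical and much simpler than what the library does, and the sample text only imitates the style of g.region with the g flag; the numbers are invented.

```python
def parse_key_value(text, sep="="):
    """Turn lines such as 'n=2700000' into a dictionary of strings."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines
        key, _, value = line.partition(sep)
        result[key] = value
    return result


# Made-up output in the style of `g.region -g` (values are illustrative only)
sample_output = """n=2700000
s=2400000
e=4600000
w=4100000
nsres=1000
ewres=1000
"""
region = parse_key_value(sample_output)
print(region["nsres"])                      # values stay strings until converted
print(int(region["n"]) - int(region["s"]))  # north-south extent
```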

It also provides several wrapper functions for frequently used tools, for example:

  • To get info from a raster, script.raster.raster_info() is used: gs.raster_info('dsm')
  • To get info of a vector, script.vector.vector_info() is used: gs.vector_info('roads')
  • To list the raster maps in a project, script.core.list_grouped() is used: gs.list_grouped(type=['raster'])
  • To obtain the computational region, script.core.region() is used: gs.region()

Python package grass.jupyter

The grass.jupyter library improves the integration of GRASS and Jupyter, and provides the init function and several classes to facilitate the visualization of GRASS maps:

  • init: starts a GRASS session and sets up all necessary environment variables
  • Map: 2D rendering
  • Map3D: 3D rendering
  • InteractiveMap: interactive visualization with folium
  • SeriesMap: visualizations for a series of raster or vector maps
  • TimeSeriesMap: visualization for spatio-temporal data

Hands-on

So let’s start… We begin by setting variables, checking GRASS installation and initializing GRASS GIS

import os

# Data directory
homedir = os.path.join(os.path.expanduser('~'), "grass_foss4geu_2024")

# GRASS GIS database variables
grassbin = "grass"
grassdata = os.path.join(homedir, "grassdata")
project = "eu_laea"
mapset = "italy_LST_daily"
# Check the GRASS GIS installation
import subprocess
print(subprocess.check_output([grassbin, "--config", "version"], text=True))
# Ask GRASS GIS where its Python packages are 
import sys
sys.path.append(
    subprocess.check_output([grassbin, "--config", "python_path"], text=True).strip()
)

Now we are ready to start a GRASS GIS session

# Import the GRASS GIS packages we need
import grass.script as gs
import grass.jupyter as gj

# Start the GRASS GIS Session
session = gj.init(grassdata, project, mapset)

Explore data in the mapset

Let’s first explore what we have within the italy_LST_daily mapset and display vector and raster maps using different classes from grass.jupyter library.

# List vector elements
gs.list_grouped(type="vector")['italy_LST_daily']
# Display vector map
it_map = gj.Map(width=500, use_region=True)
it_map.d_vect(map="italy_borders_0")
it_map.show()
# List raster elements
rast = gs.list_grouped(type="raster", pattern="lst*")['italy_LST_daily']
rast[0:10]
# Display raster map with interactive class
lst_map = gj.InteractiveMap(width = 500, use_region=True, tiles="OpenStreetMap")
lst_map.add_raster("lst_2014.005_avg")
lst_map.add_layer_control(position = "bottomright")
lst_map.show()

SDM workflow

In this part of the Studio we’ll be addressing the left part of the SDM workflow: occurrence and background data, and predictors.

Importing species records

We will use occurrence data already downloaded and cleaned. We need to import it into GRASS GIS first.

# Import mosquito records
gs.run_command("v.import",
               input=os.path.join(homedir,"aedes_albopictus.gpkg"),
               output="aedes_albopictus")

Let’s add the occurrence points over the previous interactive map

# Display raster map with interactive class
lst_map = gj.InteractiveMap(width = 500, use_region=True, tiles="OpenStreetMap")
lst_map.add_raster("lst_2014.005_avg")
lst_map.add_vector("aedes_albopictus")
lst_map.add_layer_control(position = "bottomright")
lst_map.show()

You can also get the mosquito occurrences (or any other species or taxa) directly from GBIF into GRASS by means of v.in.pygbif as follows:

# Set computational region
# region = gs.parse_command("g.region", raster="lst_2014.001_avg", flags="g")
# region
# Install extension (requires pygbif: pip install pygbif)
# gs.run_command("g.extension",
#                extension="v.in.pygbif")
# Import data from GBIF
# gs.run_command("v.in.pygbif", 
#                output="aedes_albopictus",
#                taxa="Aedes albopictus",
#                date_from="2014-01-01",
#                date_to="2018-12-31")

Creating random background points

The MaxEnt algorithm that we will use in the next part of this session requires not only the locations of known occurrences, but also information on the rest of the available environment. These are not absences but background data: we do not actually know whether the species is present there, but we need them to compare against the features of the places where the species does occur.

To avoid getting background points exactly where occurrences are, we’ll create buffers around them. Then, we need to ensure that background points are only over land within our computational region. In order to do that, we’ll create a mask over land and we’ll overlay the buffers with the mask. Can you guess what the output will be?

# Create buffer around Aedes albopictus records
gs.run_command("v.buffer",
               input="aedes_albopictus",
               output="aedes_buffer",
               distance=2000)
# Set computational region
region = gs.parse_command("g.region", raster="lst_2014.001_avg", flags="g")
region
# Create a vector mask to limit background points
expression="MASK = if(lst_2014.001_avg, 1, null())"
gs.raster.mapcalc(exp=expression)

gs.run_command("r.to.vect", 
               input="MASK",
               output="vect_mask",
               type="area")
# Subtract buffers from vector mask
# ("not" keeps the parts of ainput that are not covered by binput)
gs.run_command("v.overlay",
               ainput="vect_mask",
               binput="aedes_buffer",
               operator="not",
               output="mask_bg")
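
The v.overlay operators can be read as set operations on the areas of the two inputs. Here is a toy sketch in plain Python, not GRASS code, in which each area is represented by the set of grid cells it covers. Both layouts are invented; they only stand in for the land mask and the buffers.

```python
# Each area is represented by the set of grid cells (col, row) it covers.
# Both layouts are made up for illustration.
land = {(c, r) for c in range(0, 4) for r in range(0, 3)}     # stands in for vect_mask
buffers = {(c, r) for c in range(3, 6) for r in range(1, 3)}  # stands in for aedes_buffer

overlay = {
    "and": land & buffers,  # covered by both inputs
    "or": land | buffers,   # covered by either input
    "not": land - buffers,  # in the first input only
    "xor": land ^ buffers,  # in exactly one of the inputs
}

for operator, cells in overlay.items():
    print(operator, len(cells))

# Cells that xor keeps but not drops: buffer parts lying outside the land mask
print(sorted(overlay["xor"] - overlay["not"]))
```

With not, only land outside the buffers remains. With xor, any buffer parts lying outside the land mask are kept as well, so the two operators give the same result only when every buffer lies fully within the mask.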

Let’s display the result

# Display vector map with interactive class
mask_map = gj.InteractiveMap(width = 500, use_region=True, tiles="OpenStreetMap")
mask_map.add_vector("mask_bg")
mask_map.add_layer_control(position = "bottomright")
mask_map.show()

Finally, let’s create the random background points…

# Generate random background points
gs.run_command("v.random",
               output="background_points",
               npoints=1000,
               restrict="mask_bg",
               seed=3749)

and display occurrence and background points together over an LST map.

# Display vector map
pb_map = gj.Map(width=500, use_region=True)
pb_map.d_rast(map="lst_2014.005_avg")
pb_map.d_vect(map="italy_borders_0", type="boundary")
pb_map.d_vect(map="background_points")
pb_map.d_vect(map="aedes_albopictus", icon="basic/diamond", fill_color="red", size=8)
pb_map.show()
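
The logic behind the background points can also be sketched in plain Python: draw random points and keep only those farther from every occurrence than the buffer distance. This is a simplified illustration under stated assumptions, not what v.random does internally. It ignores the land mask, uses a rectangular bounding box, and all coordinates are hypothetical.

```python
import math
import random


def sample_background(n_points, bbox, occurrences, min_distance, seed=None):
    """Draw random points in bbox lying farther than min_distance from every occurrence."""
    rng = random.Random(seed)
    west, south, east, north = bbox
    points = []
    while len(points) < n_points:
        x = rng.uniform(west, east)
        y = rng.uniform(south, north)
        # Reject candidates that fall inside the buffer of any occurrence
        if all(math.hypot(x - ox, y - oy) > min_distance for ox, oy in occurrences):
            points.append((x, y))
    return points


# Hypothetical coordinates in metres; the 2000 m distance mirrors the buffer above
occurrences = [(5000.0, 5000.0), (12000.0, 8000.0)]
background = sample_background(
    n_points=100,
    bbox=(0.0, 0.0, 20000.0, 20000.0),
    occurrences=occurrences,
    min_distance=2000.0,
    seed=3749,
)
print(len(background))
```

Fixing the seed, as we did in v.random, makes the sample reproducible between runs.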

Create daily LST time series

Now we’ll start processing the raster data to derive potentially relevant predictors to include in the model. Our data consists of a time series of daily LST averages. We’ll use the GRASS temporal framework. The first step is to create the time series object and register maps in it. See t.create and t.register for further details.

# Create time series 
gs.run_command("t.create",
               type="strds",
               temporaltype="absolute",
               output="lst_daily",
               title="Average Daily LST",
               description="Average daily LST in degree C - 2014-2018")
# Check it is created
gs.run_command("t.list",
              type="strds")
# Get list of maps 
map_list = gs.list_grouped(type="raster", pattern="lst_201*")['italy_LST_daily']
map_list[0:10]
# Register maps in strds  
gs.run_command("t.register", 
               input="lst_daily",
               maps=map_list,
               increment="1 days",
               start="2014-01-01", 
               flags="i")
# Get info about the strds
gs.run_command("t.info",
               input="lst_daily")
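
To see what registering with a start date, an increment and the interval flag amounts to, here is a small sketch in plain Python. It assumes the maps are stamped in the order in which they are listed, each one getting an interval of one increment. The helper daily_intervals is hypothetical and only mimics the idea.

```python
from datetime import date, timedelta


def daily_intervals(map_names, start, increment_days=1):
    """Pair each map with a (start, end) interval, advancing by the increment each time."""
    step = timedelta(days=increment_days)
    intervals = []
    for i, name in enumerate(map_names):
        begin = start + i * step
        intervals.append((name, begin, begin + step))
    return intervals


# Map names follow the lst_YYYY.DDD_avg pattern used in this mapset
names = [f"lst_2014.{doy:03d}_avg" for doy in range(1, 4)]
for name, begin, end in daily_intervals(names, date(2014, 1, 1)):
    print(name, begin, end)
```

Since the day of year in the map names is zero-padded, an alphabetically sorted list is also in chronological order, which is what this kind of registration relies on.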

Generate environmental variables from LST STRDS

Now that we created the time series or STRDS, let’s start estimating relevant variables. We start by calculating long term aggregations, also called climatologies.

Long term monthly avg, min and max LST

Let’s see an example first; we’ll estimate the average of all maps whose start date falls in January.

# January average LST
gs.run_command("t.rast.series",
               input="lst_daily",
               method="average",
               where="strftime('%m', start_time)='01'",
               output="lst_average_jan")
# Get map info and check values
info = gs.raster_info("lst_average_jan")
info['min'], info['max']

To estimate climatologies for all months, let’s first check which maps will be the input for t.rast.series in each case. For that, we’ll test the condition with t.rast.list.

# Define list of months as required
months=['{0:02d}'.format(m) for m in range(1,13)]

for m in months:
    gs.run_command("t.rast.list",
                   input="lst_daily",
                   where=f"strftime('%m', start_time)='{m}'")
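
The where condition is an SQL expression evaluated against the time stamps of the maps. Assuming the temporal database uses SQLite, we can try the very same condition on a toy table with Python's sqlite3 module. The table name and layout below are invented for illustration; only the start_time column and the condition come from the calls above.

```python
import sqlite3
from datetime import date, timedelta

# Toy table with one row per day of 2014, standing in for the registered maps
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE maps (name TEXT, start_time TEXT)")
first_day = date(2014, 1, 1)
rows = []
for i in range(365):
    day = first_day + timedelta(days=i)
    rows.append((f"lst_2014.{i + 1:03d}_avg", f"{day.isoformat()} 00:00:00"))
con.executemany("INSERT INTO maps VALUES (?, ?)", rows)

# Same condition as in the where option, here for January
month = "01"
where = f"strftime('%m', start_time)='{month}'"
selected = con.execute(
    f"SELECT name FROM maps WHERE {where} ORDER BY start_time"
).fetchall()

print(len(selected))                    # number of maps starting in January
print(selected[0][0], selected[-1][0])  # first and last selected map
con.close()
```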

Now we add the methods and we are ready to estimate climatologies for all months with three different methods.

# Now we estimate the climatologies for all months and methods
months=['{0:02d}'.format(m) for m in range(1,13)]
methods=["average","minimum","maximum"]

for m in months:
    for me in methods:
        gs.run_command("t.rast.series", 
                       input="lst_daily",
                       method=me,
                       where=f"strftime('%m', start_time)='{m}'",
                       output="lst_{}_{}".format(me,m))
# List newly created maps
map_list = gs.list_grouped(type="raster", pattern="*{average,minimum,maximum}*")['italy_LST_daily']
print(map_list)
# Remove lst_average_jan
gs.run_command("g.remove", type="raster", name="lst_average_jan", flags="f")
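
For a single pixel, a long-term monthly aggregation boils down to pooling the daily values of each calendar month across all years and summarizing them. Here is a minimal plain-Python sketch with a synthetic series; the values are not real LST data, and the seasonal cycle is deliberately crude.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Synthetic daily values for one pixel over two years (not real LST data)
first_day = date(2014, 1, 1)
series = []
for i in range(730):
    day = first_day + timedelta(days=i)
    # Warmer towards mid-year, colder in winter: a crude triangular seasonal cycle
    value = 25.0 - abs(day.timetuple().tm_yday - 183) * 0.15
    series.append((day, value))

# Pool the values of each calendar month across all years
by_month = defaultdict(list)
for day, value in series:
    by_month[f"{day.month:02d}"].append(value)

climatology = {
    month: {"average": mean(values), "minimum": min(values), "maximum": max(values)}
    for month, values in sorted(by_month.items())
}

for month, stats in climatology.items():
    print(month, {name: round(v, 2) for name, v in stats.items()})
```

In GRASS the same is done for every pixel at once, and each month and method combination ends up as one raster map.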

Bioclimatic variables

Perhaps you have heard of the WorldClim or CHELSA bioclimatic variables? Well, these are 19 variables that represent potentially limiting conditions for species. They are derived from the combination of long-term temperature and precipitation averages. As we do not have precipitation data in this exercise, we’ll only estimate the bioclimatic variables that include temperature. See the r.bioclim manual for further details. Note that we’ll use the climatologies estimated in the previous step.

# Install extension
gs.run_command("g.extension",
               extension="r.bioclim")
# Get lists of maps needed
tmin=gs.list_grouped(type="raster", pattern="lst_minimum_??")['italy_LST_daily']
tmax=gs.list_grouped(type="raster", pattern="lst_maximum_??")['italy_LST_daily']
tavg=gs.list_grouped(type="raster", pattern="lst_average_??")['italy_LST_daily']

print(tmin,tmax,tavg)
# Estimate temperature related bioclimatic variables
gs.run_command("r.bioclim", 
               tmin=tmin, 
               tmax=tmax,
               tavg=tavg, 
               output="worldclim_") 
# List output maps
gs.list_grouped(type="raster", pattern="worldclim*")['italy_LST_daily']
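
To get a feeling for what some of these variables mean, here is a plain-Python sketch for a single pixel, following the usual WorldClim definitions of BIO1, BIO2, BIO5, BIO6 and BIO7. The monthly values are hypothetical, and the sketch does not try to reproduce any rounding or scaling that r.bioclim may apply to its outputs.

```python
from statistics import mean

# Hypothetical monthly climatologies for one pixel, January to December, in °C
tmin = [-2.0, -1.0, 2.0, 5.0, 9.0, 13.0, 15.0, 15.0, 11.0, 7.0, 2.0, -1.0]
tmax = [6.0, 8.0, 13.0, 17.0, 22.0, 26.0, 29.0, 28.0, 24.0, 18.0, 11.0, 7.0]
tavg = [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]

bio01 = mean(tavg)                                   # annual mean temperature
bio02 = mean(hi - lo for lo, hi in zip(tmin, tmax))  # mean diurnal range
bio05 = max(tmax)                                    # max temperature of warmest month
bio06 = min(tmin)                                    # min temperature of coldest month
bio07 = bio05 - bio06                                # temperature annual range

print(round(bio01, 2), round(bio02, 2), bio05, bio06, bio07)
```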

Let’s have a look at some of the maps we just created

# Display raster map with interactive class
bio_map = gj.InteractiveMap(width = 500, use_region=True, tiles="OpenStreetMap")
bio_map.add_raster("worldclim_bio01")
bio_map.add_raster("worldclim_bio02")
bio_map.add_layer_control(position = "bottomright")
bio_map.show()

Spring warming

We define spring warming as the velocity with which temperature increases from winter into spring and we calculate it as slope(daily Tmean February-March-April). We will use t.rast.aggregate.

# Define list of months
months=['{0:02d}'.format(m) for m in range(2,5)]
# Annual spring warming
gs.run_command("t.rast.aggregate",
               input="lst_daily",
               output="annual_spring_warming",
               basename="spring_warming",
               suffix="gran",
               method="slope",
               granularity="1 years",
               where=f"strftime('%m',start_time)='{months[0]}' or strftime('%m',start_time)='{months[1]}' or strftime('%m', start_time)='{months[2]}'")
# Check raster maps in the STRDS
gs.run_command("t.rast.list", input="annual_spring_warming")
# Average spring warming
gs.run_command("t.rast.series",
               input="annual_spring_warming",
               output="avg_spring_warming",
               method="average")
# Display raster map with interactive class
spw_map = gj.InteractiveMap(width = 500, use_region=True, tiles="OpenStreetMap")
spw_map.add_raster("avg_spring_warming")
spw_map.add_layer_control(position = "bottomright")
spw_map.show()
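
For one pixel, the slope method can be thought of as fitting a straight line through the daily values and keeping its slope. Here is a minimal sketch in plain Python. It assumes an ordinary least-squares fit against the position of each map in the series, so with daily maps the result is in °C per day. Both series are synthetic.

```python
def linear_slope(values):
    """Ordinary least-squares slope of values against their position 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


# Synthetic daily means for one pixel: warming by 0.2 °C per day, starting at 2 °C
warming = [2.0 + 0.2 * day for day in range(89)]  # Feb-Apr has 89 days in a non-leap year
print(round(linear_slope(warming), 3))

# A series that cools down over time gives a negative slope
cooling = [24.0 - 0.1 * day for day in range(92)]
print(round(linear_slope(cooling), 3))
```

The steeper the increase from February to April, the larger the value of the spring warming predictor for that pixel.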

Autumnal cooling

We define autumnal cooling as the velocity with which temperature decreases from summer into fall and we calculate it as slope(daily Tmean August-September-October).

# Define list of months
months=['{0:02d}'.format(m) for m in range(8,11)]
# Annual autumnal cooling
gs.run_command("t.rast.aggregate",
               input="lst_daily",
               output="annual_autumnal_cooling",
               basename="autumnal_cooling",
               suffix="gran",
               method="slope",
               granularity="1 years",
               where=f"strftime('%m',start_time)='{months[0]}' or strftime('%m',start_time)='{months[1]}' or strftime('%m', start_time)='{months[2]}'")
# Check raster maps in the STRDS
gs.run_command("t.rast.list", input="annual_autumnal_cooling")
# Average autumnal cooling
gs.run_command("t.rast.series",
               input="annual_autumnal_cooling",
               output="avg_autumnal_cooling",
               method="average")
# Display raster map with interactive class
auc_map = gj.InteractiveMap(width = 500, use_region=True, tiles="OpenStreetMap")
auc_map.add_raster("avg_autumnal_cooling")
auc_map.add_layer_control(position = "bottomright")
auc_map.show()

Number of days with LSTmean >= 20 and <= 30

Mosquitoes (and the viruses they might carry) tend to thrive in a certain range of temperatures. Let’s assume this range is from 20 to 30 °C. Here, we’ll estimate the number of days within this range per year, and then we’ll estimate the average across years. See the t.rast.algebra manual for further details.

# Keep only pixels meeting the condition
expression="tmean_higher20_lower30 = if(lst_daily >= 20.0 && lst_daily <= 30.0, 1, null())"

gs.run_command("t.rast.algebra",
               expression=expression, 
               basename="tmean_higher20_lower30",
               suffix="gran",
               nproc=7, 
               flags="n")
# Count how many times per year the condition is met
gs.run_command("t.rast.aggregate",
               input="tmean_higher20_lower30", 
               output="count_tmean_higher20_lower30",
               basename="tmean_higher20_lower30",
               suffix="gran",
               method="count",
               granularity="1 years")
# Check raster maps in the STRDS
gs.run_command("t.rast.list", 
               input="count_tmean_higher20_lower30", 
               columns="name,start_time,min,max")
# Average number of days with LSTmean >= 20 and <= 30
gs.run_command("t.rast.series",
               input="count_tmean_higher20_lower30",
               output="avg_count_tmean_higher20_lower30",
               method="average")
# Display raster map with interactive class
h20_map = gj.InteractiveMap(width = 500, use_region=True, tiles="OpenStreetMap")
h20_map.add_raster("avg_count_tmean_higher20_lower30")
h20_map.add_layer_control(position = "bottomright")
h20_map.show()
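
The three steps above (flag the days within the range, count them per year, average the counts) can be sketched for a single pixel in plain Python. The yearly series are made up and far shorter than a real year; the helper name is hypothetical.

```python
from statistics import mean


def days_in_range(daily_values, lower=20.0, upper=30.0):
    """Count the days whose value lies within [lower, upper]."""
    return sum(1 for value in daily_values if lower <= value <= upper)


# Short made-up series per year for one pixel (a real year would have 365 values)
lst_by_year = {
    2014: [18.5, 20.0, 24.3, 30.0, 31.2],
    2015: [19.9, 21.1, 22.4, 29.9, 30.1],
    2016: [25.0, 26.0, 27.0, 28.0, 33.0],
}

# Count per year, then average the yearly counts
counts = {year: days_in_range(values) for year, values in lst_by_year.items()}
print(counts)
print(mean(counts.values()))
```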

Number of consecutive days with LSTmean <= -10.0

Likewise, there are temperature thresholds that mark a limit to mosquito survival. Here, we’ll use the lower temperature limit for survival. Most importantly, we’ll count the number of consecutive days with temperatures below this threshold.

Here, we’ll use the temporal algebra again and we’ll recall the concept of topology that we defined at the beginning of the notebook. First, we need to create a STRDS of annual granularity that will contain only zeroes. This annual STRDS, which we call annual mask, will be the base to which we add 1 each time the condition of LST at or below -10 °C on consecutive days is met. Finally, we estimate the median number of consecutive days with LST at or below -10 °C over the 5 years.

# Create annual mask
gs.run_command("t.rast.aggregate",
               input="lst_daily",
               output="annual_mask",
               basename="annual_mask",
               suffix="gran",
               granularity="1 year",
               method="count")
# Replace values by zero
expression="if(annual_mask, 0)"

gs.run_command("t.rast.mapcalc",
               input="annual_mask",
               output="annual_mask_0",
               expression=expression,
               basename="annual_mask_0")
# Calculate consecutive days with LST <= -10.0
expression="lower_m10_consec_days = annual_mask_0 {+,contains,l} if(lst_daily <= -10.0 && lst_daily[-1] <= -10.0 || lst_daily[1] <= -10.0 && lst_daily <= -10.0, 1, 0)"

gs.run_command("t.rast.algebra",
               expression=expression,
               basename="lower_m10",
               suffix="gran",
               nproc=7)
# Inspect values
gs.run_command("t.rast.list",
               input="lower_m10_consec_days",
               columns="name,start_time,min,max")
# Median number of consecutive days with LST <= -10
gs.run_command("t.rast.series",
               input="lower_m10_consec_days",
               output="median_lower_m10_consec_days",
               method="median")
# Display raster map with interactive class
lt10_map = gj.InteractiveMap(width = 500, use_region=True, tiles="OpenStreetMap")
lt10_map.add_raster("median_lower_m10_consec_days")
lt10_map.add_layer_control(position = "bottomright")
lt10_map.show()
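
The condition used in the temporal algebra expression flags a day when it is at or below the threshold and the previous or the next day is too. For one pixel and one year, that logic can be sketched in plain Python as follows. The series is made up, and treating the missing neighbours of the first and last day as not cold is an assumption of this sketch.

```python
def days_in_cold_spells(daily_values, threshold=-10.0):
    """Count days at or below threshold that have a neighbouring day at or below it too."""
    cold = [value <= threshold for value in daily_values]
    count = 0
    for i, is_cold in enumerate(cold):
        # Missing neighbours at the ends of the series count as not cold (assumption)
        previous_cold = i > 0 and cold[i - 1]
        next_cold = i < len(cold) - 1 and cold[i + 1]
        if is_cold and (previous_cold or next_cold):
            count += 1
    return count


# Made-up series: one isolated cold day, then a three-day cold spell
series = [-3.0, -11.0, -4.0, -10.0, -12.5, -10.2, -8.0]
print(days_in_cold_spells(series))
```

Note that the isolated cold day is not counted, since neither of its neighbours meets the condition.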

We have now derived many potentially relevant predictors for the mosquito habitat suitability and we could still derive some more, for example, the number of mosquito or virus cycles per year based on development temperature thresholds and growing degree days (GDD). This could be achieved with t.rast.accumulate and t.rast.accdetect.

We will now open a GRASS session from R and perform SDM there.
