UltraGBIF is a high-performance R package designed to compile, validate, and standardize plant occurrence records from the Global Biodiversity Information Facility (GBIF) into analysis-ready datasets. The package's seven functions are organized into four stages (Figure 1).
UltraGBIF achieves outstanding performance through specific technical architectures:
C/C++ Backend Integration: Leverages data.table, stringi, and terra, all implemented in C/C++
Vectorization Over Explicit Loops: Bypasses R’s interpretive overhead by dispatching to pre-compiled routines
SIMD Exploitation: Vectorized operations enable compiler-level SIMD auto-vectorization
Memory-Efficient Design: In-place modification (set(), :=) eliminates intermediate copies
Chunk-Based Parallelization: Vectorized processing within chunks, parallel execution across chunks
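To illustrate the vectorization point above, here is a minimal base-R sketch (not UltraGBIF code; the function names are invented for this example): both functions compute the same sum of squares, but the vectorized form makes a single call into pre-compiled C routines instead of paying interpreter overhead on every element.

```r
# Illustrative only: the same computation as an interpreted loop vs. a
# single vectorized call that dispatches to compiled code.
slow_sum_sq <- function(v) {
  total <- 0
  for (xi in v) total <- total + xi^2   # one interpreted step per element
  total
}
fast_sum_sq <- function(v) sum(v^2)     # one dispatch to compiled routines

x <- runif(1e6)
stopifnot(isTRUE(all.equal(slow_sum_sq(x), fast_sum_sq(x))))
# Compare both with system.time(): the vectorized form is typically orders
# of magnitude faster on vectors of this size.
```

The same principle underlies data.table's set() and :=, which additionally modify columns in place rather than copying the whole table.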
Performance: On a standard laptop, UltraGBIF can compile one million occurrence records within 15 minutes.
Prerequisites:
R version: ≥ 4.1.0
Operating system: Windows (10+), macOS (11+), or Linux (Ubuntu 20.04+)
GBIF account: Required for data downloads (free registration at https://www.gbif.org)
System dependencies:
curl (for downloading data)
libssl (for secure connections)
UltraGBIF can be installed using the two commands below:
options(repos = c(getOption("repos"), "https://anonymous.4open.science/r/Repo-902F"))
install.packages("UltraGBIF")
Since UltraGBIF already meets the requirements of the Comprehensive R Archive Network (CRAN), you will in future be able to install it with the single command install.packages("UltraGBIF").
The following code demonstrates the complete UltraGBIF workflow from data import to dynamic mapping:
# Download Saxifraga occurrence records from GBIF using gbif download key or from gbif web
# data page: https://doi.org/10.15468/dl.bythb4
# Often, downloading files in R is very slow. You can open the data page link
# directly in an external browser and click the download button to download it
# manually. This is usually much faster, especially when downloading larger datasets.
if (!require(rgbif)) install.packages('rgbif'); library(rgbif)
download_key <- "0021523-250402121839773"
fpath <- getwd() # or your own directory, e.g. fpath <- "C:/"
occ_download_get(download_key, path = fpath, overwrite = TRUE)
# Step 1: Import GBIF occurrence records
library(UltraGBIF)
occ_import <- import_records(
GBIF_file = paste0(fpath,"/0021523-250402121839773.zip"),
only_PRESERVED_SPECIMEN = TRUE
)
# Step 2: Standardize taxonomic names using TNRS
taxa_checked <- check_occ_taxon(
occ_import = occ_import,
accuracy = 0.9
)
# Step 3: Standardize collector names
collectors_dictionary <- check_collectors(
occ_import = occ_import,
min_char = 2
)
# Step 4: Generate collection event keys
collection_key <- set_collection_mark(
occ_import = occ_import,
collectors_dictionary = collectors_dictionary
)
# Step 5: Select digital vouchers
voucher <- set_digital_voucher(
occ_import = occ_import,
taxa_checked = taxa_checked,
collection_key = collection_key
)
# Step 6: Refine records (coordinate validation + native status)
refined_records <- refine_records(
voucher = voucher,
threads = 4,
save_path = getwd()
)
# Optional: Visualize results
map_records(refined_records = refined_records, precision = 4, cex = 4)
After running the complete workflow, you will obtain:
| Object | Description |
|---|---|
| occ_import | Imported occurrence records with standardized fields |
| taxa_checked | Taxonomically validated names with confidence scores |
| collectors_dictionary | Cleaned collector name mappings |
| collection_key | Unique identifiers for each collection event |
| voucher | Filtered digital voucher records |
| refined_records | Final cleaned dataset with validated coordinates and native status |
Download the raw occurrence data by opening the GBIF occurrence search page at https://www.gbif.org/occurrence/search. Log in to GBIF with your account (if you do not have a GBIF account, you can register by following the GBIF instructions). Then return to filter records online (you should see a page like Fig2 below). For this example, select Basis of record = Preserved specimen (you can also choose others in addition), Occurrence status = present, and Scientific name = Saxifraga.
When done, click the DOWNLOAD button and then the DARWIN CORE ARCHIVE button; see the options marked in red in Fig1. Wait a few minutes for the GBIF server to process your query; the page will refresh automatically. For the Saxifraga data used in this tutorial, simply open https://doi.org/10.15468/dl.bythb4 and click the DOWNLOAD button marked in red in Fig3.
You now have the data in Darwin Core Archive (DwC-A) format.
Advanced: Using rgbif
This subsection demonstrates how to programmatically download biodiversity occurrence data from the Global Biodiversity Information Facility (GBIF) using the rgbif package in R. The workflow is specifically designed for retrieving Darwin Core Archive (DwC-A) formatted data for the plant genus Saxifraga, with occurrence.txt (interpreted data) and verbatim.txt (raw data) alongside metadata XML files included. You can customize your own search by modifying filter conditions using pred() within the occ_download() function.
Quick Start: Download Using an Existing Download Key
The example data can be retrieved using the download key: 0021523-250402121839773.
if (!require(rgbif)) install.packages('rgbif'); library(rgbif)
if (!require(dplyr)) install.packages('dplyr'); library(dplyr)
# Assign the download key
download_key <- "0021523-250402121839773"
# Check download status first
status <- occ_download_meta(download_key)
print(status$status) # Should show "SUCCEEDED"
# Set download path
fpath <- "C:/Users/YourName/Downloads/"
# Download if status is SUCCEEDED
if(status$status == "SUCCEEDED") {
occ_download_get(download_key, path = fpath, overwrite = TRUE) ## very slow
}
Full Workflow: Submit a New Download Request
The following code demonstrates the complete workflow for submitting a custom download request to GBIF.
if (!require(rgbif)) install.packages('rgbif'); library(rgbif);library(dplyr)
## Step 1: Set GBIF account credentials (recommended to write to .Renviron file)
user <- "your_username"
pwd <- "your_password"
email <- "your_email@example.com"
## Step 2: Get the genus key for Saxifraga
gkey <- name_backbone("Saxifraga")$usageKey # Returns the GBIF backbone taxonomy key
## Step 3: Submit download request (using Saxifraga genus data as an example)
download <- occ_download(
pred("taxonKey", gkey), # Taxon key for Saxifraga
pred("hasCoordinate", TRUE), # Only get records with coordinates
pred_and(pred_gte("year", 1990),
pred_lte("year", 2025)), # Filter data from year 1990 to 2025
user = user,
pwd = pwd,
email = email)
## Step 4: View download ID for subsequent tracking and download
download_key <- download$key
print(download_key)
## Step 5: Wait for the task to complete (periodically checks status)
occ_download_wait(download_key)
## Step 6: Download the ZIP file to local (default saves to current working directory)
occ_download_get(download_key,
path = "your_data_directory/",
overwrite = TRUE) ## very slow
After downloading the compressed DwC-A formatted data from GBIF, we can import and process the occurrence records using the UltraGBIF package.
library(UltraGBIF);library(data.table)
# Specify the path to your downloaded GBIF file
gbif_occurrence_file <- file.path(fpath, '0021523-250402121839773.zip')
# Alternatively, you can use an interactive file selection window by gbif_occurrence_file <- file.choose()
# Import example data: only_PRESERVED_SPECIMEN = TRUE keeps
# only physical specimen records (excludes human observations, camera traps, etc.)
occ_import <- import_records(
GBIF_file = gbif_occurrence_file,
only_PRESERVED_SPECIMEN = TRUE
)
Common components in occ_import:
occ: raw occurrence records with 55 fields (Ctrl-prefixed).
occ_gbif_issue: a binary matrix of GBIF issues and flags indicating common data quality problems, e.g. geospatial or taxonomic issues. See https://data-blog.gbif.org/post/issues-and-flags/ for details.
summary: the count of occurrences carrying each specific issue.
runtime: the elapsed time of import_records().
## [1] "occ" "occ_gbif_issue" "summary" "runtime"
# Data samples: view selected columns of the occurrence table
occ_import$occ[,.(Ctrl_gbifID, Ctrl_scientificName, Ctrl_basisOfRecord,
Ctrl_decimalLatitude, Ctrl_decimalLongitude)]%>%head()
# View the first five rows of the occ_gbif_issue table
occ_import$occ_gbif_issue[1:5]
## Time difference of 3.966486 secs
Summary statistics for raw occurrences:
library(data.table);library(dplyr)
# Initialize a list to store summary information
UltraGBIF_summary <- list()
UltraGBIF_summary$raw_records <- occ_import$occ[, .N]
UltraGBIF_summary$raw_taxa <- occ_import$occ[, uniqueN(Ctrl_scientificName)]
UltraGBIF_summary$families <- occ_import$occ[, uniqueN(Ctrl_family)]
UltraGBIF_summary$dataset <- occ_import$occ[, uniqueN(Ctrl_datasetName)]
UltraGBIF_summary$date_from <- occ_import$occ[, range(Ctrl_year, na.rm = TRUE)][1]
UltraGBIF_summary$date_to <- occ_import$occ[, range(Ctrl_year, na.rm = TRUE)][2]
UltraGBIF_summary%>%as.data.table()
Standardizes taxonomic names using the Taxonomic Name Resolution Service (TNRS) (Boyle et al. 2013) against the World Checklist of Vascular Plants (WCVP). TNRS corrects spelling errors, resolves alternative spellings, and converts outdated synonyms to their current accepted names, enabling consistent taxonomic identification across the dataset.
taxa_checked is a list with three components:
occ_wcvp_check_name: the resolved taxonomic name for each GBIF record. Unmatched names are NA.
summary: summary statistics of the taxonomic resolution for each unique searched name
runtime: the elapsed time of check_occ_taxon()
library(data.table);library(dplyr)
taxa_checked$summary[wcvp_searchedName=="Saxifraga adscendens subsp. discolor",]
Simplifies and extracts the primary collector’s name from the recordedBy field, then builds a standardized collector table. This module reduces inconsistencies that can fragment single collection events and improves the accuracy of subsequent duplicate detection.
library(UltraGBIF)
collectors_dictionary <- check_collectors(
occ_import = occ_import,
min_char = 2
)
# for more details use ?check_collectors
collectors_dictionary is a list with two components:
collectors_dictionary: the primary collector’s name and the corresponding standardized name
runtime: the elapsed time of check_collectors()
## Time difference of 10.74665 secs
Generates a standardized composite key for each unique collection event. This module identifies and consolidates duplicates into unique collection events, where a collection event represents a distinct sampling instance (a specific collector at a specific date or with a specific record number).
The collection event key is constructed by concatenating three fields: family, the standardized collector name generated by check_collectors(), and the collection number.
collection_key <- set_collection_mark(
occ_import = occ_import,
collectors_dictionary = collectors_dictionary
)
# for more details use ?set_collection_mark
collection_key is a list with four components:
collection_key: the cleaned family, collector name, and collection number for each record
full_keys: keys constructed from family, collector name, and collection number
incomplete_keys: keys lacking one or more of the required components (family, collector name, or collection number), which may indicate incomplete metadata in the original records.
runtime: the time consumed by set_collection_mark().
## Time difference of 0.7315838 secs
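To make the composite-key idea concrete, here is a toy base-R sketch (column names and normalization rules are invented for illustration; the actual logic in set_collection_mark() is more elaborate): records sharing a complete key are candidate duplicates, while records missing a component fall into the incomplete set.

```r
# Toy illustration of a composite collection-event key: family, primary
# collector surname, and collection number, upper-cased and concatenated.
records <- data.frame(
  family    = c("Saxifragaceae", "Saxifragaceae", "Saxifragaceae"),
  collector = c("Smith, J.",      "SMITH J",       "Jones, A."),
  number    = c("1042",           "1042",          NA)
)
# Keep only the surname (text before the first space or comma), normalized
primary_surname <- toupper(sub("[ ,].*$", "", records$collector))
key <- ifelse(
  is.na(records$number), NA,  # records lacking a component get no full key
  paste(toupper(records$family), primary_surname, records$number, sep = "_")
)
key
#> [1] "SAXIFRAGACEAE_SMITH_1042" "SAXIFRAGACEAE_SMITH_1042" NA
```

Note that the first two rows collapse to the same key despite different collector spellings, which is how duplicate detection survives inconsistent recordedBy strings.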
Summary statistics for collection_key:
library(data.table)
UltraGBIF_summary$full_keys <- collection_key$full_keys[,.N]
UltraGBIF_summary$incomplete_keys <- collection_key$incomplete_keys[,.N]
UltraGBIF_summary%>%as.matrix()
##                    [,1]
## raw_records 228494
## raw_taxa 1848
## families 1
## dataset 109
## date_from 1550
## date_to 2024
## raw_collectors 37691
## checked_collectors 14113
## full_keys 90186
## incomplete_keys 11736
Resolves duplicate records and selects the master digital voucher to represent each collection event. For records possessing complete collection event keys (full keys) from collection_key, duplicates are identified and grouped; within each group, a master digital voucher is selected based on a composite quality score. Resolved taxonomic names from taxa_checked are included.
library(UltraGBIF)
voucher <- set_digital_voucher(
occ_import = occ_import,
taxa_checked = taxa_checked,
collection_key = collection_key
)
gc() # Reports memory usage and triggers a garbage collection
# for more details use ?set_digital_voucher
voucher is a list with three components:
occ_digital_voucher: a table with 80 fields, formed by adding key issues and flags to the raw occurrences.
occ_results: a table containing only the quality-assessment fields for the raw occurrences.
runtime: the time consumed by set_digital_voucher().
Summary statistics for voucher:
## Time difference of 4.499273 secs
For usable vouchers, missing information in key fields is restored by consolidating data from duplicate records belonging to identical collection events. This creates one clean record per unique event. These validated records are automatically subjected to geospatial validation using CoordinateCleaner (Zizka et al. 2019) and terra. Each validated record is matched to its corresponding WGSRPD Level-3 Botanical Country classification with a distribution status (native, introduced, extinct, location_doubtful), minimizing misclassification risks in cross-regional analyses. When save_path is specified, native_records and other_records are exported as compressed CSV files (native_refined_records.csv.gz and usable_refined_records.csv.gz), which can be used for downstream analysis.
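The consolidation step can be sketched in miniature with base R (field and key names are invented here; refine_records() does this across many fields at once): within each collection event, a missing value is restored from a duplicate record of the same event.

```r
# Toy illustration: restore missing locality values from duplicates that
# share the same collection-event key.
dups <- data.frame(
  event_key = c("K1", "K1", "K2"),
  locality  = c(NA, "Alps, 2100 m", "Pyrenees")
)
fill_from_group <- function(v) {
  known <- v[!is.na(v)]
  if (length(known)) v[is.na(v)] <- known[1]
  v
}
# ave() applies the fill within each event_key group
dups$locality <- ave(dups$locality, dups$event_key, FUN = fill_from_group)
dups$locality
#> [1] "Alps, 2100 m" "Alps, 2100 m" "Pyrenees"
```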
library(UltraGBIF)
# Parallel Processing using foreach/doParallel to accelerate processing of large datasets.
refined_records <- refine_records(
voucher = voucher,
threads = 4,
save_path = dirname(gbif_occurrence_file),
tests = c("capitals", "centroids", "equal", "gbif",
"institutions", "outliers", "seas", "zeros"))
# for more details use ?refine_records
refined_records is a list with three components:
native_records: occurrences classified as native according to WCVP
other_records: occurrences classified as introduced, extinct, location_doubtful, or unknown
runtime: the time consumed by refine_records()
Summary statistics for refined_records:
## Time difference of 37.24072 secs
The summary list (UltraGBIF_summary) presents key quantitative metrics tracking data reduction and quality filtering across the UltraGBIF processing pipeline. Each statistic captures a distinct stage, from raw GBIF import to the final refined dataset. These statistics collectively illustrate how raw GBIF data are progressively filtered to produce a clean, reproducible, and analysis-ready dataset.
raw_records – Total raw occurrence records downloaded from GBIF prior to any filtering or processing.
families – Number of families represented in the raw records.
dataset – Number of source datasets of the raw records.
date_from – The earliest collection year in the raw records.
date_to – The latest collection year in the raw records.
raw_taxa – Number of unique scientific names present in the raw data before taxonomic resolution (e.g., synonym handling).
raw_collectors – Number of unique collector name strings before standardization (accounts for spelling variants, initials, etc.).
checked_collectors – Number of unique collector names after applying standardization rules (e.g., name normalization, abbreviation expansion).
full_keys – Number of collection event keys considered complete (i.e., containing all necessary components for grouping, such as collector, number, and date).
incomplete_keys – Number of collection event keys missing one or more critical components, making them non-groupable.
usable_records_before_refine – Number of records classified as usable after initial voucher selection (e.g., presence of a physical or image voucher).
usable_records_after_refine – Number of records retained after further refinement steps (e.g., removing duplicates, suspect georeferences, or invalid dates).
final_taxa – Number of unique accepted taxon names in the final refined dataset after taxonomic reconciliation and filtering.
## [,1]
## raw_records 228494
## raw_taxa 1848
## families 1
## dataset 109
## date_from 1550
## date_to 2024
## raw_collectors 37691
## checked_collectors 14113
## full_keys 90186
## incomplete_keys 11736
## usable_records_before_refine 66921
## usable_records_after_refine 59673
## final_taxa 496
Spatial visualization using map_records()
We render verified records onto customizable, dynamic interactive maps using mapview (Appelhans et al. 2023). This module employs geohashTools (Chirico 2023) to perform spatial clustering of geo-coordinates.
After executing map_records with the code in the block below, a browser will open to load the rendered interactive map (it might not automatically pop up in the foreground). In this case, please manually switch to the browser window.
map_records(
refined_records = refined_records, # Output of the UltraGBIF refine_records()
precision = 2, # Rendering precision (~1250 km resolution)
cex = 4 # Point size (scaling factor)
)
# for more details use ?map_records
The precision parameter determines the grid resolution:
Precision=2: approximately 1250 km resolution
Precision=3: approximately 156 km resolution
Precision=4: approximately 20 km resolution
The cex parameter adjusts symbol size.
Dynamic maps support multiple basemap layers (OpenStreetMap, Esri World Imagery, Stadia Stamen Watercolor), a native-status legend, and clickable popups with record metadata.
After [Refine Records], we have obtained compiled native records from refined_records; now we can leverage them for more downstream applications. letsR (Vilela and Villalobos 2015) is an R package designed for macroecological analyses based on species geographic distributions. The package transforms species range maps into presence-absence matrices (PAMs), where rows represent grid cells and columns represent species. It provides a comprehensive workflow for spatial biodiversity analyses, including tools to:
construct PAMs from shapefiles or occurrence points;
integrate environmental variables into species distribution data;
calculate species-level macroecological metrics such as range size, range overlap, and distributional midpoints;
map species richness across geographic, environmental, and trait spaces;
perform grid-level analyses of biodiversity patterns.
The package implements an S3 class structure (PresenceAbsence) that facilitates reproducible spatial analyses and supports integration with IUCN Red List and BirdLife data. All spatial operations are optimized for large-scale datasets, making letsR particularly suitable for comparative biogeography and conservation prioritization studies.
A Presence-Absence Matrix (PAM) is the fundamental data structure in macroecological research that represents species distributions in a discrete spatial framework.
In essence, a PAM is a two-dimensional matrix where:
Rows represent discrete spatial units (grid cells) covering a geographic domain
Columns represent species
Values are binary:
1 indicates presence,
0 indicates absence
The first two columns typically store the longitude (x) and latitude (y) coordinates of each grid cell’s centroid, allowing the matrix to be spatially referenced.
Why PAMs matter:
Standardization — Converting continuous range maps (shapefiles) or scattered occurrence points into a standardized grid allows meaningful comparisons across species with vastly different distributions
Computational efficiency — Binary matrices enable rapid calculations of biodiversity metrics (richness, turnover, endemism) across thousands of species and millions of cells
Multi-scale integration — PAMs serve as the bridge between species-level data (traits, phylogeny, conservation status) and environmental data (climate, land use, topography)
Reproducibility — Explicitly defined grid systems eliminate ambiguities in spatial resolution, projection, and extent
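The structure described above can be mimicked with a few lines of base R (species names and coordinates are made up here; letsR builds the real object): the first two columns hold cell centroids, the remaining binary columns hold species, so richness is just a row sum.

```r
# A miniature presence-absence matrix: grid-cell centroids in the first two
# columns, one binary column per species.
pam <- cbind(
  x    = c(-1.5, -0.5, 0.5),
  y    = c(45.5, 45.5, 45.5),
  sp_A = c(1, 1, 0),
  sp_B = c(0, 1, 1)
)
# Species richness per cell is the row sum over the species columns
richness <- rowSums(pam[, -(1:2)])
richness
#> [1] 1 2 1
```

This binary layout is why richness, turnover, and endemism calculations stay cheap even across thousands of species.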
In letsR’s implementation:
The package wraps PAMs in a PresenceAbsence S3 object that bundles:
The presence-absence matrix itself
A raster layer of species richness
Species names and metadata
Grid system parameters (resolution, projection, extent)
This design allows you to apply generic R functions (plot, summary, print) directly to the object while maintaining spatial integrity throughout analytical workflows.
How can I use native records from UltraGBIF to get the Presence-Absence Matrix? First, prepare input data.
library(data.table)
library(stringi)
library(dplyr)
## prepare
PAM_pre <- refined_records$native_records[,.(UltraGBIF_wcvp_taxon_name,
genus=stri_extract_first_regex(
UltraGBIF_wcvp_taxon_name, "^[^ ]+"),
UltraGBIF_wcvp_family,
UltraGBIF_decimalLongitude,
UltraGBIF_decimalLatitude)]
head(PAM_pre, 5)
# If you are interested in a specific family or genus, you can manually filter PAM_pre here.
# For example: PAM_pre <- PAM_pre[genus=='my_interested_genus',]
OK, now construct the Presence-Absence Matrix:
library(letsR)
PAM <- lets.presab.points(xy = PAM_pre[,.(UltraGBIF_decimalLongitude,
UltraGBIF_decimalLatitude)]%>%as.matrix(), ## remember as.matrix()
species = PAM_pre$UltraGBIF_wcvp_taxon_name,
resol = 1, ## 1 means one-degree grids
count = FALSE) ## if TRUE, show a progress bar
## See more details by ?lets.presab.points
I want to know basic information about my Presence-Absence object:
##
## Class: PresenceAbsence
## _ _
## Number of species: 462
## Number of cells: 3058
## Cells with presence: 3058
## Cells without presence: 0
## Species without presence: 0
## Species with the largest range: Saxifraga cernua
## _ _
## Grid parameters
## Resolution: 1, 1 (x, y)
## Extention: -179.78, 180.22, -54.8333, 83.1667 (xmin, xmax, ymin, ymax)
## Coord. Ref.: +proj=longlat +datum=WGS84 +no_defs
## [1] "Presence_and_Absence_Matrix" "Richness_Raster"
## [3] "Species_name"
## [1] "Micranthes integrifolia" "Micranthes manchuriensis"
## [3] "Micranthes merkii" "Micranthes occidentalis"
## [5] "Micranthes oregana" "Micranthes tenuis"
Which species has the widest distribution range (occupying the most grids)?
library(letsR)
lets.rangesize(x = PAM, units = "cell")%>%
as.data.frame() %>%
arrange(desc(Range_size)) %>% head(5)
What are the characteristics of the size of a species’ distribution range?
library(dplyr);library(letsR)
## What is the mean size of species distribution range?
lets.rangesize(x = PAM, units = "cell")%>%mean()%>%ceiling()## [1] 24
## What percentage of species have a distribution range exceeding the mean?
which(lets.rangesize(x = PAM, units = "cell")>
lets.rangesize(x = PAM, units = "cell")%>%
mean()%>%
ceiling())%>%
length()/nrow(lets.rangesize(x = PAM, units = "cell"))## [1] 0.1363636
## plot histogram
lets.rangesize(x = PAM, units = "cell")%>%
log()%>%
hist(main='Characteristics of the size of a species distribution range',
xlab='log(counts of grids)')
Figure 5 Characteristics of the size of a species distribution range
The results show that most species have small distribution ranges; species whose range exceeds the mean account for less than 14%.
It is easy to generate a one-degree species richness map with letsR; the plot() method of a PresenceAbsence object renders species richness by default:
Figure 6 Saxifraga richness map
I want to view an individual taxon occurrence map:
library(letsR)
selected_taxon <- 'Saxifraga cernua'
plot(PAM,
name=selected_taxon,
main=paste(selected_taxon,'occurrence'))
Figure 7 Saxifraga cernua occurrence map
Environmental variables can be integrated into the presence-absence matrix using the lets.addvar function from the letsR package.
Here we add global annual mean temperature data (°C) at 10-arcminute resolution (~340 km² grid cells) from the letsR built-in dataset. The function automatically handles spatial alignment by cropping the temperature raster to match the PAM extent, aggregating high-resolution pixels to the PAM grid resolution using mean values, and extracting the temperature at each grid cell centroid. The resulting matrix contains species occurrence data alongside mean temperature values for each spatial unit, enabling downstream analyses of species–environment relationships.
library(letsR)
temp <- terra::unwrap(letsR::temp)## wc2.1_10m_bio_1
PAM_env <- lets.addvar(PAM, temp, fun = mean)## add temp var
PAM_env[1:5, c(1, 2, ncol(PAM_env))] ## view the coordinate and temperature columns
##      Longitude(x) Latitude(y) wc2.1_10m_bio_1_mean
## [1,] -76.28 82.6667 -19.95376
## [2,] -74.28 82.6667 -20.00872
## [3,] -68.28 82.6667 -18.36175
## [4,] -67.28 82.6667 -18.96207
## [5,] -66.28 82.6667 -17.52159
Now summarize mean temperature per species:
# Summarize mean temperature per species
temp_mean <- lets.summarizer(PAM_env, ncol(PAM_env))
## Which five species have the highest annual average temperature in their
## distribution range and living environment?
temp_mean%>%arrange(desc(wc2.1_10m_bio_1_mean))%>%head(5)
## Which five species have the lowest annual average temperature in their
## distribution range and living environment?
temp_mean%>%arrange(wc2.1_10m_bio_1_mean)%>%head(5)
Appelhans, Tim, Florian Detsch, Christoph Reudenbach, and Stefan Woellauer. 2023. “Mapview: Interactive Viewing of Spatial Data in R.” https://CRAN.R-project.org/package=mapview.
Boyle, Brad, Nicole Hopkins, Zhenyuan Lu, Juan Antonio Raygoza Garay, Dmitry Mozzherin, Tony Rees, Naim Matasci, et al. 2013. “The Taxonomic Name Resolution Service: An Online Tool for Automated Standardization of Plant Names.” BMC Bioinformatics 14 (1): 16. https://doi.org/10.1186/1471-2105-14-16.
Chirico, Michael. 2023. “geohashTools: Tools for Working with Geohashes.” https://CRAN.R-project.org/package=geohashTools.
Vilela, Bruno, and Fabricio Villalobos. 2015. “letsR: A New R Package for Data Handling and Analysis in Macroecology.” Methods in Ecology and Evolution 6 (10): 1229–34. https://doi.org/10.1111/2041-210X.12401.
Zizka, Alexander, Daniele Silvestro, Tobias Andermann, Josué Azevedo, Camila Duarte Ritter, Daniel Edler, Harith Farooq, et al. 2019. “CoordinateCleaner: Standardized Cleaning of Occurrence Records from Biological Collection Databases.” Edited by Tiago Quental. Methods in Ecology and Evolution 10 (5): 744–51. https://doi.org/10.1111/2041-210X.13152.