Data Federation
This critical component of the IDA project brought together existing, heterogeneous spatial data from multiple sources relating to soil quality, land use, and species occupancy to produce a suite of higher-value information products. The research examined techniques for data harmonisation and integration, built the related data processing pipelines, and implemented data validation tools and mechanisms for reporting on data quality and data provenance. Technologies for integrating distributed data were also investigated.
The work identified issues with how data are currently collected (e.g. scale, coverage, and frequency of collection), data quality, how data are managed, data ownership, and related data access problems. These factors significantly affect how far data harmonisation and integration can be automated in the three domains considered in the research programme.
The findings from this component of the IDA programme informed the contribution by IDA staff to an Our Land and Water National Science Challenge think piece, A Data Ecosystem for Land and Water Data to Achieve the Challenge Mission (2016).
Land Use
Land use is difficult to map as there is no single dataset that can easily describe how we use the land. We successfully reconstructed three national land use classifications (namely for PLUTs, LUNRZ, and LUNZ) by developing ways to record the workflow for the creation of a land use map.
Land use information is a crucial input to many analyses. Our geospatial work to produce land use information was used for various projects, e.g. mapping the extent of artificial drainage in New Zealand, an MPI project to produce a framework for pasture quality, and the provision of advice on collating information on artificial drainage. We also worked in partnership with the Ministry for the Environment to explore various spatial layers and create a grassland classification layer for the Land Use Map in the LUCAS programme for greenhouse gas inventory reporting purposes.
We are now collaborating with Our Land and Water National Science Challenge to produce a land use map that will serve to estimate impacts of human activities on river water quality in the sources and flows programme.
Because combining heterogeneous sources of information on land use or land cover is complex, we developed pyLUC, a platform-independent technology that automatically generates land use classifications from the latest source data, to a high standard of reliability, on demand and according to end-user needs. pyLUC is a geospatial data-processing framework that complements standard geographic information system (GIS) tools for generating spatial land use classifications (LUCs), with greater transparency and repeatability.
pyLUC applies logic written in Python to existing spatial datasets to create new spatial datasets, and automatically produces fine-grained provenance data describing exactly all sources and processing steps behind the datasets produced (see the Supporting IDA component for more information on pyLUC and provenance). The idea is that anyone with appropriate access permissions to the input datasets can be handed a pyLUC definition script and use it to reproduce the output dataset in full. pyLUC uses versioned input data from authoritative instances on the Koordinates geospatial data warehouse platform, e.g. the LRIS Portal, which increases the transparency of the process. It can also be extended to access data from other data source end points, e.g. geospatial web services.
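The idea of pairing rule logic with automatically captured provenance can be illustrated with a minimal sketch. This is not pyLUC's API: the function names, the definition structure, the attribute names, and the land use classes below are all hypothetical, and plain Python lists stand in for spatial layers.

```python
import hashlib
import json


def fingerprint(records):
    """Stable short hash of a dataset, so provenance can pin the exact version used."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def classify(cell):
    """Hypothetical rule logic: map source attributes to a land use class."""
    if cell["cover"] == "forest":
        return "Forestry"
    if cell["cover"] == "grass" and cell["stock_units"] > 10:
        return "Intensive pasture"
    if cell["cover"] == "grass":
        return "Extensive pasture"
    return "Other"


def run(definition, inputs):
    """Apply the definition's rule to every cell and return (output, provenance)."""
    cells = inputs[definition["input"]]
    output = [{"id": c["id"], "land_use": definition["rule"](c)} for c in cells]
    provenance = {
        "definition": definition["name"],
        "inputs": {definition["input"]: fingerprint(cells)},
        "steps": [definition["rule"].__name__],
        "output": fingerprint(output),
    }
    return output, provenance


# Made-up input layer, for illustration only.
inputs = {
    "land_cover_v1": [
        {"id": 1, "cover": "forest", "stock_units": 0},
        {"id": 2, "cover": "grass", "stock_units": 14},
        {"id": 3, "cover": "grass", "stock_units": 3},
    ]
}
definition = {"name": "demo_luc", "input": "land_cover_v1", "rule": classify}
land_use, prov = run(definition, inputs)
```

Because the provenance record stores a hash of each input and of the output, someone rerunning the same definition against the same versioned inputs can check that they have reproduced the dataset exactly.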
Provenance capture was also retrofitted into LUMASS (Land-Use Management Support System) and tested on an application of SedNetNZ.
The work on data provenance will be published in a forthcoming journal article.
The land use information created by IDA cannot be released for wider use because the data are derived from commercial input data.
Soil Quality
Soil quality data have been collected in the past, but with inconsistent monitoring methods and discontinued programmes of soil data collection (e.g. the 500 Soils programme ran from 1999 to 2001). These inconsistencies are common across the globe, and contribute to less-than-optimal State of the Environment (SoE) reporting in New Zealand. The IDA team contributed to international efforts to save soil legacy data via the Global Soil Map and the World Soil Information Service (WoSIS) project (Arrouays et al. 2017), and through involvement in the FAO Global Soil Partnership Programme.
The IDA programme contributed to collecting and standardising heterogeneous sources of soil quality information by creating a pipeline to ingest, harmonise, and validate several key soil data sources, including the 500 Soils Database of national soil quality monitoring and the Land Use and Carbon Analysis System (LUCAS) soil data, into Manaaki Whenua's National Soil Data Repository. Both these databases are used to support NZ’s international climate change reporting to the UN Framework Convention on Climate Change. As a consequence of the challenges uncovered in trying to harmonise soil data, IDA staff were asked to define a site naming scheme and associated site identification system for environmental monitoring (Ritchie & Osorio-Jamarillo 2017).
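A harmonise-then-validate step of the kind described above might look like the following sketch. It is illustrative only: the source names, field names, units, and validation ranges are assumptions and do not reflect the actual schemas of the 500 Soils Database, the LUCAS soil data, or the National Soil Data Repository.

```python
# Hypothetical mapping from each source's field names to a common schema.
# source_a reports carbon as a percentage; source_b reports it in g/kg.
SOURCE_SCHEMAS = {
    "source_a": {"site": "site_id", "ph": "pH", "carbon_pct": "total_c"},
    "source_b": {"site": "SiteCode", "ph": "ph_water", "carbon_g_kg": "C_gkg"},
}


def harmonise(record, source):
    """Rename fields, normalise site codes, and convert units to the common schema."""
    schema = SOURCE_SCHEMAS[source]
    out = {
        "source": source,
        "site": str(record[schema["site"]]).strip().upper(),
        "ph": float(record[schema["ph"]]),
    }
    if "carbon_pct" in schema:
        out["carbon_pct"] = float(record[schema["carbon_pct"]])
    else:
        # 10 g/kg equals 1 %, so divide by 10.
        out["carbon_pct"] = float(record[schema["carbon_g_kg"]]) / 10.0
    return out


def validate(rec):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    if not 0.0 <= rec["ph"] <= 14.0:
        problems.append("ph out of range")
    if not 0.0 <= rec["carbon_pct"] <= 100.0:
        problems.append("carbon_pct out of range")
    return problems


# Made-up records, for illustration only.
a = harmonise({"site_id": " ab1 ", "pH": "5.8", "total_c": "4.2"}, "source_a")
b = harmonise({"SiteCode": "cd2", "ph_water": 6.1, "C_gkg": 35}, "source_b")
```

Keeping the per-source mapping in data rather than in code means a new source can be added by describing its fields, and the same validation rules then apply to every source.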
IDA staff contributed to a review of soil quality and a trace elements stocktake to improve national consistency of SoE reporting, including soil quality and trace element monitoring, and data management (Cavanagh et al. 2017).
As part of the Land Domain 2018 report, a comprehensive summary of soil quality data has been collected and submitted, using a peer-reviewed reproducible workflow to ensure data quality assurance.
Because New Zealand lacks a complete soil-quality monitoring programme, we assessed the coverage and representativeness of current soil quality monitoring sites as an indication of the direction that future soil sampling strategies in NZ need to take (Cavanagh et al. 2017).
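One simple way to express representativeness, offered here only as a sketch and not as the method used in the cited assessment, is to compare each class's share of monitoring sites with its share of land area. The classes, area shares, and site counts below are made up for illustration.

```python
def representativeness(area_share, site_counts):
    """Ratio of site share to area share per class; below 1 means under-sampled."""
    total_sites = sum(site_counts.values())
    ratios = {}
    for cls, share in area_share.items():
        site_share = site_counts.get(cls, 0) / total_sites
        ratios[cls] = round(site_share / share, 2)
    return ratios


# Made-up figures: fraction of total area, and number of monitoring sites.
area_share = {"pasture": 0.5, "forestry": 0.25, "cropping": 0.25}
site_counts = {"pasture": 30, "forestry": 5, "cropping": 15}

ratios = representativeness(area_share, site_counts)
under_sampled = sorted(cls for cls, r in ratios.items() if r < 1.0)
```

In this made-up case forestry holds a quarter of the area but only a tenth of the sites, so it is the class a future sampling strategy would prioritise.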
Biodiversity
The Biodiversity component of IDA focused on improving the use of primary biodiversity data for species modelling, and specifically on supporting indicators of species occupancy.
IDA contributed to improved data access, data integration and quality control of primary biodiversity data. The work plan was designed primarily to support existing initiatives.
New Zealand is a signatory to the intergovernmental Global Biodiversity Information Facility (GBIF). The aim of GBIF is to provide free and open access to biodiversity data through a standards-based data-sharing network.
The IDA project contributed to establishing robust, on-demand access to the data held in several national biological collections. Data are harvested both by GBIF and by the Australasian Virtual Herbarium (as part of the Atlas of Living Australia).
We developed enhancements to the New Zealand Organisms Register (NZOR) for resolving issues with taxonomic names and synonyms. These enhancements are also used in the internal data-ingestion and integration pipeline within NZOR.
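The core of synonym resolution is mapping whatever name appears on a record to the currently accepted name. The sketch below shows that step with a small hand-written lookup table; it does not use NZOR's data or services, and the function names are hypothetical.

```python
# Hand-written example table, not drawn from NZOR.
ACCEPTED = {"Kunzea ericoides", "Leptospermum scoparium"}
SYNONYMS = {"leptospermum ericoides": "Kunzea ericoides"}


def normalise(name):
    """Collapse whitespace and ignore case so trivial variants still match."""
    return " ".join(name.split()).lower()


def resolve(name):
    """Return (accepted_name, status); status is accepted, synonym, or unresolved."""
    key = normalise(name)
    for accepted in ACCEPTED:
        if normalise(accepted) == key:
            return accepted, "accepted"
    if key in SYNONYMS:
        return SYNONYMS[key], "synonym"
    return None, "unresolved"


results = [
    resolve("  kunzea   ericoides "),
    resolve("Leptospermum ericoides"),
    resolve("Unknownus nameus"),
]
```

Returning a status alongside the name lets a downstream pipeline route unresolved names to manual review instead of silently dropping the records.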
We developed new systems for geo-validation and geo-coordinate translation (not publicly available).
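Since the IDA systems themselves are not public, the following is only a generic sketch of what coordinate translation and validation can involve: converting degrees, minutes, and seconds to decimal degrees, then checking that a point falls inside an expected region. The bounding box is a rough, assumed envelope for mainland New Zealand, and the function names are hypothetical.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds plus hemisphere letter to signed decimal degrees."""
    value = abs(degrees) + minutes / 60.0 + seconds / 3600.0
    return -value if hemisphere.upper() in ("S", "W") else value


# Rough, assumed envelope for mainland New Zealand (decimal degrees, WGS 84).
NZ_BOUNDS = {"lat": (-48.0, -34.0), "lon": (166.0, 179.0)}


def validate_point(lat, lon, bounds=NZ_BOUNDS):
    """Return a list of problems; an empty list means the point looks plausible."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return ["not a valid latitude/longitude"]
    lat_ok = bounds["lat"][0] <= lat <= bounds["lat"][1]
    lon_ok = bounds["lon"][0] <= lon <= bounds["lon"][1]
    if not (lat_ok and lon_ok):
        return ["outside expected region"]
    return []


lat = dms_to_decimal(41, 30, 0, "S")
lon = dms_to_decimal(174, 45, 0, "E")
```

A region check of this kind catches a common data-entry error in specimen records, a dropped hemisphere sign, because the same latitude with the wrong sign lands far outside the expected envelope.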
We contributed to the rOpenSci community's development of software packages for biodata processing. We also developed a customised, reusable data-integration pipeline for processing non-public data from the National Vegetation Survey Databank (NVS), the threatened species data from DOC’s BioWeb database, and the iNaturalist Citizen Science platform.
Outputs from this IDA component included redeveloped data-access end-points for biological collections/databases, and an IDA Myrtaceae pipeline (public-access data only) implemented using the GBIF Integrated Publishing Toolkit.
References
Ausseil AG, Manderson A, Rutledge D, Wyman T 2015. Land use mapping review for Ministry for the Environment. Landcare Research Contract Report LC2356 for Ministry for the Environment.
Jolly B, Müller M, Medyckyj-Scott, Spiekermann R, Ausseil A-G 2017. A tool for the repeatable generation and automated documentation of landuse classification maps. Presentation given at American Geophysical Union.
Manderson A 2018. Mapping the extent of artificial drainage in New Zealand. Manaaki Whenua Contract Report LC3325. Prepared for Lincoln Agritech.
Manderson AK, Jolly B, Ausseil A-GE 2018. Land Use Classification replicator.
Rutledge D, Manderson A, Lilburne L, Ausseil AG, Belliss S, Price R 2016. Methodology for a GIS-based land-use map for Southland: a review. Landcare Research Report LC2491 for Environment Southland.