Editorial - Big Data
The extreme V’s are sometimes augmented by little v’s: visualisation, value and privacy. These represent the need to explore the raw data interactively and to see the solution in the context of the relevant raw data, and the need to harness societal and individual value that outweighs the intrusion into our personal lives that comes with contributing a constant stream of information from the ever-increasing pool of connected digital devices we carry with us and have in our homes and offices. Big Data analytics has been around for a decade or two and in some disciplines is reaching maturity. What we are learning from this experience is that radically rethinking traditional analysis techniques to solve the technical issues presented by extreme V analytics can deliver substantial everyday spinoffs that are useful both for our science and for our clients.
These spinoffs are characterised by increased agility in addressing multi-disciplinary problems, easier scaling of solutions from experimental to operational use, shorter iteration times for testing and refining multiple hypotheses and scenarios, greater clarity and transparency in treating scientific uncertainty and communicating it as societal or decision risk, and greater ease of operating in a shared space across agencies or in partnerships between science, society, industry and government. Society urgently needs to address climate change, the economic transformation of moving to a low-carbon economy, the increased frequency of extreme climatic events that demand infrastructure resilience and disaster response, and the prevalence of micro- and nano-particles of plastics and chemical residues in the environment, with their consequences for human health. Against that background, these spinoffs all translate into benefits for both science and society.
Less immediately visible is the growing suite of international standards that are essential to delivering on the Big Data promise. Manaaki Whenua is active in this space too. We are recognised for our leading contributions to standards for representing and exchanging digital soil information (SoilML) and digital time-series information (TimeSeriesML), and to a new digital-first way of recording global location and geographic data of all types (DGGS).
SoilML and TimeSeriesML address the need to exchange complex information efficiently and clearly, without loss of meaning and without the need for humans to interpret the data, so that computers can reason with the data unambiguously. This is an essential ingredient of Big Data, as nothing would slow an analysis more than requiring humans to audit everything coming into it.
DGGS (Discrete Global Grid Systems) are an alternative to the system of latitude and longitude and to country-specific, flat-earth map projections such as NZ Transverse Mercator (NZTM). DGGS are designed from the ground up to meet the needs of location and spatial data in Big Data analysis. The governments of Canada, Australia, China and the UK are investing very significantly in DGGS. In industry, Google and Uber are the best-known companies that have recently chosen to use DGGS at the core of their business systems. Earlier this year, Uber released its DGGS software code (H3) as Open Source, so it is gaining traction outside Uber as well as internally, where it is used to track all its drivers and passengers and to manage its variable trip pricing model. Here in New Zealand, Statistics NZ and LINZ, helped by Manaaki Whenua, are just starting a DGGS pilot to gain hands-on experience, so that they can understand the role DGGS could play as an NZ standard for exchanging diverse types of spatial data between agencies to facilitate government decision making.
While none of the topics described in this Big Data issue can be classified as true Big Data, they all illustrate the emergence of different aspects of extreme V solutions.