Phi Week 2019 - Research Infrastructure (4)

Abstracts Search

Programme

Research Infrastructure (4)

Session Chair

Christian Briese , EODC

Domenico Grandoni , e-geos

Friday, September 13, 2019

11:00 - 12:00

Details

Oral Presentations 12 minutes; Questions & Answers 3 minutes.

Speaker

Oral Presentation

Creating an End-to-End SAR Deep Learning Pipeline

Jayanti Sharma, MDA, Richmond, Canada

Darren Thomson¹, Anne Webster¹, Troy Yee¹, Christopher Bow¹, Sebastien Tremblay-Johnston¹, Willis Peng¹, David Dobre¹, Ron Caves¹, Hans Wehn¹
¹MDA, Richmond, Canada

Show Hide Abstract

Spaceborne Synthetic Aperture Radar (SAR) is well-suited for broad area mapping and monitoring for a range of applications including maritime surveillance, detecting new infrastructure and changes in land cover, and gathering geospatial intelligence. The availability of SAR constellations such as Sentinel-1 provide moderate to high spatial resolution imagery with a short revisit, generating rich and dense time series for data exploitation. Canada’s upcoming three satellite RADARSAT Constellation Mission (RCM) will provide enhanced operational use of SAR data for maritime surveillance, disaster management and environmental monitoring, and will further increase SAR data volumes. To better utilize the large and ever growing archives of spaceborne SAR data, there is an increasing need for effective and automated techniques to exploit the massive datasets SAR missions provide.

Artificial Intelligence (AI) and specifically deep learning (DL) have the potential to automate SAR analysis tasks, especially for mapping wide areas and for monitoring patterns of activity on a regular basis. Deep learning networks use multiple layers of nonlinear processing to learn complex patterns in the data, offering the potential for human-level performance or better on SAR exploitation tasks. Canada has emerged as a leader in AI with a strong history of contributions from researchers such as Geoffrey Hinton (University of Toronto) and Yoshua Bengio (Université de Montréal). Canada continues to be a hotspot for AI research with a growing ecosystem of contributions from government, universities, established companies (Google, Microsoft, etc.) and start-ups. As the major Canadian developer of SAR missions and provider of derived information services, MDA is keen to leverage these recent advances in AI to better exploit SAR imagery.

In this context, MDA is applying deep learning techniques to SAR and other remotely sensed data to increase automation, to increase data throughput, and to develop new applications and products. However, as is widely recognized, modelling is only a small part of an AI solution, where infrastructure to support data ingest, preparation of training data and storage of results is required for a reliable, scalable and reproducible deep learning system.

Over the last year, MDA has been defining and implementing an end-to-end deep learning pipeline to facilitate the development and use of DL techniques for remotely sensed images. Our design emphasizes the following characteristics:
• Leverages open source software with permissive commercial licenses.
• Is able to be deployed cross platform, including both on-premise and in the cloud.
• Integrates state-of-the-art deep learning frameworks with a focus on TensorFlow.
• Is computationally efficient and allows the processing of very large remotely sensed images.
• Enables analysts to focus on developing and testing deep learning models rather than on manually configuring inputs and outputs.
• Creates reproducible datasets, models and results.

Main components of the pipeline include:
• Image and ground truth ingest and pre-processing to create Analysis Ready Data. For SAR data, in particular time series of data, this may include image cropping, co-registration, speckle filtering and geocoding. We have created a Python wrapper library to allow us to chain together proprietary algorithms to perform these basic sub-tasks for RADARSAT-2 imagery, with the capability to extend to additional sensors including RCM and Sentinel-1.
• An image database (DB) to store satellite imagery and relevant metadata. The image DB leverages OpenDataCube (ODC), an open-source library for cataloguing satellite imagery. The image DB provides consistent image formats for downstream pipeline components and allows users to query across generic acquisition parameters (e.g. date range, area of interest) and sensor-specific parameters (e.g. detected or complex data for SAR, the type of pre-processing applied or desired radar polarizations).
• A labelling tool and accompanying label database to manually annotate and store labelled training data. We have extended an open-source labelling tool called CVAT (Computer Vision Annotation Tool) to support the labelling of large, geospatial time series of data. This includes breaking up large images into tiles for labelling, and auto-generating initial labels for time series of data to reduce annotation time. For SAR data, imagery is rescaled from radar backscatter with a large dynamic range to decibel-scaled 8-bit imagery that is more easily interpretable.
• Reproducible dataset creation of training image chips and corresponding labels through interaction with the image and label databases. We are working to support various task types including object detection, chip classification and semantic segmentation. Users can split data into train, validation and test sets for hyperparameter tuning and unbiased model evaluation.
• A deep learning training module that accepts records from the dataset creation module and model definitions and hyperparameters from the user to create and run experiments. Models are written using TensorFlow, a high-performance numerical library for training and inference of deep neural networks that can scale to high performance computing architectures such as GPUs.
• A deep learning database stores the outputs of each experiment including trained models, accuracy metrics, and hyperparameters. Consistent information on each experiment is collected, improving reproducibility and allowing users to perform meta-analysis on series of experiments run over a period of time or across projects.

Throughout the pipeline we are making use of a technology stack including Docker, Django, PostgreSQL and REST APIs. The labelling tool and the image, label and deep learning databases are each being designed as a service with an external facing REST API providing convenient access via HTTP requests. Applications are deployed in Docker containers to simplify deployment across multiple environments. The pre-processing, dataset creation and deep learning training modules are deployed as Python libraries that can easily be imported and called by users. The majority of the pipeline is built using Python, as this is the primary programming language used for data science, and it provides a good fit with our existing machine learning workflows.

The deep learning pipeline allows for fast prototyping and evaluation of deep learning models in the MDA R&D group. Examples of applying the pipeline for object detection (e.g. ships, aircraft, buildings) and segmentation tasks (e.g. land cover and road extraction) will be presented using both SAR and Electro-Optical data.

Going forward we will continue to add more functionality to the pipeline including support for additional sensors, new tasks (e.g. instance segmentation) and baseline deep learning architectures exploiting the spatial, spectral and temporal dimensions of remotely sensed data.

Oral Presentation

Forestry TEP Enables EO Service Providers to Boost Their Operations

Renne Tergujeff, VTT Technical Research Centre of Finland Ltd, Finland

Tuomas Häme¹, Yrjö Rauste¹, Jukka Miettinen¹, Clive Farquhar², Peter van Zetten², Frank Martin Seifert³, Katri Tegel⁴, Tuukka Castrén⁵, Jussi Rasinmäki⁶, Pietro Ceccato⁷, Lauri Häme⁸, Sudip Kumar Pal⁹
¹VTT Technical Research Centre of Finland Ltd, Finland, ²CGI IT UK Ltd, UK, ³European Space Agency, Italy, ⁴Arbonaut Ltd, Finland, ⁵World Bank, USA, ⁶Simosol Oy, Finland, ⁷Spacebel s.a., Belgium, ⁸Terramonitor, Finland, ⁹Wuudis Solutions Oy, Finland

Show Hide Abstract

Six companies are currently pursuing to step up their business in providing services in the forestry sector, via integrating their processes based on Earth Observation data to the Forestry TEP platform.

The Forestry Thematic Exploitation Platform (Forestry TEP) enables efficient use of satellite data in generation of value-added forest information products. For all users, the cloud-hosted platform offers a massive catalogue of co-located optical and radar data, primarily from the Copernicus Sentinel satellites, and on-line processing services for value added products such as vegetation indices, forest and land cover maps and change maps. Popular tools such as QGIS and SNAP are also available via the web browser.

Key feature of Forestry TEP is its function as a platform for creating, using and sharing new processing services, via exploiting its online Docker-based development environment. The developer can test and use the new service privately, share it to selected partners, or make it public to all - as free-to-use or with commercial terms.

In 2019, six companies in particular are integrating their services on Forestry TEP, with on-boarding support provided by the platform. Each company is looking to benefit from the platform’s capabilities, such as performing processing close to data and the ability to develop own processing services. The individual use cases are:
• Arbonaut - performing forest and land cover classification based on combined optical imagery and radar data time series, particularly in developing countries;
• LTS International and the consortium of the SMFM project funded by the World Bank and the Global Environment Facility (GEF) - developing multiple tools for dry tropical forest monitoring, particularly in Mozambique and Zambia;
• Simosol - integrated workflow for satellite data screening, pre-processing and full processing for various forest mapping and analysis services;
• Spacebel - developing cross-platform integration of services and data between the EO Regions! platform and the Forestry TEP;
• Terramonitor - large-scale processing of Sentinel-2 and Sentinel-3 data in production of dynamically updating land map with global coverage;
• Wuudis - acquiring satellite based forest variable estimates to an online, multi-source information service for forest management, in Finland and elsewhere.

Each on-boarding case presents their specific, partially new requirements to the platform. This is a planned opportunity for Forestry TEP, as the concurrent development of the platform to match these challenges facilitates responding to similar user needs ahead. A key functionality already implemented this year is batch processing, which has enabled an easy way for all processing services to support arbitrarily high numbers of inputs. Other new features in development include user workspace for data management and the ability to define automated, recurring processing schemes.

Forestry TEP was originally developed during 2015-2018 in a project funded by the European Space Agency (ESA), who currently supports the on-boarding activities and the associated development described above. VTT in Finland is leading the platform operations, in cooperation with CGI in UK as the systems integrator, both jointly maintaining and developing the platform further and supporting the users. The platform is currently deployed on the CREODIAS infrastructure, one of the Data Information and Access Services (DIAS) vendors, who provide the underlying facilities for data access, processing and storage.

Oral Presentation

TOP: Your Playground for Copernicus Atmospheric Sciences Data

Stefano Natali, Sistema Gmbh, Vienna, Austria

Clemens Rendl¹, Gerhard Triebnig², Daniel Santillan², Marcus Hirtl³, Barbara Scherllin-Pirscher ³, Christian Retscher⁴
¹Sistema Gmbh, Vienna, Austria, ²EOX IT Services GmbH, Vienna, Austria, ³ZAMG, Vienna, Austria, ⁴ESA ESRIN, Frascati, Italy

Show Hide Abstract

Atmospheric Earth Observation (EO) approaches the top of the big data era with an already huge amount of existing and a large volume of expected data. Considering only European missions, heritage data include ESA ERS1/2, Envisat, and EUMETSAT Meteosat data. On-going missions include Copernicus Sentinel 5P, the ESA Earth Explorer mission ADM-Aeolus, EUMETSAT Meteosat SG, and Metop. Future sensors and missions include Copernicus Sentinel 4 and Sentinel 5, the ESA Earth Explorer EarthCare, and EUMETSAT-managed missions such as Meteosat Third Generation (MTG), Metop Second Generation (Metop-SG). Billions of products, various petabytes of data, are waiting to be discovered and exploited. To this scenario, data from U.S., Japanese, Korean, Chinese Russian and Indian missions can also be added.
Broadening the understanding of geospatial data for atmospheric sciences, numerical model output (e.g., Copernicus Atmospheric Services (CAMS) products, simulations from meteorological services) as well as non-satellite-based remote sensing data (from ground-based fixed and mobile stations, balloons, drones, aircraft, HAPS, ...) shall also be included.
TOP is the first operational environment that allows exploiting simultaneously the data triangle, namely satellite-based data, model output, and non-satellite-based remote sensing data for atmospheric-science applications. Currently TOP allows accessing and exploiting CAMS data at global and regional EU scale, Sentinel 5P level 2 data, and air quality data retrieved from the European Environmental Agency (EEA). Besides the currently available data, other data collections can be added to improve the platform´s data offer and its functionalities.
Past data are made available together with real-time and forecast data, to give the user the possibility to develop new algorithms, calibrate and validate them through ground measurements, and test them on forecast data if needed. Since TOP relies on DIAS (Mundi DIAS currently) for data collection, data are always updated and maintained.
TOP pursues the concept of Virtual Research Environment (VRE): it relies on a robust data access layer (ADAM) and offers a fast and elegant graphic user interface (DAVE) for data visualization and comparison as well as a Jupyter Notebook interface for algorithm developments and validation. Furthermore, a set of (REST) APIs are provided through which users can directly connect local or virtual machines to avoid the time-consuming task of data collection and preparation. Both original data stored on the DIAS facility and on-the-fly user-defined multi-dimensional subset and processing provided by TOP are available to users.
The present work will introduce the TOP concept, the data offer, application capabilities and cooperation models.

Discussion

Research Infrastructure (4)

Session Chair

Details

Speaker

Creating an End-to-End SAR Deep Learning Pipeline

Show Hide Abstract

Forestry TEP Enables EO Service Providers to Boost Their Operations

Show Hide Abstract

TOP: Your Playground for Copernicus Atmospheric Sciences Data

Show Hide Abstract

Discussion Research Infrastructure