ICACT20220341 Slide.16
Thank you very much for listening.
|
ICACT20220341 Slide.15
These are the references of our paper.
|
ICACT20220341 Slide.14
Let me just run over the key points again. In this paper, we explained the design of MMHS in the SODAS platform for better data and metadata exchange. Firstly, we briefly introduced the SODAS platform. Secondly, we designed the three main function blocks of MMHS in detail. Thirdly, through two specific scenarios, we demonstrated that the described system can alleviate the consequences of differing standards and structures for metadata and data management. Moreover, MMHS not only helps users easily access the metadata and data they want, collected from different external systems, but also lets them manage the harvesting process effectively.
|
ICACT20220341 Slide.13
This is the conclusion section.
|
ICACT20220341 Slide.12
The procedure for stopping the information harvesting process is much simpler. In the first step, the user sends a request to Job Manager to obtain a list of running jobs. The user then selects a job Jn and sends a request to Job Manager to stop its information harvesting process. Based on the job information in the database, Job Manager checks whether the request contains any invalid information. After that, Job Manager asks Queue Manager to delete the metadata queue and data queue related to job Jn, and to add job Jn to the DELETED_JOB_QUEUE queue. Object Collector gets job Jn from DELETED_JOB_QUEUE, stops processing job Jn, and continues to process the other jobs in JOB_QUEUE. Metadata Importer gets job Jn from DELETED_JOB_QUEUE, stops processing the metadata related to job Jn, and continues to process the metadata of other jobs. Data Downloader gets job Jn from DELETED_JOB_QUEUE and stops processing the data related to job Jn.
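The stop procedure can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual MMHS code: the class `QueueState`, the functions `stop_job` and `next_job`, and the choice to model DELETED_JOB_QUEUE as a set that every worker consults are all assumptions made for the example.

```python
from collections import deque


class QueueState:
    """Hypothetical stand-in for the state held by Queue Manager."""

    def __init__(self):
        self.job_queue = deque()   # JOB_QUEUE
        self.deleted_jobs = set()  # DELETED_JOB_QUEUE, modelled here as a set
        self.job_queues = {}       # job id -> {"metadata": deque, "data": deque}


def stop_job(state, running_jobs, job_id):
    """Job Manager side: validate the request, then drop queues and mark the job."""
    if job_id not in running_jobs:
        raise ValueError(f"unknown or finished job: {job_id}")
    # Delete the metadata queue and data queue of the job.
    state.job_queues.pop(job_id, None)
    # Announce the stop so that the harvester blocks can react to it.
    state.deleted_jobs.add(job_id)


def next_job(state):
    """Object Collector side: skip stopped jobs and continue with the others."""
    while state.job_queue:
        job_id = state.job_queue.popleft()
        if job_id not in state.deleted_jobs:
            return job_id
    return None
```

Metadata Importer and Data Downloader would perform the same membership check before handling an item that belongs to a given job.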
|
ICACT20220341 Slide.11
This is the scenario for starting the information harvesting process in our system. In the first step, the user sends a request to Source Manager to obtain a list of registered sources. The user then selects one source and sends a request to Job Manager to start the information harvesting process for that source. Based on the job information in the database, Job Manager checks whether the request contains any invalid information. After that, Job Manager asks Queue Manager to create a new metadata queue and data queue for job Jn and to add job Jn to the JOB_QUEUE queue. Object Collector gets job Jn from JOB_QUEUE and asks Config Manager for configuration information and Source Manager for source information. Object Collector then sends requests to the external system to obtain a list of the data's metadata and asks Queue Manager to add the items of this list to the metadata queue of job Jn. Metadata Importer gets metadata Mn from the metadata queue, maps the properties of Mn to the standard based on the configuration information, and stores the mapped Mn in the database. Metadata Importer then asks Queue Manager to add the metadata items that include direct download endpoints for the data content to the data queue of job Jn. Finally, Data Downloader gets metadata dn from the data queue, downloads the data content it refers to, and stores the result in the appointed storage system in Data Storage.
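The start scenario is essentially a three-stage queue pipeline, which the following sketch tries to make concrete. Everything in it is illustrative: the `QueueManager` class, the queue naming scheme, the `download_url` property, and the fake source and download callables are assumptions, not part of the described system.

```python
from collections import deque


class QueueManager:
    """Hypothetical in-memory stand-in for Queue Manager."""

    def __init__(self):
        self.queues = {"JOB_QUEUE": deque()}

    def create(self, name):
        self.queues[name] = deque()

    def push(self, name, item):
        self.queues[name].append(item)


def start_job(qm, job_id):
    # Job Manager: create the per-job queues, then enqueue the job itself.
    qm.create(f"metadata:{job_id}")
    qm.create(f"data:{job_id}")
    qm.push("JOB_QUEUE", job_id)


def object_collector(qm, fetch_metadata_list):
    # Take the next job and queue every metadata item the source returns.
    job_id = qm.queues["JOB_QUEUE"].popleft()
    for item in fetch_metadata_list(job_id):
        qm.push(f"metadata:{job_id}", item)
    return job_id


def metadata_importer(qm, job_id, mapping, database):
    # Rename properties according to the mapping rules and store the result.
    queue = qm.queues[f"metadata:{job_id}"]
    while queue:
        raw = queue.popleft()
        mapped = {mapping.get(key, key): value for key, value in raw.items()}
        database.append(mapped)
        # Only items with a direct download endpoint go to the data queue.
        if "download_url" in mapped:
            qm.push(f"data:{job_id}", mapped)


def data_downloader(qm, job_id, download, storage):
    # Fetch the content behind each queued item and keep it in storage.
    queue = qm.queues[f"data:{job_id}"]
    while queue:
        item = queue.popleft()
        storage[item["download_url"]] = download(item["download_url"])


def run_demo():
    qm, database, storage = QueueManager(), [], {}
    start_job(qm, "J1")
    fake_source = lambda job_id: [
        {"title": "A", "url": "http://example.org/a.csv"},
        {"title": "B"},  # no download endpoint, so it never reaches the data queue
    ]
    job_id = object_collector(qm, fake_source)
    metadata_importer(qm, job_id, {"url": "download_url"}, database)
    data_downloader(qm, job_id, lambda url: b"content of " + url.encode(), storage)
    return database, storage
```

In this sketch the stages run one after another; in a real deployment each harvester block would presumably consume its queue concurrently.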
|
ICACT20220341 Slide.10
Firstly, in the API block, there are four function blocks: Authentication/Authorization, Config Manager, Source Manager, and Job Manager. More specifically, Config Manager is responsible for managing configuration information such as the default data storage and the mapping rules from different standards to the datahub's standard. Source Manager is responsible for managing source information such as name, access endpoint, and standard type. Job Manager manages job information such as start time, end time, and current status.
Secondly, the Handler block consists of four function blocks: Authorizer, Cleaner, Elastic Indexer, and Queue Manager. While the first three blocks are mainly responsible for improving security and performance, Queue Manager manages the queue information in Data Queue for interconnection.
Thirdly, there are three function blocks in the Harvester block: Object Collector, Metadata Importer, and Data Downloader. In particular, Object Collector takes charge of requesting metadata from external systems based on the registered source information. Metadata Importer is responsible for mapping the metadata and storing it in the database, while Data Downloader is responsible for downloading and storing the data content. This means that the database stores not only the internal metadata such as source, job, and config, but also the metadata collected from external systems.
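To make the internal metadata more tangible, here is a small sketch of the source, job, and config records and of how mapping rules could be selected by a source's standard type. The field names follow the description above, but the record layout, the status value, and the sample rule (renaming a `notes` property to `description`) are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Source:
    name: str
    access_endpoint: str
    standard_type: str


@dataclass
class Job:
    source_name: str
    started_time: datetime
    ended_time: Optional[datetime] = None
    status: str = "RUNNING"  # assumed status value


@dataclass
class Config:
    default_data_storage: str
    # standard type -> {source property name: datahub property name}
    mapping_rules: dict = field(default_factory=dict)


def map_to_hub_standard(config, source, metadata):
    """Rename properties using the rules registered for the source's standard."""
    rules = config.mapping_rules.get(source.standard_type, {})
    return {rules.get(key, key): value for key, value in metadata.items()}


config = Config("local-disk", {"ckan": {"notes": "description"}})
source = Source("demo", "https://example.org/api", "ckan")
print(map_to_hub_standard(config, source, {"title": "T", "notes": "N"}))
# prints {'title': 'T', 'description': 'N'}
```

Properties without a rule pass through unchanged, which keeps the sketch usable for sources that already follow the datahub's standard.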
|
ICACT20220341 Slide.09
In this section, we describe the entire architecture of MMHS in detail and explain how the function blocks work together in two specific scenarios.
|
ICACT20220341 Slide.08
We have developed several components, such as Data Governance, Data Portal, and Datamap Publisher, to let users interact with the platform more conveniently. Note that each component is created by connecting more than one module from the following: Reference Model, Community, Gateway, Data Management, Datamap, Data Discovery, Commerce, DevOps, AI Pipeline, Datahub, and System Infrastructure. Among these modules, our MMHS belongs to Datamap, which is mainly used to support metadata management, harvesting, and publishing functions.
|
ICACT20220341 Slide.07
The purpose of SODAS is to become an all-in-one platform targeting both end-users and developers. Therefore, from the beginning, SODAS has been designed around industry standards such as W3C DCATv2 and the OpenAPI Specification.
Moreover, by building on several popular open-source software packages, we have developed and organized the main functions of SODAS into separate modules to ensure the extensibility and availability of the platform.
For developers, the plug-in architecture of SODAS makes developing functions and modules for their specific demands much easier and faster. For end-users, multitenancy support means that users with different roles from different organizations can use SODAS together without any restrictions.
|
ICACT20220341 Slide.06
This section gives an overview of the SODAS platform.
|
ICACT20220341 Slide.05
However, because each of these platforms may follow totally different standards and structures, collecting information from the sites built on them becomes complicated and costly.
In other words, the variety of metadata and data standards has not been given enough consideration, which causes many difficulties in extending functions and collecting information.
Also, there is no clear model structure for storing and querying a set of data collecting processes. As a result, it is difficult to manage the collected metadata and data effectively.
That's why we propose a new system named MMHS (Metadata Management and Harvesting System), used in our SODAS (Smart Open Data As a Service) platform.
|
ICACT20220341 Slide.04
Many of you may have heard about open data or something related to CKAN before, so let me introduce it very briefly here.
Open data means data that can be freely used and redistributed without restrictions.
An open data platform brings together many technologies for storing, managing, and sharing open data.
There are many open-source platforms, such as CKAN, OGPL, and Drupal, and commercial platforms, such as Socrata and Junar.
|
ICACT20220341 Slide.03
The first section is the Introduction.
|
ICACT20220341 Slide.02
This is the content of my presentation today.
In the first section, I will introduce some background knowledge about our work. Then I will talk about the main motivation and goals of our paper.
I will then explain the overall architecture and main features of our SODAS platform.
After that, I will introduce the detailed design of the Metadata Management & Harvesting System in the SODAS platform.
Then I will summarize our presentation in the conclusion section.
The final section will be questions and answers.
|
ICACT20220341 Slide.01
Hi everyone, and welcome to my presentation. First of all, I would like to thank you all for coming here today. Let me start by saying just a few words about my own background. My name is Minh Chau Nguyen, a senior researcher from ETRI in Korea. My research interests are big data management, software architecture, and distributed systems.
I'm here today to present our paper, titled "Metadata Management and Harvesting System in Smart Open Data As a Service".
In this paper, we focus on designing and implementing a new metadata management system in the platform named SODAS for efficient data sharing and utilization.
This research is a collaboration between the CybreBrain Laboratory of ETRI and more than 5 companies and universities, in the project named "Core Technology Development for Intelligently Searching and Utilizing Big Data based on DataMap".
|