- Research Interests: Provenance, Scientific Workflows, Databases, Big Data, eScience, Ontology, Computer Science, Big Data Analytics, Scientific Computing (Computational Science), Semantics, Linked Open Data, Open Government Data, Semantic Web, Data Mining, Data Provenance, Data Origin Forensics, Data Management, Dyslexia, ADHD, Artificial Neural Networks, Precision Agriculture, Agronomy, Agroinformatics, Social Media, Meteorology, Data Warehousing, Database Systems, Scientific Workflows Scheduling, and Cloud Computing
Web services technologies are among the advanced computing technologies available to support scientific applications, providing the underpinning for new levels of collaboration. This work presents a novel framework that harvests and interconnects operational Web service provenance metadata with behavioral clickstream data to enhance trust in in silico scientific experiments.
This article presents the ETL4LinkedProv approach to manage the collection and publication of provenance, at distinct levels of granularity, as Linked Data. The proposed approach uses ETL workflows and a component named Provenance Collector Agent to collect two kinds of provenance (prospective and retrospective), integrating them with domain data. The component also sets the granularity of the provenance to be captured. Furthermore, ETL4LinkedProv is evaluated in a real-world scenario in which Brazilian governmental agencies produce and publish public data sources as Linked Data. In this article we also measure the amount of provenance generated at runtime by the ETL workflows and the number of RDF triples published.
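The core publishing step — turning provenance records collected from an ETL run into RDF triples — can be sketched in a few lines. The namespace, property names, and the `EtlStep` record below are illustrative assumptions, not the actual ETL4LinkedProv vocabulary or API:

```python
# Sketch: serialize retrospective provenance of one ETL step as N-Triples.
# Namespace and property names are illustrative assumptions only.
from dataclasses import dataclass

EX = "http://example.org/prov#"  # hypothetical namespace

@dataclass
class EtlStep:
    step_id: str
    started: str    # xsd:dateTime-style literal
    ended: str
    rows_out: int

def to_ntriples(step: EtlStep) -> list:
    """Emit one N-Triples line per captured provenance descriptor."""
    s = f"<{EX}{step.step_id}>"
    return [
        f'{s} <{EX}startedAt> "{step.started}" .',
        f'{s} <{EX}endedAt> "{step.ended}" .',
        f'{s} <{EX}rowsProduced> "{step.rows_out}" .',
    ]

triples = to_ntriples(
    EtlStep("load_csv", "2014-01-01T10:00:00", "2014-01-01T10:05:00", 5000)
)
for t in triples:
    print(t)
```

Capturing coarser granularity would simply mean emitting triples for the whole workflow rather than per step, which is the kind of choice the Provenance Collector Agent controls.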
Provenance is a term used to describe the history, lineage, or origins of a piece of data. In computationally intensive scientific experiments, data resources are produced at large scale. Thus, as more scientific data are produced, the importance of tracking and sharing their metadata grows. It is therefore desirable to make provenance metadata easy to access, share, reuse, integrate, and reason over. To address these requirements, ontologies can be used to encode expectations and agreements concerning provenance metadata reuse and integration. In this paper, we present a well-founded provenance ontology named Open proVenance Ontology (OvO), which draws on three theories: the life cycle of in silico scientific experiments, the Open Provenance Model (OPM), and the Unified Foundational Ontology (UFO). OvO may act as a reference conceptual model that researchers can use to explore the semantics of provenance metadata.
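The OPM layer that OvO builds on distinguishes artifacts, processes, and agents, linked by causal edges such as `used` and `wasGeneratedBy`. A toy model of such a graph, with an illustrative lineage query (the class and method names are assumptions, not OvO's vocabulary):

```python
# Toy Open Provenance Model (OPM) graph: artifacts, processes, and agents
# connected by causal edges. Identifiers below are illustrative only.
from collections import defaultdict

class OPMGraph:
    def __init__(self):
        self.nodes = {}                 # id -> kind: "artifact" | "process" | "agent"
        self.edges = defaultdict(list)  # (source_id, relation) -> [target_id, ...]

    def add(self, node_id, kind):
        self.nodes[node_id] = kind

    def used(self, process, artifact):
        self.edges[(process, "used")].append(artifact)

    def was_generated_by(self, artifact, process):
        self.edges[(artifact, "wasGeneratedBy")].append(process)

    def lineage(self, artifact):
        """Artifacts the given artifact transitively derives from."""
        seen, stack = set(), [artifact]
        while stack:
            a = stack.pop()
            for p in self.edges[(a, "wasGeneratedBy")]:
                for src in self.edges[(p, "used")]:
                    if src not in seen:
                        seen.add(src)
                        stack.append(src)
        return seen

g = OPMGraph()
g.add("raw.fasta", "artifact")
g.add("align", "process")
g.add("aligned.sam", "artifact")
g.used("align", "raw.fasta")
g.was_generated_by("aligned.sam", "align")
print(g.lineage("aligned.sam"))  # the inputs 'aligned.sam' was derived from
```

An ontology such as OvO adds what this sketch lacks: formally defined semantics for the node and edge types, so that such queries can be answered by a reasoner rather than ad hoc traversal.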
Research Interests: Bioinformatics, Science Education, Genomics, Metadata, Ontology (Computer Science), Comparative Genomics, Provenance, Semantic Web Technology - Ontologies, Semantic Web, Cloud Computing, eScience, OPM Methodology, Scientific Workflow, Workflows, Data Provenance, Open Provenance Model (OPM), Unified Foundational Ontology, and Open proVenance Ontology
This paper explores the organization of provenance as a catalog of non-functional requirements. Its aim is to introduce a systematic approach to designing a reusable provenance catalog using consolidated software engineering techniques. Provenance captures the derivation history of data products and is essential for long-term preservation, for reuse, and for determining data quality. We propose a provenance catalog whose softgoals are defined from NFR patterns and from provenance taxonomies and specifications. This work presents a novel view of provenance, describing it as a Softgoal Interdependency Graph: a reusable framework that makes explicit the characterization, decomposition, relationships, and operationalization of elements that can be satisfied by the software. We exemplify the approach in a real usage scenario based on scientific software development.
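The Softgoal Interdependency Graph idea can be made concrete with a minimal sketch. The AND/OR decomposition rule follows standard NFR-framework conventions; the specific softgoal names and catalog fragment below are hypothetical, not taken from the paper's catalog:

```python
# Minimal Softgoal Interdependency Graph (SIG) sketch for a provenance
# NFR catalog. Softgoal names here are hypothetical examples.
class Softgoal:
    def __init__(self, name, decomposition="AND"):
        self.name = name
        self.decomposition = decomposition  # "AND" or "OR" refinement
        self.children = []
        self.satisficed = False  # leaves are marked by chosen operationalizations

    def add(self, child):
        self.children.append(child)
        return child

    def is_satisficed(self):
        if not self.children:
            return self.satisficed
        results = [c.is_satisficed() for c in self.children]
        return all(results) if self.decomposition == "AND" else any(results)

# Hypothetical fragment: provenance AND-decomposed into capture and storage,
# with capture OR-refined into alternative operationalizations.
root = Softgoal("Provenance[Experiment]")
capture = root.add(Softgoal("Capture[Provenance]", "OR"))
storage = root.add(Softgoal("Storage[Provenance]"))
capture.add(Softgoal("LogBasedCapture")).satisficed = True  # chosen alternative
storage.satisficed = True

print(root.is_satisficed())  # True: both AND branches are satisficed
```

Reuse comes from the structure itself: a team can instantiate the catalog's decompositions and pick different operationalizations per project without redoing the analysis.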
In this paper we present an investigation of life event classification on social media networks. Detecting personal mentions of life events, such as travel, birthdays, or weddings, presents an interesting opportunity to anticipate offers of products or services, as well as to enrich the demographics of a given target population. Nevertheless, life event classification can be seen as an imbalanced classification problem, where the set of posts that actually mention a life event is significantly smaller than the set that does not. For this reason, the main goal of this paper is to investigate different types of classifiers, under an experimental protocol based on datasets containing various types of life events in both Portuguese and English, and the benefits of over-sampling techniques for improving the accuracy of these classifiers on these sets. The results demonstrate that Logistic Regression may be a poor choice for the original datasets, but that after over-sampling the training set it outperforms, by a significant margin, classifiers such as Naive Bayes and Nearest Neighbours, which in most cases do not benefit as much from the over-sampled training set.
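The over-sampling step can be sketched with plain random over-sampling: minority-class examples are duplicated (with replacement) until the classes are balanced. The synthetic data and function below are illustrative only; in the paper's setting the minority class would be posts that actually mention a life event:

```python
# Sketch: random over-sampling of the minority class in an imbalanced
# life-event dataset. Data and function names are illustrative only.
import random
from collections import Counter

def oversample(X, y, seed=42):
    rng = random.Random(seed)
    counts = Counter(y)
    majority = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        # duplicate randomly chosen examples of this class until balanced
        for _ in range(majority - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

# 8 non-event posts (label 0) vs. 2 life-event posts (label 1)
X = [f"post {i}" for i in range(10)]
y = [0] * 8 + [1] * 2
X_bal, y_bal = oversample(X, y)
print(Counter(y_bal))  # both classes now have 8 examples
```

Only the training split should be over-sampled; balancing the test set would distort the accuracy estimates the protocol is meant to compare.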
The continuous quest for knowledge stimulates companies and research institutions not only to investigate new ways to improve the quality of scientific experiments, but also to reduce the time and costs needed to run them in distributed environments. The management of provenance descriptors collected during the life cycle of scientific experiments is an important goal to be achieved. This thesis presents a new strategy aimed at helping scientists manage different kinds of provenance descriptors. It describes a computational approach that uses a well-founded ontology named OvO (Open proVenance Ontology) and a provenance infrastructure entitled Matriohska, which can be attached to scientific workflows executed on distributed and heterogeneous environments such as computing clouds. The approach also allows scientists to perform semantic queries over provenance descriptors with distinct types of granularity.
This thesis was nominated by PESC/COPPE/UFRJ and was awarded by the SAE (Secretariat of Strategic Affairs of the Presidency of the Brazilian Republic) as the best Brazilian PhD thesis in Computer Science in 2011.
