Web services technologies are among the advanced computing technologies available to support scientific applications, providing the underpinning for new levels of collaboration. This work presents a novel framework that harvests and interconnects operational Web services provenance metadata with behavioral clickstream data to enhance trust in in silico scientific experiments.
This article presents the ETL4LinkedProv approach to managing the collection and publication of provenance with distinct levels of granularity as Linked Data. The proposed approach uses ETL-workflows and a component named Provenance Collector Agent to collect two kinds of provenance (prospective and retrospective), integrating them with domain data. The component also sets the granularity of the provenance to be captured. Furthermore, ETL4LinkedProv is evaluated in a real-world scenario in which Brazilian government agencies produce and publish public data sources as Linked Data. In this article we also measure the amount of provenance generated, both in the runtime of the ETL-workflows and in the number of published RDF triples.
Brazil has a diversified agriculture with millions of small establishments. Brazilian agriculture is experiencing robust productivity growth despite its historical inequalities. Thus, we foresee several open opportunities to empower smallholders to produce better goods based on the use of information technology (IT). In the project we focus on tomatoes because they are highly susceptible to diseases. In addition, the indiscriminate use of pesticides in tomato crops causes problems for human health and the environment. We believe we can help smallholders reduce the chemical management of diseases, raise their income levels and deliver healthier products.
STEM is defined as learning in the fields of Science, Technology, Engineering and Mathematics. In Brazil, many students leave the educational system before achieving a tertiary degree in these fields. Poor academic performance in STEM undergraduate courses is an issue faced by many universities, in both developed and emerging countries. Although these universities store large amounts of data, there are few studies about educational data mining (EDM) software tools designed to aid educational managers in analyzing student learning and improving the quality of the courses. Our approach may assist managers in supervising students at the end of each academic term, enabling them to identify the students who are at risk of not fulfilling the academic requirements toward a degree. This paper presents quantitative experimental studies using a large dataset of real data from five traditional STEM undergraduate courses at one of the largest Brazilian public universities. The results show that data mining algorithms can establish effective prediction models from existing student data.
Reproducibility is a major feature of science. Even agronomic research of exemplary quality may have irreproducible empirical findings because of random or systematic error. This work presents SisGExp, a provenance-based approach that helps researchers manage, share, and enact the computational scientific workflows that encapsulate legacy R scripts. SisGExp transparently captures the provenance of R scripts and makes experiments reproducible. SisGExp is non-intrusive: it does not require users to change the way they work, and it wraps agronomic experiments in a scientific workflow system.
Understanding the core function of the brain is one of the major challenges of our times. In the areas of neuroscience and education, several new studies try to correlate the learning difficulties faced by children and youth with behavioral and social problems. This work presents the challenges and opportunities of computational neuroscience research aimed at detecting people with learning disorders. We present a line of investigation based on three key areas (neuroscience, cognitive sciences and computer science) that considers young people between nine and eighteen years of age, with or without a learning disorder. Neural networks prove consistent in dealing with pattern recognition problems and are shown to be effective for early detection in patients with these disorders. We argue that computational neuroscience can be used for identifying and analyzing young Brazilian people with several cognitive disorders.
The analysis of the increasing flow of data about tropical rainfall is a major challenge for meteorologists. This work presents a semantic approach that uses well-founded ontologies to help meteorologists develop SPARQL queries that navigate over high-quality data and provenance metadata collected during the execution of meteorological in silico experiments.
Scientific workflows are abstractions used to model in silico scientific experiments. Support for collecting and recording prospective and retrospective provenance in cloud environments is still incipient. This paper presents an approach to support collecting provenance metadata of in silico scientific experiments executed in public clouds. The strategy was implemented as a distributed and modular architecture named Matriohska. This paper also presents a provenance data model compatible with the PROV specification. We also show preliminary results that describe how provenance metadata was captured from the components running in the cloud.
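To make the idea of retrospective provenance concrete, the following is a minimal sketch of recording what a single workflow step consumed and produced, using relation names from the W3C PROV vocabulary (`prov:Activity`, `prov:used`, `prov:wasGeneratedBy`). It is not the Matriohska architecture or its data model; the function names `entity_id` and `run_with_provenance`, the record layout, and the identifier scheme are all assumptions made for illustration.

```python
import hashlib
from datetime import datetime, timezone


def entity_id(data: bytes) -> str:
    """Hypothetical identifier scheme: a short content hash of the data."""
    return "entity:" + hashlib.sha256(data).hexdigest()[:12]


def run_with_provenance(name, func, data: bytes, log: list) -> bytes:
    """Run one workflow step and append PROV-style records to `log`.

    The records are plain dicts (an assumed layout, not a PROV
    serialization) naming the activity, the entity it used, and the
    entity it generated.
    """
    started = datetime.now(timezone.utc).isoformat()
    result = func(data)
    ended = datetime.now(timezone.utc).isoformat()

    # Derive a unique activity id from the step name and log position.
    activity = f"activity:{name}:{len(log)}"
    log.append({
        "type": "prov:Activity",
        "id": activity,
        "prov:startedAtTime": started,
        "prov:endedAtTime": ended,
    })
    log.append({"type": "prov:used", "activity": activity,
                "entity": entity_id(data)})
    log.append({"type": "prov:wasGeneratedBy", "entity": entity_id(result),
                "activity": activity})
    return result


log = []
out = run_with_provenance("uppercase", bytes.upper, b"acgt", log)
for record in log:
    print(record)
```

Because the step is wrapped rather than modified, the same pattern can be applied to any callable; a real deployment would also need to record agents, environment details, and a persistent store, none of which are shown here.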
Provenance is a term used to describe the history, lineage or origins of a piece of data. In computationally intensive scientific experiments, data resources are produced on a large scale. Thus, as more scientific data are produced, the importance of tracking and sharing their metadata grows. It is therefore desirable to make this metadata easy to access, share, reuse, integrate and reason over. To address these requirements, ontologies can be used to encode expectations and agreements concerning the reuse and integration of provenance metadata. In this paper, we present a well-founded provenance ontology named Open proVenance Ontology (OvO), which draws on three theories: the lifecycle of in silico scientific experiments, the Open Provenance Model (OPM) and the Unified Foundational Ontology (UFO). OvO may act as a reference conceptual model that researchers can use to explore the semantics of provenance metadata.
This paper explores the organization of provenance as a catalog of non-functional requirements (NFRs). Its aim is to introduce a systematic approach to designing a reusable provenance catalog using consolidated software engineering techniques. Provenance captures the derivation history of data products and is essential to long-term preservation, to reuse, and to determining data quality. We propose a provenance catalog that takes into account NFR patterns and provenance taxonomies and specifications to define its softgoals. This work presents a novel approach to provenance, describing it as a Softgoal Interdependency Graph, a reusable framework that makes explicit the characterization, decomposition, relationships and operationalization of elements that can be satisfied by the software. We exemplify the approach in a real usage scenario based on scientific software development.
In this paper we present an investigation of life event classification on social media networks. Detecting personal mentions of life events, such as travel, birthdays and weddings, presents an interesting opportunity to anticipate the offer of products or services, as well as to enrich the demographics of a given target population. Nevertheless, life event classification can be seen as an unbalanced classification problem, where the set of posts that actually mention a life event is significantly smaller than the set of posts that do not. For this reason, the main goal of this paper is to investigate different types of classifiers, using an experimental protocol based on datasets containing various types of life events in both Portuguese and English, and the benefits of over-sampling techniques for improving the accuracy of these classifiers on these sets. The results demonstrate that Logistic Regression may be a poor choice for the original datasets but that, after over-sampling the training set, this classifier outperforms by a significant margin other classifiers such as Naive Bayes and Nearest Neighbours, which in most cases do not benefit as much from the over-sampled training set.
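As an illustration of what over-sampling a training set involves, here is a minimal sketch of random over-sampling, which duplicates randomly chosen minority-class examples until the classes are balanced. The abstract does not say which over-sampling technique the authors used, so this specific variant, the function name `random_oversample`, and the toy posts are assumptions for illustration only.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing examples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data (invented): one life-event post among four ordinary posts.
posts = ["got married today", "nice weather", "lunch was ok",
         "traffic again", "new phone case"]
labels = [1, 0, 0, 0, 0]

balanced_posts, balanced_labels = random_oversample(posts, labels)
print(Counter(balanced_labels))
```

Over-sampling should be applied to the training set only, as the abstract describes; duplicating examples before splitting the data would leak copies of training posts into the evaluation set.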
The continuous quest for knowledge stimulates companies and research institutions not only to investigate new ways to improve the quality of scientific experiments, but also to reduce the time and cost of implementing them in distributed environments. Managing the provenance descriptors collected during the life cycle of scientific experiments is an important goal in this context. This thesis presents a new strategy to help scientists manage different kinds of provenance descriptors. It describes a computational approach that uses a well-founded ontology named OvO (Open proVenance Ontology) and a provenance infrastructure named Matriohska that can be attached to scientific workflows executed in distributed and heterogeneous environments such as the cloud. The approach also allows scientists to perform semantic queries on provenance descriptors with distinct levels of granularity.
This thesis was nominated by PESC/COPPE/UFRJ and was awarded by the SAE (Strategic Affairs Secretariat of the Presidency of the Brazilian Republic) as the best Brazilian PhD thesis in Computer Science in 2011.
Brazil has a diversified agriculture made up of millions of small farms. Brazilian agriculture is growing at a robust pace despite its historical inequalities. We therefore foresee several open opportunities to empower small farmers to produce better goods through the use of information technology (IT). In this project we focus on tomatoes because they are highly susceptible to diseases and contamination. In addition, the indiscriminate use of pesticides in tomato crops poses serious problems for human health and the environment. We believe the proposal can help smallholders reduce the chemical management of diseases, increase their income and offer healthier products. This project presents an approach focused on improving the quality of tomato crops.