Prof. Reinaldo Rosa
National Institute for Space Research, São José dos Campos, Brazil
(A PDF of the presentation is available here)
Intensive analysis of large data sets from advanced research in astrophysics and cosmology involves data flows greater than 1 TB/h (a Big Data workload). In this talk we discuss, in the context of astronomy, how to improve the performance of digital image analysis as a Data Science task. Based on an intensive morphometric analysis of digital images obtained from the SDSS projects, we propose a quantitative balance between hardware and database algorithms that optimizes analytical performance by combining heterogeneous computing (based on general-purpose many-core technology) with solutions from the NoSQL approach. Furthermore, this heterogeneous computing solution makes it possible to use Machine Learning paradigms to reliably automate the most important analytical tasks, such as classification and pattern recognition of structural information. In this framework, the minimum automatic heterogeneous architecture (which we call MAHA) should provide the lowest energy consumption, with energy consumption treated as a determining factor of the HPC system. As a highlight of this study, we show that the performance of a generic MAHA depends on only four main variables: the number of cores, the number of threads per core, the percentage of the workload that is parallelized, and the energy flux efficiency, even when data assimilation and model validation are considered as second-order tasks.
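To give a feel for how the first three of these variables interact, the following is a minimal sketch that uses the textbook Amdahl's law. This is an assumption made for illustration only and is not the MAHA performance model presented in the talk. The function name `amdahl_speedup` and its parameters are hypothetical, and the energy flux efficiency is deliberately left out because the abstract does not define it.

```python
def amdahl_speedup(cores: int, threads_per_core: int, parallel_fraction: float) -> float:
    """Ideal speedup over a single thread according to Amdahl's law.

    Illustrative assumption only: this is NOT the MAHA model from the talk.
    """
    if cores < 1 or threads_per_core < 1:
        raise ValueError("cores and threads_per_core must be at least 1")
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")

    # Assumption: every hardware thread contributes equally to the parallel part.
    workers = cores * threads_per_core

    # The serial part is unaffected; the parallel part is divided among the workers.
    serial_part = 1.0 - parallel_fraction
    return 1.0 / (serial_part + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical configuration: 64 cores with 4 threads each.
    for fraction in (0.50, 0.90, 0.99):
        speedup = amdahl_speedup(64, 4, fraction)
        print(f"parallel fraction {fraction:.2f}: speedup {speedup:.2f}x")
```

Under this simplified model, a workload that is 90% parallelized can never run more than 10 times faster, however many cores and threads are added, which suggests why the parallelized percentage appears alongside the core and thread counts.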
Created / Updated: 22 September 2016 / 7 August 2018