Cloud-agnostic architectures for machine learning based on Apache Spark.

The paper “Cloud-agnostic architectures for machine learning based on Apache Spark”, prepared with the support of the NEANIAS project, has been published in the journal “Advances in Engineering Software”.

  • Authors: Enikő Nagy, Róbert Lovas, István Pintye, Ákos Hajnal, Péter Kacsuk.
  • Affiliation: Institute for Computer Science and Control (SZTAKI), Eötvös Loránd Research Network (ELKH), Budapest, Hungary.
  • Journal: Advances in Engineering Software, Volume 159, September 2021, 103029.

Abstract

Reference architectures for Big Data, machine learning and stream processing include not only recommended practices and interconnected building blocks but also considerations for scalability, availability, manageability, and security. However, the automated deployment of multi-VM platforms on various clouds based on such reference architectures may raise several issues. The paper focuses particularly on the widespread Apache Spark Big Data platform as the baseline and the Occopus cloud-agnostic orchestrator tool. The set of new-generation reference architectures is configurable by human-readable descriptors according to available resources and cloud providers, and offers various components such as Jupyter Notebook, RStudio, HDFS, and Kafka. These pre-configured reference architectures can be automatically deployed on demand, even by the data scientist, using a multi-cloud approach for a wide range of cloud systems like Amazon AWS, Microsoft Azure, OpenStack, OpenNebula, CloudSigma, etc. Occopus enables the scaling of cluster-oriented components (such as Spark) of the instantiated reference architectures. The presented solution was successfully used in the Hungarian Comparative Agendas Project (CAP) by the Institute for Political Science to classify newspaper articles.

Acknowledgments

The research was supported by the Ministry of Innovation and Technology NRDI Office within the framework of the Artificial Intelligence National Laboratory Program, and the “NEANIAS: Novel EOSC services for Emerging Atmosphere, Underwater and Space Challenges” project under Grant Agreement No. 863448 (H2020-INFRAEOSC-2019-1).

We gratefully acknowledge the financial support of the Hungarian Scientific Research Fund (OTKA K 132838). The presented work of R. Lovas was also supported by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences. On behalf of the Project Occopus, we are grateful for the use of ELKH Cloud (https://science-cloud.hu), which significantly helped us achieve the results published in this paper, and we thank Attila Csaba Marosi for his valuable contribution.

Keywords

Reference architectures; Big data; Artificial intelligence; Machine learning; Cloud computing; Orchestration; Distributed computing; Stream processing; Spark

Get the article at ScienceDirect.

 

NEANIAS is a Research and Innovation Action funded by the European Union under the Horizon 2020 research and innovation programme via grant agreement No. 863448.