I. Introduction
Apache Spark has emerged as a fast-growing and widely adopted framework for big-data analytics [1]. Its ability to run batch, streaming, iterative, and interactive jobs within a single framework [2] has made it a natural choice for researchers, data scientists, and industry. It is supported and maintained by a large community of open-source contributors. Many companies not only use Apache Spark but also offer their own versions of it to customers as a service in the cloud. However, with respect to data security and confidentiality, Apache Spark faces the same challenges as cloud computing infrastructures and other existing big-data platforms such as Hadoop [3]. Researchers have proposed various solutions for data security and confidentiality at different levels of big-data frameworks such as Hadoop. These approaches, however, are not directly applicable to Apache Spark because of architectural differences between Apache Spark and those frameworks. A review of existing security approaches is provided in Section VII.