Apache Spark Assembly

A Deep-dive into Apache Spark!


Apache Spark is not only a paradigm of modern big data programming but also an excellent example of large-scale big data systems software. As a student studying Apache Spark, I wanted to learn its implementation details, but resources such as design documents were very hard to find. I therefore started investigating the details myself and organizing them in my own words.

Progress & Goal

Since February 2021, I have been analyzing all of the Spark core components at the code level. The focus of this book is to interpret the implementation details and to illustrate the detailed architecture of Apache Spark. The ultimate goal of this writing is to fully comprehend the system from the original developers' point of view and to develop skills for writing large systems software.


Although the writing is still in its early stages, I'm constantly updating the book at the following public link:

Book : Apache Spark Assembly