Logical directed acyclic graph
When a client submits Spark application code, the driver implicitly converts the code's transformations into a logical directed acyclic graph (DAG).
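To make the idea of a logical DAG concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark's API: the `Node` class, its `map`, `filter`, and `union` methods, and the `topological_order` helper are all hypothetical names invented for illustration. The point it shows is that each transformation only records a new node with links to its parents, and that a plan with shared or merged inputs forms a graph rather than a simple chain.

```python
# Toy model of a logical DAG built from transformations; this is NOT Spark's API.
from dataclasses import dataclass, field


@dataclass
class Node:
    """One step in the logical plan: an operation name plus the nodes it depends on."""
    op: str
    parents: list = field(default_factory=list)

    def map(self, label):
        # A transformation only records a new node; no data is processed here.
        return Node(f"map({label})", [self])

    def filter(self, label):
        return Node(f"filter({label})", [self])

    def union(self, other):
        # Two parents: this is what makes the plan a graph rather than a chain.
        return Node("union", [self, other])


def topological_order(node, seen=None, order=None):
    """Return operation names with every parent listed before its child."""
    seen = set() if seen is None else seen
    order = [] if order is None else order
    if id(node) in seen:
        return order
    seen.add(id(node))
    for parent in node.parents:
        topological_order(parent, seen, order)
    order.append(node.op)
    return order


# Hypothetical pipeline: one source feeds two filters that are merged again.
source = Node("read(events)")
errors = source.filter("is_error")
warnings = source.filter("is_warning")
plan = errors.union(warnings).map("format")

print(topological_order(plan))
```

Nothing is computed while `plan` is being built; the structure only describes what would have to run, and in which order, once a result is requested.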
Modern distributed computing problems
Modern distributed computing problems need a programming abstraction that goes beyond MapReduce: one that wraps the map and reduce steps into a single higher-level concept.
Actions and transformations in Spark
An action is an operation that returns a value to the calling application or exports data to a storage system.
Examples of actions are count, collect, and save.
Actions differ from transformations and are close to the reduce functionality of MapReduce.
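The distinction can be illustrated with a plain-Python analogy. This is only a sketch using the standard library, not Spark code: `records`, `lengths`, and `out_path` are made-up names, and Python's lazy `map` stands in for a transformation while `list`, `reduce`, and a file write stand in for actions.

```python
# Plain-Python analogy for actions versus transformations; not Spark's API.
import os
import tempfile
from functools import reduce

records = ["a", "bb", "ccc", "dddd"]

# Transformation-like step: lazily describes the lengths, computes nothing yet.
lengths = map(len, records)

# Action-like steps: each consumes data and hands a value back to the caller.
collected = list(lengths)                             # comparable to collect
count = reduce(lambda acc, _: acc + 1, collected, 0)  # comparable to count, written as a reduce
total = reduce(lambda a, b: a + b, collected)         # a reduce that yields a single value

# Action-like step that exports data to storage instead of returning it.
out_path = os.path.join(tempfile.mkdtemp(), "lengths.txt")
with open(out_path, "w") as handle:
    handle.write("\n".join(str(n) for n in collected))

print(collected, count, total)
```

Writing `count` as a reduce is a deliberate choice here: it shows why actions are described as close to the reduce side of MapReduce.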
An RDD can be persisted for future computation; that is, it can be kept in memory or saved on disk.
This means a persisted RDD can be reloaded and reused by later computations rather than recomputed.
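The effect of persistence can be sketched with a toy class that counts how often its data is rebuilt. This is an illustrative model under stated assumptions, not Spark's implementation: `ToyDataset`, `compute_calls`, and `expensive` are hypothetical names, and the cache lives in a plain Python attribute rather than in cluster memory or on disk.

```python
# Toy illustration of persistence; this is NOT Spark's API or implementation.
class ToyDataset:
    def __init__(self, compute):
        self._compute = compute   # function that rebuilds the data from scratch
        self._persist = False
        self._cached = None
        self.compute_calls = 0    # how many times the data was rebuilt

    def persist(self):
        # Only marks the dataset; the data is stored when it is first computed.
        self._persist = True
        return self

    def collect(self):
        if self._persist and self._cached is not None:
            return self._cached   # reuse the stored result
        self.compute_calls += 1
        result = self._compute()
        if self._persist:
            self._cached = result
        return result


def expensive():
    # Stand-in for a costly chain of transformations.
    return [x * x for x in range(5)]


plain = ToyDataset(expensive)
plain.collect()
plain.collect()

kept = ToyDataset(expensive).persist()
kept.collect()
kept.collect()

print(plain.compute_calls, kept.compute_calls)
```

Without persistence every request rebuilds the data, while the persisted dataset is computed once and then served from the stored copy.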
Spark Use Cases in Finance Industry
Spark's generalized abstraction and growing helper libraries mean that companies can apply Spark to a wide range of problems. Recommendations and other personalization using big data are a major use case, at companies such as Yahoo, Comcast, Ooyala, Conviva, and Netflix.
Another use case is data crunching for real-time threat analytics, such as various types of fraud detection.
A third is log aggregation and analysis, as done at eBay.
Spark-Scala the machine learning algorithm
Scala is an open source programming language. Martin Odersky began designing it in 2001.
Another important event in Scala's history was the founding of Typesafe Inc. in May 2011 to provide commercial support for Scala.
Spark documentation and RDD features
The Spark documentation defines an RDD as a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel.
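A small single-machine sketch can show what "partitioned" and "operated on in parallel" mean in this definition. It assumes a thread pool as a stand-in for cluster nodes, and the helpers `partition` and `square_all` are hypothetical names; nothing here is Spark code.

```python
# Single-machine sketch of a partitioned collection; threads stand in for cluster nodes.
from concurrent.futures import ThreadPoolExecutor


def partition(elements, num_partitions):
    """Split a list into num_partitions contiguous slices of nearly equal size."""
    size, extra = divmod(len(elements), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        end = start + size + (1 if i < extra else 0)
        parts.append(elements[start:end])
        start = end
    return parts


def square_all(part):
    # The same operation is applied independently to each partition.
    return [x * x for x in part]


data = list(range(10))
parts = partition(data, 3)

# Each partition is processed by its own worker; map keeps the partition order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(square_all, parts))

# Stitch the per-partition results back into one collection.
squared = [x for part in results for x in part]

print(parts)
print(squared)
```

Because each partition is processed independently, the work can be spread over as many workers as there are partitions, which is the property the definition emphasizes.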