Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 8,337 public repositories matching this topic...
IT Knowledge Base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP etc - gradually porting my large private knowledge base to public
Updated Jun 12, 2024 - Shell
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Updated Jun 12, 2024 - Python
Prime Number Generator using PySpark
Updated Jun 12, 2024 - Python
PawMark is a platform for developers to build, schedule and monitor data pipelines.
Updated Jun 12, 2024 - JavaScript
Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…
Updated Jun 12, 2024 - Java
A Spark plugin for reading and writing Excel files
Updated Jun 12, 2024 - Scala
Miscellaneous code and writings for MLOps
Updated Jun 12, 2024 - Jupyter Notebook
YTsaurus is a scalable and fault-tolerant open-source big data platform.
Updated Jun 12, 2024 - C++
A comprehensive educational resource hub dedicated to mastering Microsoft Fabric, offering in-depth tutorials, real-world use cases, and hands-on guides for seamless end-to-end analytics
Updated Jun 12, 2024 - Shell
The MongoDB Spark Connector
Updated Jun 12, 2024 - Java
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Updated Jun 12, 2024 - Python
Created by Matei Zaharia
Released May 26, 2014
- Followers: 417
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia