Apache Spark

Apache Spark is an open-source, general-purpose distributed cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 8,337 public repositories matching this topic...

apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.

bigquery real-time sql database spark hive hadoop etl snowflake olap query-engine redshift dbt elt iceberg hudi delta-lake lakehouse

Updated Jun 12, 2024
Java

HariSekhon / Knowledge-Base

IT Knowledge Base from 20 years in DevOps, Linux, Cloud, Big Data, AWS, GCP etc - gradually porting my large private knowledge base to public

Updated Jun 12, 2024
Shell

moj-analytical-services / splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

data-science spark record-linkage entity-resolution fuzzy-matching deduplication em-algorithm data-matching deduplicate-data duckdb uk-gov-data-science

Updated Jun 12, 2024
Python

projectnessie / nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

git java data spark aws-lambda iceberg

Updated Jun 12, 2024
Java

apachecn / .github

ApacheCN open-source organization: announcements, introduction, members, events, and contact channels

python spark ml pytorch solidity dl

Updated Jun 12, 2024
CSS

J-sephB-lt-n / useful-code-snippets

A searchable collection of useful little pieces of code

python shell bash cloud spark ec2 graph virtual-machine gcp pyspark dataproc streamlit rustworkx

Updated Jun 12, 2024
Python

tobymao / sqlglot

Python SQL Parser and Transpiler

Updated Jun 12, 2024
Python

apache / spark

Apache Spark - A unified analytics engine for large-scale data processing

python java r scala sql big-data spark jdbc

Updated Jun 12, 2024
Scala

FistGang / PrimeSpark

Prime Number Generator using PySpark

spark pyspark prime-numbers sieve-of-eratosthenes

Updated Jun 12, 2024
Python

xuwenyihust / PawMark

PawMark is a platform for developers to build, schedule and monitor data pipelines.

kubernetes workflow spark jupyter-notebook gcp orchestration data-engineering data-platform mlflow delta-lake

Updated Jun 12, 2024
JavaScript

deeplearning4j / deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learni…

python java clojure scala spark hadoop gpu intellij linear-algebra artificial-intelligence deeplearning neural-nets dl4j matrix-library deeplearning4j

Updated Jun 12, 2024
Java

logicalclocks / feature-store-api

Python - Java/Scala API for the Hopsworks feature store

python scala spark feature-store hsfs hopsworks

Updated Jun 12, 2024
Python

crealytics / spark-excel

A Spark plugin for reading and writing Excel files

scala spark etl excel data-frame

Updated Jun 12, 2024
Scala

YeonwooSung / DevOpsMisc

Miscellaneous code and writings for DevOps

nginx aws devops sql spark serverless gcp pyspark infra devops-pipeline devop

Updated Jun 12, 2024
Jupyter Notebook

YeonwooSung / MLOps

Miscellaneous code and writings for MLOps

Updated Jun 12, 2024
Jupyter Notebook

ytsaurus / ytsaurus

YTsaurus is a scalable and fault-tolerant open-source big data platform.

sql big-data spark clickhouse distributed-database lakehouse olap-database ytsaurus

Updated Jun 12, 2024
C++

azuresphere7 / MicrosoftFabric-Exploratorium

A comprehensive educational resource hub dedicated to mastering Microsoft Fabric, offering in-depth tutorials, real-world use cases, and hands-on guides for seamless end-to-end analytics

data-science spark analytics data-transformation warehouse powerbi real-time-analytics lakehouse microsoft-fabric one-lake

Updated Jun 12, 2024
Shell

mongodb / mongo-spark

The MongoDB Spark Connector

spark mongodb connector spark-packages mongo-spark

Updated Jun 12, 2024
Java

mage-ai / mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

python data-science data machine-learning sql spark pipeline etl pipelines orchestration artificial-intelligence data-engineering data-integration dbt elt transformation data-pipelines reverse-etl

Updated Jun 12, 2024
Python

marsfoundation / spark-app

spark ethereum dapp dai makerdao defi

Updated Jun 12, 2024
TypeScript

Created by Matei Zaharia

Released May 26, 2014

Followers: 417
Repository: apache/spark
Website: spark.apache.org

Related Topics