forked from RumbleDB/rumble

⛈️ RumbleDB 1.17.0 "Cacao tree" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

License

Licenses found (as detected, per file):

  • LICENSE.txt: Unknown
  • LICENSE-ANTLR.txt: Unknown
  • LICENSE-Apache-Commons-IO.txt: Unknown
  • LICENSE-Apache-Commons-Lang.txt: Unknown
  • LICENSE-Apache-Commons-Text.txt: Unknown
  • LICENSE-Apache-HttpClient.txt: Apache-2.0
  • LICENSE-JLine.txt: Unknown
  • LICENSE-Joda-time.txt: Apache-2.0
  • LICENSE-Kryo.md: BSD-3-Clause
  • LICENSE-Laurelin.txt: BSD-3-Clause
  • LICENSE-Spark.txt: Apache-2.0
  • LICENSE-gson.txt: Apache-2.0

bonzani/rumble

This branch is 1155 commits behind RumbleDB/rumble:master.

RumbleDB

With RumbleDB, you can easily query many nested, heterogeneous data formats such as JSON, CSV, Parquet, Avro, LibSVM, and text.

RumbleDB exposes a query language rather than a DataFrame API, for more flexibility and productivity, but also because a lot of data simply does not fit in DataFrames.
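To make the DataFrame point concrete, here is a minimal sketch in plain Python. It is not RumbleDB or JSONiq code. The sample records, field names, and helper functions are all hypothetical and chosen only for illustration. It shows records whose fields change shape from one record to the next, which is the kind of data that resists a fixed tabular schema but is easy to handle record by record.

```python
import json

# Hypothetical sample records (not taken from the RumbleDB documentation).
# "tags" is an array, a string, or absent; "address" is nested and optional.
RAW_LINES = [
    '{"name": "Ada", "tags": ["math", "code"], "address": {"city": "London"}}',
    '{"name": "Bob", "tags": "sales"}',
    '{"name": "Cy", "address": {"city": "Zurich", "zip": 8000}}',
]


def tags_of(record):
    """Return the tags of a record as a list, whatever shape the field has."""
    tags = record.get("tags", [])
    return tags if isinstance(tags, list) else [tags]


def query(lines):
    """Yield one flat row (name, city, tag_count) per input record."""
    for line in lines:
        record = json.loads(line)
        # A missing "address" object simply yields city = None.
        city = record.get("address", {}).get("city")
        yield {
            "name": record["name"],
            "city": city,
            "tag_count": len(tags_of(record)),
        }


rows = list(query(RAW_LINES))
for row in rows:
    print(row)
```

A query language designed for nested, heterogeneous data lets you express this kind of per-record logic declaratively instead of hand-writing the shape checks as above.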

You can query data in place from any local file system or data lake (Azure Blob Storage, Amazon S3, HDFS, etc.).

You can prepare, clean up, and validate your data, then feed it directly into your machine learning pipelines with RumbleDB ML.

Getting started: you will find a Jupyter notebook that introduces the JSONiq language on top of RumbleDB here. You can also run it locally if you prefer.

The documentation also contains an introduction specific to RumbleDB that explains how to read input datasets, but we have not converted it to Jupyter notebooks yet (this will follow).

The documentation of the latest official release is available here.

The documentation of the current master (for the adventurous and curious) is available here.

Languages

  • Java 83.7%
  • JSONiq 7.1%
  • jq 4.6%
  • Jupyter Notebook 2.9%
  • ANTLR 1.6%
  • HTML 0.1%