Extensions
- madlib 1.7.4
README
MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.
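As a rough illustration of what "data-parallel" can mean for a statistical method, the sketch below fits a one-variable least-squares line by aggregating each data partition independently and then merging the partial results. This is not MADlib code and does not use the MADlib API. The names `State`, `transition`, `merge`, `final`, and `fit` and the sample data are hypothetical, chosen only to mirror the transition/merge/final shape that database aggregates commonly take.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class State:
    """Running sums needed for simple least-squares regression y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0


def transition(state, row):
    """Fold one (x, y) row into a partition-local state."""
    x, y = row
    return State(state.n + 1, state.sx + x, state.sy + y,
                 state.sxx + x * x, state.sxy + x * y)


def merge(a, b):
    """Combine two partition-local states; the order of merging does not matter."""
    return State(a.n + b.n, a.sx + b.sx, a.sy + b.sy,
                 a.sxx + b.sxx, a.sxy + b.sxy)


def final(state):
    """Turn the merged sums into (intercept, slope)."""
    denom = state.n * state.sxx - state.sx ** 2
    if denom == 0:
        raise ValueError("no rows or constant x values; slope is undefined")
    slope = (state.n * state.sxy - state.sx * state.sy) / denom
    intercept = (state.sy - slope * state.sx) / state.n
    return intercept, slope


def fit(partitions):
    """Aggregate each partition independently, then merge the partial states."""
    local_states = [reduce(transition, rows, State()) for rows in partitions]
    return final(reduce(merge, local_states, State()))


# Hypothetical data lying on y = 1 + 2x, split across three "segments".
partitions = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0)], [(3.0, 7.0), (4.0, 9.0)]]
intercept, slope = fit(partitions)
```

Because each partition's state is a small fixed-size summary, the per-partition work can run where the data lives and only the summaries need to be combined, which is the property that lets this style of computation scale with the number of partitions.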
Installation and Contribution
See the project webpage, MADlib Home, for links to the latest binary and source packages. For installation and contribution guides, please see the MADlib Wiki.
User and Developer Documentation
The latest documentation of MADlib modules can be found at MADlib Docs, or can be accessed directly from the MADlib installation directory by opening doc/user/html/index.html.
Architecture
The following block diagram gives a high-level overview of MADlib's architecture.
Third Party Components
MADlib incorporates material from the following third-party components:
- argparse 1.2.1 "provides an easy, declarative interface for creating command line tools"
- Boost 1.47.0 (or newer) "provides peer-reviewed portable C++ source libraries"
- CERN ROOT "is an object oriented framework for large scale data analysis"
- doxypy 0.4.2 "is an input filter for Doxygen"
- Eigen 3.2.2 "is a C++ template library for linear algebra"
- PyYAML 3.10 "is a YAML parser and emitter for Python"
- PyXB 1.2.4 "is a Python library for XML Schema Bindings"
Licensing
License information regarding MADlib and included third-party libraries can be found inside the license directory.
Release Notes
Changes between MADlib versions are described in the ReleaseNotes.txt file.
Papers and Talks
MAD Skills: New Analysis Practices for Big Data (VLDB 2009)
Hybrid In-Database Inference for Declarative Information Extraction (SIGMOD 2011)
Towards a Unified Architecture for In-Database Analytics (SIGMOD 2012)
The MADlib Analytics Library or MAD Skills, the SQL (VLDB 2012)