
The #1 Open Source Metadata Platform

DataHub is an extensible metadata platform that enables data discovery, data observability and federated governance to help tame the complexity of your data ecosystem.

Built with ❤️ by Acryl Data and LinkedIn.


Get Started Now

Run the following commands to get started with DataHub.

python3 -m pip install --upgrade pip wheel setuptools 
python3 -m pip install --upgrade acryl-datahub
datahub docker quickstart
DataHub Quickstart Guide | Deploying With Kubernetes

Metadata 360

Combine technical, operational, and business metadata to provide a 360-degree view of your data entities.

Shift-left

Apply “shift-left” practices to pre-enrich important metadata using ingestion transformers, support for dbt meta-mapping and other features.

Active Metadata

Act on changes in metadata in real time by notifying key stakeholders, circuit-breaking business-critical pipelines, propagating metadata across entities, and more.

Open Source

DataHub was originally built at LinkedIn and subsequently open-sourced under the Apache 2.0 License. It now has a thriving community with over a hundred contributors, and is widely used at many companies.

Forward Looking Architecture

DataHub follows a push-based architecture, which means it's built for continuously changing metadata. The modular design lets it scale with data growth at any organization, from a single database under your desk to multiple data centers spanning the globe.

Massive Ecosystem

DataHub has pre-built integrations with your favorite systems: Kafka, Airflow, MySQL, SQL Server, Postgres, LDAP, Snowflake, Hive, BigQuery, and many others. The community is continuously adding more integrations, so this list keeps getting longer and longer.

ADLS, Airflow, Athena, Azure AD, BigQuery, Clickhouse, CouchBase, Databricks, DBT, Deltalake, Druid, Elasticsearch, Feast, Glue, Great Expectations, Hadoop, Hive, Iceberg, Kafka, Kusto, Looker, MariaDB, Metabase, Mode, MongoDB, MSSQL, MySQL, NiFi, Okta, Oracle, Pinot, PostgreSQL, PowerBI, Presto, Protobuf, Pulsar, Redash, Redshift, S3, Salesforce, SageMaker, Snowflake, Spark, SQLAlchemy, Superset, Tableau, Teradata, Trino

A Modern Approach to Metadata Management

Automated Metadata Ingestion

Push-based ingestion can use a prebuilt emitter or can emit custom events using our framework.
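
As a rough illustration of push-based ingestion, the sketch below sends a single description for one dataset to a DataHub server over REST. It assumes the acryl-datahub Python package is installed and that its emitter classes (DatahubRestEmitter, MetadataChangeProposalWrapper, DatasetPropertiesClass) behave as in recent releases, so check the names against your installed version. The helper functions, the table name "shop.orders", and the server URL are hypothetical choices made for this example.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a dataset URN of the form DataHub uses to identify datasets."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def push_description(urn: str, description: str,
                     server: str = "http://localhost:8080") -> None:
    """Emit one datasetProperties aspect to a DataHub REST endpoint.

    Assumption: the imports below match the installed acryl-datahub version.
    They are done lazily so the URN helper works without the package.
    """
    from datahub.emitter.mcp import MetadataChangeProposalWrapper
    from datahub.emitter.rest_emitter import DatahubRestEmitter
    from datahub.metadata.schema_classes import DatasetPropertiesClass

    emitter = DatahubRestEmitter(gms_server=server)
    event = MetadataChangeProposalWrapper(
        entityUrn=urn,
        aspect=DatasetPropertiesClass(description=description),
    )
    emitter.emit(event)


if __name__ == "__main__":
    # Hypothetical table name, used only for illustration.
    urn = dataset_urn("mysql", "shop.orders")
    print(urn)
    # Uncomment once a DataHub server is reachable at the default address:
    # push_description(urn, "Orders placed through the web shop")
```

Because the producer decides when to call push_description, metadata can be updated as soon as the underlying data changes instead of waiting for the next crawl.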

Pull-based ingestion crawls a metadata source. We have prebuilt integrations with Kafka, MySQL, MS SQL, Postgres, LDAP, Snowflake, Hive, BigQuery, and more. Ingestion can be automated using our Airflow integration or another scheduler of choice.

Learn more about metadata ingestion with DataHub in the docs.

recipe.yml
source:
  type: "mysql"
  config:
    username: "datahub"
    password: "datahub"
    host_port: "localhost:3306"
sink:
  type: "datahub-rest"
  config:
    server: "http://localhost:8080"

datahub ingest -c recipe.yml
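
For scheduled runs, one possible approach is to generate the recipe from a small script and call the CLI from whatever scheduler you use. The sketch below renders the same MySQL-to-REST recipe but leaves the password as a ${MYSQL_PASSWORD} placeholder, on the assumption that your DataHub version expands environment variables in recipes. The function names and the variable name MYSQL_PASSWORD are hypothetical.

```python
import subprocess
from pathlib import Path

# Recipe text with a placeholder instead of a hard-coded password.
# "${{...}}" renders as "${...}" after str.format().
RECIPE_TEMPLATE = """\
source:
  type: "mysql"
  config:
    username: "{username}"
    password: "${{MYSQL_PASSWORD}}"
    host_port: "{host_port}"
sink:
  type: "datahub-rest"
  config:
    server: "{server}"
"""


def render_recipe(username: str = "datahub",
                  host_port: str = "localhost:3306",
                  server: str = "http://localhost:8080") -> str:
    """Return the recipe as YAML text with the given connection settings."""
    return RECIPE_TEMPLATE.format(
        username=username, host_port=host_port, server=server
    )


def run_ingestion(recipe_path: Path) -> int:
    """Run `datahub ingest -c <recipe>` and return its exit code.

    Requires the datahub CLI on PATH and MYSQL_PASSWORD in the environment.
    """
    result = subprocess.run(
        ["datahub", "ingest", "-c", str(recipe_path)], check=False
    )
    return result.returncode


if __name__ == "__main__":
    # Print the rendered recipe; write it to recipe.yml and call
    # run_ingestion(Path("recipe.yml")) from a scheduler to ingest.
    print(render_recipe())
```

Keeping the secret in the environment means the recipe file can be committed to version control without exposing credentials.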

Discover Trusted Data

Browse and search over a continuously updated catalog of datasets, dashboards, charts, ML models, and more.

Understand Data in Context

DataHub is the one-stop shop for documentation, schemas, ownership, lineage, pipelines, data quality, usage information, and more.

Trusted Across the Industry

LinkedIn, Udemy, Airtel, Coursera, Geotab, ThoughtWorks, Expedia Group, Typeform, Peloton, Zynga, Hurb, Razer, ClassDojo
“[DataHub] has made our legal team very happy with being able to keep track of our sensitive data [to answer questions like] Where’s it going? How’s it being processed? Where’s it ending up? Which third party tool or API’s are we sending it to and why? Who is responsible for this integration?”
Wolt
“DataHub aligns with our needs [for] data documentation, a unified search experience, lineage information, and additional metadata. We are also very impressed with the vibrant and supportive community.”
Coursera
“DataHub allows us to solve the data discovery problem, which was a big challenge in our organization, and now we are solving it.”
Adevinta