Unlocking Near Real Time Data Replication with CDC, Apache Spark™ Streaming, and Delta Lake

August 9, 2023

Tune in to DoorDash’s journey from a flaky ETL system with 24-hour data delays to a standardized CDC streaming pattern that spans more than 150 databases and produces near real-time data in a scalable, configurable, and reliable manner.

Along the way, learn how we use Delta Lake to build a self-serve, read-optimized data lake with data latencies of 15, while reducing operational overhead. Also learn how certain tradeoffs, such as conceding to a non-real-time system, allow for multiple optimizations while still permitting OLTP query use cases, and what benefits those tradeoffs provide.

Talk by: Ivan Peng and Phani Nalluri

Here’s more to explore:
Big Book of Data Engineering: 2nd Edition: https://dbricks.co/3XpPgNV
The Data Team’s Guide to the Databricks Lakehouse Platform: https://dbricks.co/46nuDpI

Connect with us:
Website: https://databricks.com
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc
Facebook: https://www.facebook.com/databricksinc
