
Apache Beam 2.23.0 was released today, updating big data batch and streaming standards


Jun 01, 2021


Brief introduction

Apache Beam 2.23.0 is now available. Google contributed Apache Beam to the Apache Software Foundation in February 2016. Its primary goal is to unify the batch and streaming programming paradigms, providing a simple, flexible, feature-rich, and powerful SDK for processing unbounded, out-of-order, web-scale data sets. The project focuses on the programming paradigm and interface definitions for data processing and does not implement a specific execution engine; the aim is for data processing programs written with Beam to run on any distributed computing engine.


Key updates:

Highlights

  • Twister2 Runner (BEAM-7304).

  • Python 3.8 Support (BEAM-8494).

I/Os

  • Added support for reading from Snowflake (Java) (BEAM-9722).

  • Added support for writing to Splunk (Java) (BEAM-8596).

  • Added support for assume role (Java) (BEAM-10335).

  • A new transform for reading from BigQuery has been added: apache_beam.io.gcp.bigquery.ReadFromBigQuery. This transform is experimental. It reads data from BigQuery by exporting it to Avro files and then reading those files. It also supports reading data by exporting to JSON files, in which case behavior differs slightly for time- and date-related fields.

  • Added dispositions for SnowflakeIO.write (BEAM-10343).
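
As a rough illustration of the experimental ReadFromBigQuery transform listed above, the following Python sketch attaches a query-based read to a pipeline. The table name, the query, and the helper name read_rows are hypothetical placeholders, and gcs_location is assumed to be a Cloud Storage path you can write to for the temporary export files. Treat it as a sketch under those assumptions, not a reference usage.

```python
# Hypothetical query; replace the project, dataset, and table with your own.
QUERY = "SELECT name, created_at FROM `my-project.my_dataset.my_table`"


def read_rows(pipeline, gcs_location):
    """Attach a BigQuery read to `pipeline` and return the resulting PCollection.

    `gcs_location` is assumed to be a writable gs:// path used for the
    temporary files that the export-based read produces.
    """
    # Imported here so the sketch can be loaded even where the Beam GCP
    # extras are not installed.
    from apache_beam.io.gcp.bigquery import ReadFromBigQuery

    return pipeline | "ReadFromBigQuery" >> ReadFromBigQuery(
        query=QUERY,
        use_standard_sql=True,
        gcs_location=gcs_location,
    )
```

Calling read_rows inside a `with apache_beam.Pipeline() as p:` block should yield one dictionary per row, keyed by column name. Actually running it needs Google Cloud credentials and a real table.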

New Features/Improvements

  • Updated the Snowflake JDBC dependency and added application=beam to the connection URL (BEAM-10383).

Breaking Changes

  • When deserializing JSON (Java), RowJson.RowJsonDeserializer, JsonToRow, and PubsubJsonTableProvider now accept implicit nulls by default. Previously, nulls could only be represented by explicit null values, as in {"foo": "bar", "baz": null}, while an implicit null such as {"foo": "bar"} threw an exception. Both JSON strings now produce the same result by default. You can override this behavior with RowJson.RowJsonDeserializer#withNullBehavior.

  • Fixed a bug in the experimental GroupIntoBatches transform in Python so that it actually groups batches by key. This changes the output type of the transform (BEAM-6696).
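
To make "grouping batches by key" concrete, here is a small plain-Python sketch of the idea. It is not Beam's implementation: the function name group_into_batches is made up for illustration, and the (key, batch) output shape is an assumption about what per-key batching produces. A real runner also gives no ordering guarantees, unlike this sequential loop.

```python
from collections import defaultdict


def group_into_batches(pairs, batch_size):
    """Yield (key, values) batches, never mixing values from different keys."""
    buffers = defaultdict(list)
    for key, value in pairs:
        buffers[key].append(value)
        if len(buffers[key]) == batch_size:
            # A full batch for this key: emit it and clear the key's buffer.
            yield key, buffers.pop(key)
    # Emit whatever is left over as smaller, final batches.
    for key, values in buffers.items():
        yield key, values


pairs = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
batches = list(group_into_batches(pairs, batch_size=2))
```

With the sample input, every batch holds values for a single key, and no batch exceeds two values.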

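The difference between explicit and implicit nulls described in the breaking changes above can be shown with a small plain-Python sketch. It only mimics the semantics: the two-field schema, the function json_to_row, and the accept_implicit_nulls flag are hypothetical stand-ins and are not Beam's Java API.

```python
import json

# Hypothetical schema: two nullable fields, matching the example JSON above.
SCHEMA_FIELDS = ("foo", "baz")


def json_to_row(text, accept_implicit_nulls=True):
    """Convert a JSON object string to a dict with one entry per schema field."""
    data = json.loads(text)
    row = {}
    for name in SCHEMA_FIELDS:
        if name in data:
            # Present in the JSON, possibly as an explicit null.
            row[name] = data[name]
        elif accept_implicit_nulls:
            # New default: a missing field is treated as null.
            row[name] = None
        else:
            # Old behavior: a missing field is an error.
            raise ValueError(f"field {name!r} is missing from the JSON object")
    return row


explicit = json_to_row('{"foo": "bar", "baz": null}')
implicit = json_to_row('{"foo": "bar"}')
```

Here explicit and implicit end up equal, while passing accept_implicit_nulls=False for the second string raises an error, mirroring the earlier behavior.
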
Deprecations

  • Remove Gearpump runner. (BEAM-9999)

  • Remove the Apex runner. (BEAM-9999)

  • RedisIO.readAll() has been deprecated and will be removed in two versions; users must use RedisIO.readKeyPatterns() as a replacement (BEAM-9747).

Source: https://beam.apache.org/blog/beam-2.23.0/