Jun 01, 2021 Article blog
Apache Beam 2.23.0 is now available.
Apache Beam is a project that Google contributed to the Apache Software Foundation in February 2016. Its primary goal is to unify the programming models for batch and stream processing, providing a simple, flexible, feature-rich, and powerful SDK for processing unbounded, out-of-order, web-scale data sets.
The Apache Beam project focuses on the programming model and the interface definitions for data processing. It does not implement a specific execution engine; instead, the goal is for data processing pipelines built with Beam to run on any distributed computing engine.
A new Twister2 runner (BEAM-7304).
Python 3.8 support (BEAM-8494).
Support for reading from Snowflake (Java) (BEAM-9722).
Support for writing to Splunk (Java) (BEAM-8596).
Support for assume role (Java) (BEAM-10335).
A new transform for reading from BigQuery has been added: apache_beam.io.gcp.bigquery.ReadFromBigQuery. This transform is experimental. It reads data from BigQuery by exporting it to Avro files and then reading those files. It also supports reading data by exporting to JSON files instead, which leads to small differences in behavior for time- and date-related fields.
Added dispositions for SnowflakeIO.write (BEAM-10343).
Updated the Snowflake JDBC dependency and added application=beam to the connection URL (BEAM-10383).
RowJson.RowJsonDeserializer, JsonToRow, and PubsubJsonTableProvider now accept implicit nulls by default. Previously, nulls could only be represented by explicit null values, as in {"foo": "bar", "baz": null}, while an implicit null, as in {"foo": "bar"}, would throw an exception. Both JSON strings now produce the same result by default. You can override this behavior with RowJson.RowJsonDeserializer#withNullBehavior.
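To make the difference between explicit and implicit nulls concrete, here is a minimal pure-Python sketch. It does not use Beam: the function deserialize_row and the two mode constants are hypothetical names chosen for illustration, loosely modeled on the choice that RowJson.RowJsonDeserializer#withNullBehavior offers in Java. The sketch assumes every schema field is nullable and simply ignores fields that are not in the schema.

```python
import json
from typing import Any, Dict, List, Optional

# Hypothetical null-handling modes (not Beam's actual API names).
ACCEPT_MISSING_OR_NULL = "accept_missing_or_null"  # like the new default
REQUIRE_NULL = "require_null"                      # like the previous behavior


def deserialize_row(text: str, fields: List[str],
                    null_behavior: str = ACCEPT_MISSING_OR_NULL
                    ) -> Dict[str, Optional[Any]]:
    """Parse a JSON object into a dict containing exactly the given fields.

    All fields are assumed nullable; JSON keys outside `fields` are ignored.
    """
    obj = json.loads(text)
    row: Dict[str, Optional[Any]] = {}
    for name in fields:
        if name in obj:
            # Field is present: keep its value, which may be an explicit null.
            row[name] = obj[name]
        elif null_behavior == ACCEPT_MISSING_OR_NULL:
            # Implicit null: a missing field is treated as None.
            row[name] = None
        else:
            # Old behavior: a missing field is an error.
            raise ValueError(f"field {name!r} is missing; explicit null required")
    return row


if __name__ == "__main__":
    schema = ["foo", "baz"]
    print(deserialize_row('{"foo": "bar", "baz": null}', schema))
    print(deserialize_row('{"foo": "bar"}', schema))
```

Under the default mode both inputs yield {'foo': 'bar', 'baz': None}; passing REQUIRE_NULL makes the second input raise, mirroring the behavior before this release.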
Fixed a bug in the experimental GroupIntoBatches transform in Python so that it actually groups batches by key. This changes the output type of the transform (BEAM-6696).
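The fix means each emitted batch now belongs to a single key. The following minimal pure-Python sketch illustrates that per-key batching semantics outside of Beam. The function group_into_batches is a hypothetical stand-in rather than Beam's implementation: it buffers values per key, emits a (key, list_of_values) pair whenever a key's buffer reaches batch_size, and flushes partially filled buffers once the input is exhausted. Windowing, state, and timers, which a real runner has to handle, are deliberately left out.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def group_into_batches(elements: Iterable[Tuple[K, V]],
                       batch_size: int) -> Iterator[Tuple[K, List[V]]]:
    """Yield (key, batch) pairs in which every batch holds values of one key."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    buffers: Dict[K, List[V]] = defaultdict(list)
    for key, value in elements:
        buffer = buffers[key]
        buffer.append(value)
        if len(buffer) == batch_size:
            yield key, buffer   # full batch for this key
            buffers[key] = []   # start a fresh buffer for the same key
    # Flush partially filled batches once the input is exhausted.
    for key, buffer in buffers.items():
        if buffer:
            yield key, buffer


if __name__ == "__main__":
    pairs = [("a", 1), ("b", 2), ("a", 3), ("a", 4), ("b", 5)]
    for key, batch in group_into_batches(pairs, batch_size=2):
        print(key, batch)
```

Keying the buffers is what guarantees that a batch never mixes keys; the output element is therefore a key paired with a list of values rather than a flat list of key/value pairs.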