
Apache Storm core concept


May 26, 2021 Apache Storm


Apache Storm reads a raw stream of real-time data at one end, passes it through a series of small processing units, and outputs the processed, useful information at the other end.

The following diagram describes the core concepts of Apache Storm.

Apache Storm core concept

Now let's take a closer look at the components of Apache Storm.

Component Description
Tuple A tuple is the primary data structure in Storm. It is an ordered list of elements. By default, a tuple supports all data types. Typically, it is modeled as a set of comma-separated values and passed to the Storm cluster.
Stream A stream is an unbounded, unordered sequence of tuples.
Spouts Spouts are the sources of streams. Typically, Storm accepts input from raw data sources such as the Twitter Streaming API, an Apache Kafka queue, a Kestrel queue, and so on. Otherwise, you can write your own spouts to read data from other data sources.
Bolts Bolts are the logical processing units. Spouts pass data to bolts, and bolts process it and produce a new output stream. Bolts can perform filtering, aggregation, joins, and interaction with data sources and databases. A bolt receives data and can emit it to one or more other bolts. "IBolt" is the core interface for implementing bolts.
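As a rough illustration (plain Python, not the actual Storm API), a Storm tuple can be pictured as an ordered list of values paired with field names, able to hold any data type:

```python
# A minimal, hypothetical model of a Storm tuple: an ordered list of
# values with field names. Class and method names here are illustrative
# stand-ins, not the real Storm Tuple API.
class Tuple:
    def __init__(self, fields, values):
        assert len(fields) == len(values)
        self.fields = list(fields)
        self.values = list(values)

    def get_value_by_field(self, field):
        # Look up a value by its field name.
        return self.values[self.fields.index(field)]

t = Tuple(["word", "count"], ["hello", 3])
print(t.get_value_by_field("word"))   # hello
print(t.get_value_by_field("count"))  # 3
```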

Let's take a look at a real-time example, "Twitter Analytics", and see how it can be modeled in Apache Storm. The following image describes the structure.

Apache Storm core concept

The input to "Twitter Analytics" comes from the Twitter Streaming API. A spout reads users' tweets using the Twitter Streaming API and outputs them as a stream of tuples. The stream of tuples is then forwarded to a bolt, which splits each tweet into single words, calculates the word counts, and saves the information to the configured data source.
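The spout-to-bolt pipeline described above can be sketched in plain Python (the function names are hypothetical stand-ins, and the spout yields hard-coded strings instead of reading the Twitter Streaming API):

```python
from collections import Counter

def tweet_spout():
    # In a real topology this would read from the Twitter Streaming API.
    yield "storm is fast"
    yield "storm is fault tolerant"

def split_bolt(tweets):
    # Split each tweet into single words.
    for tweet in tweets:
        for word in tweet.split():
            yield word

def count_bolt(words):
    # Count the words; a real bolt would also persist the counts
    # to the configured data source.
    return Counter(words)

counts = count_bolt(split_bolt(tweet_spout()))
print(counts["storm"])  # 2
print(counts["is"])     # 2
```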

Topology

Spouts and bolts are connected together to form a topology. Real-time application logic is specified in the Storm topology. Simply put, a topology is a directed graph in which the vertices are computations and the edges are streams of data.

A simple topology starts with spouts. A spout emits data to one or more bolts. A bolt represents a node in the topology with the smallest processing logic, and the output of one bolt can be emitted to another bolt as input.

Storm keeps a topology running until you terminate it. Apache Storm's primary job is to run topologies, and it can run any number of topologies at a given time.
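Since a topology is just a directed graph of components, it can be modeled minimally as an adjacency list (a plain-Python sketch with made-up component names, not Storm's TopologyBuilder API):

```python
# A hypothetical topology: vertices are components (spouts/bolts),
# edges are the streams connecting them.
topology = {
    "tweet-spout": ["split-bolt"],  # the spout emits to the split bolt
    "split-bolt": ["count-bolt"],   # the split bolt emits to the count bolt
    "count-bolt": [],               # terminal bolt, no downstream edges
}

def downstream(component):
    # Which components receive this component's output stream?
    return topology[component]

print(downstream("tweet-spout"))  # ['split-bolt']
```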

Task

Now you have a basic idea of spouts and bolts. They are the smallest logical units of a topology, and a topology is built using a single spout and an array of bolts. They should be executed correctly in a specific order for the topology to run successfully. The execution of each spout and bolt by Storm is called a "task". Simply put, a task is the execution of a spout or a bolt. At a given time, each spout and bolt can have multiple instances running in multiple separate threads.
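The idea of several task instances of the same bolt running in separate threads can be sketched with Python's standard threading tools (a simulation, not the Storm runtime):

```python
import threading
import queue

# A hedged sketch of tasks: two instances of the same bolt logic run in
# separate threads, each pulling tuples from a shared queue.
tuples = queue.Queue()
for word in ["a", "b", "c", "d"]:
    tuples.put(word)

results = []
lock = threading.Lock()

def bolt_task():
    # One "task": an execution of the bolt's processing logic.
    while True:
        try:
            word = tuples.get_nowait()
        except queue.Empty:
            return
        with lock:
            results.append(word.upper())

threads = [threading.Thread(target=bolt_task) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # ['A', 'B', 'C', 'D']
```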

Process

A topology runs in a distributed manner on multiple worker nodes. Storm distributes the tasks evenly across all worker nodes. A worker node's role is to listen for jobs and start or stop processes whenever a new job arrives.

Stream grouping

A data stream flows from spouts to bolts, or from one bolt to another bolt. Stream grouping controls how the tuples are routed through the topology and helps us understand the flow of tuples in the topology. There are four built-in groupings, as described below.

Shuffle grouping

In shuffle grouping, an equal number of tuples is distributed randomly across all of the workers executing the bolt. The following figure depicts the structure.
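A plain-Python sketch of the even distribution (round-robin here for determinism; Storm randomizes the assignment, but the load balances out the same way):

```python
import itertools

# A hedged simulation of shuffle grouping: tuples are spread evenly
# across the bolt's workers.
def shuffle_grouping(tuples, num_workers):
    workers = [[] for _ in range(num_workers)]
    for tup, worker in zip(tuples, itertools.cycle(range(num_workers))):
        workers[worker].append(tup)
    return workers

workers = shuffle_grouping(["a", "b", "c", "d"], 2)
print([len(w) for w in workers])  # [2, 2]
```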

Apache Storm core concept

Fields grouping

In fields grouping, the stream is partitioned by the value of one or more fields: tuples with the same field value are always forwarded to the same worker executing the bolt. For example, if the stream is grouped by the field "word", tuples with the same string "Hello" will all move to the same worker. The figure below shows the working principle of fields grouping.
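The routing rule can be sketched in plain Python by hashing the grouping field (an illustration of the idea, not Storm's actual hashing scheme):

```python
# A hedged simulation of fields grouping: the target worker is chosen
# by hashing the grouping field, so equal values land on the same worker.
def fields_grouping(tuples, field, num_workers):
    workers = [[] for _ in range(num_workers)]
    for tup in tuples:
        worker = hash(tup[field]) % num_workers
        workers[worker].append(tup)
    return workers

tuples = [{"word": "Hello"}, {"word": "World"}, {"word": "Hello"}]
workers = fields_grouping(tuples, "word", 4)

# Both "Hello" tuples end up on exactly one (the same) worker:
hello_workers = [i for i, w in enumerate(workers)
                 if any(t["word"] == "Hello" for t in w)]
print(len(hello_workers))  # 1
```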

Apache Storm core concept

Global grouping

In global grouping, all the streams can be grouped and forwarded to one bolt. This grouping sends tuples generated by all instances of the source to a single target instance (specifically, the worker with the lowest ID).
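A plain-Python sketch of that routing rule (a simulation with hypothetical worker IDs, not the Storm API):

```python
# A hedged simulation of global grouping: every tuple goes to the
# single target instance with the lowest ID.
def global_grouping(tuples, worker_ids):
    target = min(worker_ids)  # the worker with the lowest ID
    return {target: list(tuples)}

routed = global_grouping(["a", "b", "c"], [3, 1, 2])
print(list(routed.keys()))  # [1]
print(len(routed[1]))       # 3
```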

Apache Storm core concept

All grouping

All grouping sends a single copy of each tuple to every instance of the receiving bolt. This grouping is used to send signals to bolts, and it is useful for join operations.
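The broadcast behavior can be sketched in plain Python (a simulation, not the Storm API):

```python
import copy

# A hedged simulation of all grouping: each of the bolt's workers
# receives its own copy of every tuple, e.g. to broadcast a signal.
def all_grouping(tuples, num_workers):
    return [copy.deepcopy(list(tuples)) for _ in range(num_workers)]

workers = all_grouping([{"signal": "refresh"}], 3)
print(len(workers))             # 3
print(workers[0][0]["signal"])  # refresh
```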

Apache Storm core concept