May 26, 2021 Apache Storm
Apache Storm reads a raw stream of real-time data at one end, passes it through a series of small processing units, and outputs the processed, useful information at the other end.
The following diagram describes the core concepts of Apache Storm.
Now let's take a closer look at the components of Apache Storm:
Component | Description |
---|---|
Tuple | A tuple is the primary data structure in Storm. It is an ordered list of elements. By default, a tuple supports all data types. Typically, it is modeled as a set of comma-separated values and passed to the Storm cluster. |
Stream | A stream is an unordered sequence of tuples. |
Spouts | The source of a stream. Typically, Storm accepts input from raw data sources such as the Twitter Streaming API, an Apache Kafka queue, a Kestrel queue, and so on. Otherwise, you can write your own spouts to read data from a data source. |
Bolts | Bolts are logical processing units. Spouts pass data to bolts, and bolts process it and produce a new output stream. Bolts can perform filtering, aggregation, joins, and interaction with data sources and databases. A bolt receives data and emits it to one or more bolts. "IBolt" is the core interface for implementing bolts. |
Let's take a look at a real-time example, "Twitter Analytics", and see how it can be modeled in Apache Storm. The following image describes the structure.
The input to "Twitter Analytics" comes from the Twitter Streaming API. The spout uses the Twitter Streaming API to read users' tweets and outputs them as a stream of tuples. The stream of tuples is then forwarded to a bolt, which splits each tweet into individual words, calculates the word count, and saves the information to the configured data source.
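The spout-to-bolt flow in this example can be sketched in plain Python without a Storm cluster. This is only an illustration under stated assumptions, not Storm's API: `SAMPLE_TWEETS`, `tweet_spout`, and `word_count_bolt` are hypothetical names, a hard-coded list stands in for the Twitter Streaming API, and an in-memory `Counter` stands in for the configured data source.

```python
from collections import Counter

# Hypothetical sample data standing in for the Twitter Streaming API.
SAMPLE_TWEETS = ["storm processes streams", "storm runs topologies"]


def tweet_spout(tweets):
    """Spout: emit each tweet as a one-field tuple."""
    for text in tweets:
        yield (text,)


def word_count_bolt(stream, store):
    """Bolt: split each tweet into words and update the word counts."""
    for (text,) in stream:
        for word in text.lower().split():
            store[word] += 1
    return store


# An in-memory Counter stands in for the configured data source (assumption).
store = Counter()
word_count_bolt(tweet_spout(SAMPLE_TWEETS), store)
print(store["storm"])  # 2
```

Querying `store` afterwards plays the role of querying the data source for results.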
Spouts and bolts are connected together to form a topology. Real-time application logic is specified in the Storm topology. Simply put, a topology is a directed graph in which the vertices are computations and the edges are streams of data.
A simple topology starts with spouts. A spout emits data to one or more bolts. A bolt represents a node in the topology with the smallest unit of processing logic, and the output of one bolt can be emitted to another bolt as input.
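To make the graph view concrete, here is a toy sketch of a topology as a set of named nodes connected by edges. It is a simplification and not Storm's builder API: the `Topology` class, its `add_bolt` and `run` methods, and the node names are all hypothetical, the graph is assumed to be acyclic, and the input is a finite list rather than an endless stream.

```python
from collections import defaultdict, deque


class Topology:
    """Toy directed graph: nodes are computations, edges carry tuples."""

    def __init__(self):
        self.bolts = {}                 # bolt name -> function(tuple) -> list of tuples
        self.edges = defaultdict(list)  # source name -> names of downstream bolts

    def add_bolt(self, name, func, sources):
        """Register a bolt and subscribe it to each named source."""
        self.bolts[name] = func
        for source in sources:
            self.edges[source].append(name)

    def run(self, spout_name, spout_tuples):
        """Push every spout tuple through the graph; return emitted tuples per bolt."""
        emitted = defaultdict(list)
        pending = deque((spout_name, tup) for tup in spout_tuples)
        while pending:
            source, tup = pending.popleft()
            for target in self.edges.get(source, []):
                for out in self.bolts[target](tup):
                    emitted[target].append(out)
                    pending.append((target, out))
        return emitted


def split_words(tup):
    """Bolt logic: one sentence tuple in, one tuple per word out."""
    return [(word,) for word in tup[0].split()]


def to_upper(tup):
    """Bolt logic: upper-case the single field."""
    return [(tup[0].upper(),)]


topology = Topology()
topology.add_bolt("split", split_words, sources=["sentences"])
topology.add_bolt("upper", to_upper, sources=["split"])
result = topology.run("sentences", [("hello storm",)])
print(result["upper"])  # [('HELLO',), ('STORM',)]
```

The output of the `split` bolt becomes the input of the `upper` bolt, which mirrors how one bolt's output can feed another.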
Storm keeps a topology running until you terminate it. Apache Storm's primary job is to run topologies, and it can run any number of topologies at a given time.
Now you have a basic idea of spouts and bolts. They are the smallest logical units of a topology, and a simple topology is built from a single spout and an array of bolts. They must be executed in a specific order for the topology to run successfully. Storm's execution of each spout and bolt is called a "task". Simply put, a task is the execution of either a spout or a bolt. At a given time, each spout and bolt can have multiple instances running in multiple separate threads.
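The idea of several instances of the same bolt running in separate threads can be illustrated with Python's standard `threading` and `queue` modules. This is a rough analogy rather than how Storm schedules tasks: `run_bolt_tasks`, `bolt_fn`, and `parallelism` are hypothetical names, and a shared queue stands in for the incoming stream.

```python
import queue
import threading


def run_bolt_tasks(tuples, bolt_fn, parallelism):
    """Run `parallelism` instances (tasks) of one bolt, each in its own thread."""
    inbox = queue.Queue()
    results = []
    lock = threading.Lock()

    def task(task_id):
        while True:
            item = inbox.get()
            if item is None:  # sentinel: no more tuples for this task
                break
            out = bolt_fn(item)
            with lock:
                results.append((task_id, out))

    threads = [threading.Thread(target=task, args=(i,)) for i in range(parallelism)]
    for thread in threads:
        thread.start()
    for tup in tuples:
        inbox.put(tup)
    for _ in threads:
        inbox.put(None)  # one sentinel per task
    for thread in threads:
        thread.join()
    return results


def times_ten(tup):
    """Hypothetical bolt logic used only for this illustration."""
    return tup[0] * 10


results = run_bolt_tasks([(1,), (2,), (3,), (4,)], times_ten, parallelism=2)
print(sorted(out for _task_id, out in results))  # [10, 20, 30, 40]
```

Which task handles which tuple is not deterministic here, so only the combined output is sorted and printed.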
A topology runs in a distributed manner on multiple worker nodes. Storm distributes the tasks evenly across all worker nodes. The role of a worker node is to listen for jobs and start or stop processes when new jobs arrive.
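An even spread of tasks over worker nodes can be pictured as a round-robin assignment. The sketch below is an assumption made for illustration, since Storm's actual scheduler is more involved; `assign_tasks` and the task and worker names are hypothetical.

```python
def assign_tasks(task_ids, worker_names):
    """Deal tasks out to workers in round-robin order so loads differ by at most one."""
    assignment = {worker: [] for worker in worker_names}
    for index, task_id in enumerate(task_ids):
        worker = worker_names[index % len(worker_names)]
        assignment[worker].append(task_id)
    return assignment


tasks = ["spout-0", "split-0", "split-1", "count-0", "count-1"]
assignment = assign_tasks(tasks, ["worker-a", "worker-b"])
print(assignment)
# {'worker-a': ['spout-0', 'split-1', 'count-1'], 'worker-b': ['split-0', 'count-0']}
```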
Streams of data flow from spouts to bolts, or from one bolt to another. Stream grouping controls how tuples are routed in the topology and helps us understand the flow of tuples through it. Four of the built-in groupings are described below.
In shuffle grouping, an equal number of tuples is randomly distributed across all the workers executing the bolts. The following figure depicts the structure.
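One way to get a distribution that is both random and even is to shuffle the list of target tasks and hand out tuples one per task, reshuffling after each full round. The sketch below assumes that approach for illustration; `shuffle_grouping`, `num_tasks`, and `seed` are hypothetical names, not Storm's API.

```python
import random
from collections import Counter


def shuffle_grouping(tuples, num_tasks, seed=None):
    """Yield (task_id, tuple) pairs; each round visits every task once in random order."""
    rng = random.Random(seed)
    order = []
    for tup in tuples:
        if not order:  # start a new round with a fresh random order
            order = list(range(num_tasks))
            rng.shuffle(order)
        yield order.pop(), tup


stream = [(i,) for i in range(12)]
routed = list(shuffle_grouping(stream, num_tasks=3, seed=7))
load = Counter(task_id for task_id, _tup in routed)
print(sorted(load.items()))  # [(0, 4), (1, 4), (2, 4)]
```

With 12 tuples and 3 tasks, every task receives exactly 4 tuples, while the order in which tasks are chosen varies with the seed.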
In field grouping, tuples with the same values in the grouping fields are grouped together, and the remaining tuples are kept outside the group. Tuples with the same field values are then forwarded to the same worker executing the bolts. For example, if the stream is grouped by the field "word", tuples with the same string "Hello" will move to the same worker. The figure below shows how field grouping works.
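Routing by field value is commonly done by hashing the value and taking the result modulo the number of target tasks. The sketch below assumes that scheme and uses `zlib.crc32` as a stable hash (Python's built-in `hash` for strings varies between runs); `fields_grouping`, `field_index`, and `num_tasks` are hypothetical names.

```python
import zlib


def fields_grouping(tup, field_index, num_tasks):
    """Pick a target task from a stable hash of the grouping field."""
    key = str(tup[field_index]).encode("utf-8")
    return zlib.crc32(key) % num_tasks


# Tuples of (word, count); the stream is grouped by the "word" field at index 0.
stream = [("Hello", 1), ("World", 1), ("Hello", 2)]
targets = [fields_grouping(tup, field_index=0, num_tasks=4) for tup in stream]
print(targets[0] == targets[2])  # True: both "Hello" tuples go to the same task
```

Because the target depends only on the field value, a bolt that keeps a running count per word sees every occurrence of that word.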
In global grouping, all the streams are grouped and forwarded to one bolt. This grouping sends the tuples generated by all instances of the source to a single target instance (specifically, the worker with the lowest ID).
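Sending everything to the single lowest-ID target can be sketched as follows. The function `route_globally` and the source and task identifiers are hypothetical, chosen only to illustrate the routing rule.

```python
def route_globally(tuples_by_source, target_task_ids):
    """Deliver tuples from every source instance to the target with the lowest ID."""
    target = min(target_task_ids)
    inbox = {task_id: [] for task_id in target_task_ids}
    for _source_id, tuples in tuples_by_source.items():
        inbox[target].extend(tuples)
    return inbox


inbox = route_globally(
    {"spout-0": [("a",)], "spout-1": [("b",), ("c",)]},
    target_task_ids=[5, 3, 9],
)
print(inbox)  # {5: [], 3: [('a',), ('b',), ('c',)], 9: []}
```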
All grouping sends a single copy of each tuple to all instances of the receiving bolt. This kind of grouping is used to send signals to bolts. All grouping is also useful for join operations.
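Broadcasting a signal to every instance of a bolt can be sketched in a few lines. The names `all_grouping` and the `"flush"` signal are hypothetical and serve only as an illustration.

```python
def all_grouping(tup, target_task_ids):
    """Deliver one copy of the tuple to every target task."""
    return {task_id: tup for task_id in target_task_ids}


# A hypothetical control signal that every bolt instance should receive.
signal = ("flush",)
deliveries = all_grouping(signal, target_task_ids=[0, 1, 2])
print(deliveries)  # {0: ('flush',), 1: ('flush',), 2: ('flush',)}
```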