May 17, 2021 09:00 Storm Getting Started
Storm is a distributed, reliable, fault-tolerant system for processing streams of data. It delegates work to different types of components, each responsible for a simple, specific task. The input stream of a Storm cluster is handled by a component called a spout. The spout passes data to a component called a bolt, which either persists the data in some kind of storage or passes it on to another bolt. You can think of a Storm cluster as a chain of bolts, each transforming the data emitted by the spout.
Here is a simple example of this concept. Last night I was watching the news, and the hosts were talking about politicians and their positions on various topics. They kept repeating different names, and I began to wonder whether each name was mentioned the same number of times, or whether the mentions were skewed toward some names.
Imagine the subtitles of what the announcers are saying as your input data stream. A spout can read this input from a file (or from a socket, via HTTP, or by some other method). The spout passes each line of text to a bolt, which splits it into words. The word stream is passed on to a second bolt, which compares each word to a predefined list of politicians' names. For each match, the second bolt adds 1 to that name's count in a database. You can query the database at any time to see the results, and the counts are updated in real time as data arrives. Figure 1-1 shows the topology with all of its components (spouts and bolts) and their relationships.
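The pipeline described above can be sketched in a few lines of plain Python. This is only a single-process illustration of the spout, split bolt, and count bolt roles; it does not use Storm's API. The function names, the sample subtitles, and the watch list of names are hypothetical, and a `Counter` stands in for the database.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional

# Hypothetical watch list; these names are placeholders, not from the article.
NAMES = {"smith", "jones"}


def line_spout(lines: Iterable[str]) -> Iterator[str]:
    """Spout: emit one line of subtitle text at a time."""
    for line in lines:
        yield line


def split_bolt(lines: Iterable[str]) -> Iterator[str]:
    """First bolt: cut each line into lowercase words, stripping punctuation."""
    for line in lines:
        for raw in line.split():
            word = raw.strip(".,;:!?\"'()").lower()
            if word:
                yield word


def count_bolt(words: Iterable[str], store: Counter) -> None:
    """Second bolt: add 1 to the stored count for each word on the watch list."""
    for word in words:
        if word in NAMES:
            store[word] += 1


def run_topology(lines: Iterable[str], store: Optional[Counter] = None) -> Counter:
    """Wire spout -> split bolt -> count bolt and return the count store."""
    store = Counter() if store is None else store
    count_bolt(split_bolt(line_spout(lines)), store)
    return store


subtitles = [
    "Smith said the budget is fine.",
    "Jones disagreed with Smith, and Smith replied.",
]
counts = run_topology(subtitles)
print(dict(counts))
```

Because each stage is a generator, a line flows through all stages as soon as it is read, which loosely mirrors how counts are updated while data is still arriving.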
Now imagine how easily you can define the level of parallelism for each bolt and spout across the whole cluster, so that you can scale your topology indefinitely. Amazing, right? Although this is a simple example, it already shows how powerful Storm can be.
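To make the idea of parallelism concrete, here is a minimal sketch, in plain Python rather than Storm's API, of running several instances of a counting bolt. It assumes words are routed by a stable hash so that a given name always reaches the same instance, which is similar in spirit to what Storm calls a fields grouping. The helper names and the sample words are hypothetical, and the instances are simulated sequentially rather than run on separate machines.

```python
import zlib
from collections import Counter
from typing import Iterable, List


def pick_instance(word: str, parallelism: int) -> int:
    """Route a word to one instance using a stable hash of the word."""
    return zlib.crc32(word.encode("utf-8")) % parallelism


def count_in_parallel(words: Iterable[str], parallelism: int) -> List[Counter]:
    """Simulate `parallelism` counting-bolt instances, each with its own counts."""
    instances = [Counter() for _ in range(parallelism)]
    for word in words:
        instances[pick_instance(word, parallelism)][word] += 1
    return instances


words = ["smith", "jones", "smith", "lee", "jones", "smith"]
instances = count_in_parallel(words, parallelism=3)

# Merging the per-instance counts gives the same totals as a single counter.
merged = sum(instances, Counter())
```

Since every occurrence of a name lands on the same instance, no two instances ever hold a partial count for the same name, so raising the parallelism changes where the work happens but not the result.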
What are some typical Storm applications?
Stream processing
As the example above shows, Storm does not require intermediate queues, unlike other stream processing systems.
Continuous computation
Storm can send data to clients continuously, so they can update and display results in real time, such as site metrics.
Distributed remote procedure calls
Storm makes it easy to parallelize CPU-intensive operations.
In a Storm cluster, a continuously running master node coordinates the work of the other nodes.
A Storm cluster has two types of nodes: the master node and worker nodes. The master node runs a daemon called Nimbus, which is responsible for distributing code around the cluster, assigning tasks to worker nodes, and monitoring for failures. Each worker node runs a daemon called Supervisor, which executes a portion of a topology. A Storm topology runs across many worker nodes on different machines.
Because Storm keeps all cluster state either in ZooKeeper or on local disk, the daemons are stateless and can fail or restart without affecting the health of the system (see Figure 1-2).
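The value of keeping state outside the daemon can be illustrated with a small sketch. This is plain Python, not Storm or ZooKeeper code: `ClusterStore` is a hypothetical in-memory stand-in for the external store, `Scheduler` is a hypothetical Nimbus-like daemon, and the round-robin assignment rule is an assumption made only for the example.

```python
class ClusterStore:
    """Stand-in for external cluster state (ZooKeeper or local disk in the text)."""

    def __init__(self):
        self.assignments = {}  # task name -> worker name


class Scheduler:
    """Hypothetical Nimbus-like daemon that keeps no state of its own."""

    def __init__(self, store, workers):
        self.store = store
        self.workers = workers

    def assign(self, task):
        # Reuse the recorded assignment if there is one; otherwise pick a
        # worker round-robin (an assumed rule) and record it in the store.
        if task not in self.store.assignments:
            index = len(self.store.assignments) % len(self.workers)
            self.store.assignments[task] = self.workers[index]
        return self.store.assignments[task]


store = ClusterStore()
workers = ["worker-1", "worker-2"]
tasks = ["split", "count", "report"]

first = Scheduler(store, workers)
before = {task: first.assign(task) for task in tasks}
del first  # simulate the daemon failing

# A fresh daemon instance sees the same assignments because they live in the store.
restarted = Scheduler(store, workers)
after = {task: restarted.assign(task) for task in tasks}
```

The restarted scheduler returns the same task-to-worker mapping as the original, because nothing it needs was held only in the failed process.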
Underneath, Storm uses ZeroMQ (0mq, http://www.zeromq.org), an advanced, embeddable networking library that provides the features that make Storm possible. Some of the characteristics of ZeroMQ are listed below.
NOTE: Storm uses push/pull sockets only.
Out of all these design ideas and decisions come some great features that make Storm unique.