May 30, 2021
What are the hottest Python libraries in 2020? The rules are simple. The libraries the editor is looking for meet the following criteria:
Disclaimer: our choices are heavily skewed toward machine learning and data science libraries, although some of them are useful to non-data scientists as well.
The spirit of this article is to make these libraries better known to the public. Without further ado, let's get started.
You don't always need to write CLI applications, but when you do, doing it well saves you a lot of time. After the great success of FastAPI (https://fastapi.tiangolo.com/), tiangolo (https://twitter.com/tiangolo) used the same principles to bring us Typer: a new library for writing command-line interfaces using Python 3.6+ type hints.
Its design is what really sets Typer apart. In addition to making your code self-documenting, it lets you easily validate the CLI interface. Because it uses type hints, you get autocompletion in Python editors such as VS Code, which boosts your productivity.
Under the hood, Typer is built on click (https://click.palletsprojects.com/en/7.x/), which is well known and rigorously tested. This means Typer can take advantage of all of click's benefits, such as its community and plug-ins, while letting you start with less boilerplate code and add complexity only as needed.
The Typer documentation (https://typer.tiangolo.com/) is genuinely helpful and should serve as a model for other projects. Don't miss it!
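To give a flavor of the API, here is a minimal sketch of a Typer app; the command name and options are invented for illustration:

```python
import typer

app = typer.Typer()

@app.command()
def hello(name: str, count: int = 1, shout: bool = False):
    """Greet NAME, optionally COUNT times."""
    for _ in range(count):
        message = f"Hello {name}!"
        typer.echo(message.upper() if shout else message)

if __name__ == "__main__":
    app()
```

The type hints do double duty: `count: int = 1` becomes a `--count` option with automatic parsing and validation, and `--help` output is generated for free.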
Staying with the CLI theme: who says terminal applications have to be plain white text, or, if you're a real hacker, green on black?
Do you want to add colors and styles to terminal output? Effortlessly display beautiful progress bars? Markdown? Emoji? Rich can do all of the above. Check out the sample screenshot below to learn more:
Rich is definitely a library that takes the experience of using terminal applications to the next level.
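As a small taste, here is a sketch using Rich's console and table APIs (the table contents are made up):

```python
from rich.console import Console
from rich.table import Table

console = Console()
# Inline markup for styles, plus emoji shortcodes:
console.print("Hello, [bold magenta]World[/bold magenta]!", ":sparkles:")

table = Table(title="2020 Favorites")
table.add_column("Library", style="cyan")
table.add_column("Purpose", style="green")
table.add_row("Typer", "CLIs from type hints")
table.add_row("Rich", "Beautiful terminal output")
console.print(table)
```

The same `console.print` call handles styled text, tables, markdown, and syntax-highlighted code, which keeps application code uniform.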
As we've seen, terminal applications can be beautiful, but sometimes that isn't enough and you need a real GUI. That is why Dear PyGui, the Python counterpart of the popular Dear ImGui C++ project (https://github.com/ocornut/imgui), was born.
Dear PyGui uses the immediate mode paradigm popular in video games. This essentially means that the dynamic GUI is drawn frame by frame without retaining any state, which makes the tool fundamentally different from other Python GUI frameworks. It is highly performant and uses the computer's GPU, which facilitates building the highly dynamic interfaces often needed in engineering, simulation, gaming, or data science applications.
Dear PyGui has a gentle learning curve and runs on Windows 10 (DirectX 11), Linux (OpenGL 3), and macOS (Metal).
Simplicity at its finest: this is one of those libraries that makes you wonder why nobody thought of it before.
PrettyErrors does only one thing and does it well. In a terminal that supports color output, it turns cryptic stack traces into something far friendlier to weak human eyes. No more scanning the entire screen to find the cause of an exception... you can see it at a glance!
We programmers like to solve problems with code. But sometimes we need to explain complex architectural designs to colleagues. Traditionally, we reach for GUI diagramming tools, build charts and visualizations, and paste them into presentations and documents. But that's not the only way.
Diagrams lets you draw cloud system architectures directly in Python code, without any design tools. It ships with icons for multiple cloud providers (including AWS, Azure, and GCP) and makes it easy to create arrows and groups. Really, it takes only a few lines of code!
And the best thing about code-based diagrams? You can track them in version control with git, just like the rest of your code!
When running research and experiments on machine learning projects, there are always countless settings to try. In non-trivial applications, configuration management can become complex very quickly. Wouldn't it be nice to have a structured approach to that complexity?
Hydra is a tool that lets you build configurations in a composable manner and override certain parts from the command line or from configuration files.
To illustrate some of the common tasks the library simplifies, suppose you have a basic model architecture along with many variations of it. With Hydra, you can define a base configuration and then launch multiple jobs, one per variation:
python train_model.py --multirun variation=option_a,option_b

├── variation
│   ├── option_a.yaml
│   └── option_b.yaml
├── base.yaml
└── train_model.py
Hydra's cousin OmegaConf provides the foundation for the hierarchical configuration system with a consistent API, supporting different sources such as YAML files, objects, and CLI parameters.
This is what configuration management should look like in the 21st century!
Every tool that increases the productivity of a data science team deserves encouragement. There's no reason for people working on data science projects to reinvent the wheel every time: agonizing over how best to organize the code in their projects, maintaining poorly structured PyTorch boilerplate, or doing without higher levels of abstraction.
PyTorch Lightning helps boost productivity by separating the science from the engineering. In a sense, it makes your code cleaner, a bit like Keras does for TensorFlow. But it's still PyTorch, and you retain access to all the familiar APIs.
The library helps teams build easily scalable, high-quality code across multiple GPUs, TPUs, and CPUs, taking advantage of good software engineering practices and a clear separation of component responsibilities.
It's a library that helps junior members of a data science team produce better results, and more experienced members will love it for the overall productivity gains that come without giving up control.
Not all machine learning is deep learning. Often your model is a more traditional algorithm implemented in scikit-learn, such as a Random Forest, or you use gradient boosting methods such as the popular LightGBM and XGBoost.
Meanwhile, much progress is taking place in the field of deep learning. Frameworks like PyTorch are evolving at a staggering pace, and hardware is being optimized to run tensor computations faster while consuming less power. Wouldn't it be nice if we could use all that work to run traditional methods faster and more efficiently?
This is where Hummingbird comes in. The new library from Microsoft compiles trained traditional ML models into tensor computations. This is great because it means you don't need to redesign your models.
So far, Hummingbird supports conversion to PyTorch, TorchScript, ONNX, and TVM, as well as a variety of ML models and featurizers. The inference API also closely mirrors scikit-learn's, allowing you to reuse existing code while swapping the implementation for the one Hummingbird generates. This is a tool worth watching as support for more models and formats lands!
Almost every data scientist has to work with high-dimensional data at some point in their career. Unfortunately, the human brain isn't wired to process such data intuitively, so we have to resort to other techniques.
Earlier this year, Facebook released HiPlot, a library that uses parallel plots and other graphical means to represent information, helping you discover correlations and patterns in high-dimensional data. The concept is explained in its announcement blog post, but in short it's an excellent way to visualize and filter high-dimensional data.
HiPlot is interactive and scalable, and you can use it from a standard Jupyter notebook or through its own server.
As the Python library ecosystem becomes more complex, we find ourselves writing more and more code that relies on C extensions and multithreading. That becomes a problem when measuring performance, because CPython's built-in profiler does not handle multithreaded and native code well.
That's where Scalene comes to the rescue. Scalene is a CPU and memory profiler for Python scripts that handles multithreaded code correctly and distinguishes time spent running Python code from time spent in native code. You don't need to modify your code: just run your script with scalene from the command line, and it generates a text or HTML report showing CPU and memory usage for each line of code.
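To see what that report looks like, here is a tiny hypothetical script with an obvious hotspot; you would profile it with `scalene busy.py`, with no changes to the code itself:

```python
# busy.py -- a deliberately slow script to profile with scalene
def slow_sum(n):
    total = 0
    for i in range(n):   # a pure-Python loop: scalene attributes this
        total += i * i   # time to "Python" rather than to native code
    return total

if __name__ == "__main__":
    print(slow_sum(5_000_000))
```

In the line-by-line report, the loop body dominates the Python-time column; if you replaced it with a NumPy expression, that time would shift into the native column and shrink.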