Coding With Fun

Is the GH Archive available on Google BigQuery?


Asked by Marlowe Cisneros on Dec 04, 2021 FAQ



Yes. The entire GH Archive is available as a public dataset on Google BigQuery. The dataset is updated automatically every hour, and you can run arbitrary SQL-like queries over all of it in seconds. To get started: If you don't already have a Google project... Execute your first query...
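As a rough idea of what a first query could look like, here is a minimal Python sketch. It assumes the per-day table naming `githubarchive.day.YYYYMMDD` used by the public GH Archive dataset, and it assumes the `google-cloud-bigquery` package and working Google credentials for the part that actually runs the query. The function names `build_event_count_query` and `run_query` are made up for this example.

```python
import re


def build_event_count_query(day: str, limit: int = 10) -> str:
    """Build SQL that counts events per type for one GH Archive day table.

    `day` must be YYYYMMDD. The table pattern `githubarchive.day.YYYYMMDD`
    is an assumption based on the public dataset's layout.
    """
    if not re.fullmatch(r"\d{8}", day):
        raise ValueError("day must look like YYYYMMDD")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return (
        "SELECT type, COUNT(*) AS events\n"
        f"FROM `githubarchive.day.{day}`\n"
        "GROUP BY type\n"
        "ORDER BY events DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str):
    """Run the SQL on BigQuery and return (type, events) pairs.

    Requires the google-cloud-bigquery package and configured credentials.
    """
    # Imported lazily so the query builder can be used without the package.
    from google.cloud import bigquery

    client = bigquery.Client()
    return [(row["type"], row["events"]) for row in client.query(sql).result()]
```

Calling `run_query(build_event_count_query("20211204"))` would then return the most common event types for that day, provided the project and credentials are set up. Note that BigQuery bills by the amount of data scanned, so querying a single day table is a cautious starting point.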
Furthermore, Google BigQuery is a fast, highly scalable, cost-effective, and fully managed cloud data warehouse for analytics, with built-in machine learning. It is serverless and aimed at enterprise use, and Google describes it as designed to make data analysts productive at unmatched price-performance.
In addition, other organizations have made their data publicly available in BigQuery. For example, the GH Archive dataset can be used to analyze public events on GitHub, such as pull requests, repository stars, and opened issues. The Python Software Foundation's PyPI dataset can be used to analyze download requests for Python packages.
GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis. GitHub provides 20+ event types, ranging from new commits and fork events to opening new tickets, commenting, and adding members to a project.
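To make the idea of event types concrete, here is a small Python sketch that tallies events by their `type` field. It assumes each archived event is one JSON object per line with a `type` key such as `PushEvent` or `ForkEvent`. The sample records below are invented for illustration and are not real archive data.

```python
import json
from collections import Counter
from typing import Iterable


def count_event_types(lines: Iterable[str]) -> Counter:
    """Count events per type from newline-delimited JSON event records."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        event = json.loads(line)
        # Events without a "type" key are grouped under "unknown".
        counts[event.get("type", "unknown")] += 1
    return counts


# Invented sample records, shaped like timeline events (assumption).
sample = [
    '{"type": "PushEvent", "repo": {"name": "example/demo"}}',
    '{"type": "ForkEvent", "repo": {"name": "example/demo"}}',
    '{"type": "PushEvent", "repo": {"name": "example/other"}}',
    '{"type": "IssuesEvent", "repo": {"name": "example/demo"}}',
]

counts = count_event_types(sample)
for event_type, n in counts.most_common():
    print(event_type, n)
```

The same function works on any iterable of lines, so it could be pointed at an open file of downloaded events instead of the in-memory sample.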
To bring your own data into BigQuery, you can upload data files from local sources, Google Drive, or Cloud Storage buckets; use the BigQuery Data Transfer Service (DTS) or Data Fusion plug-ins; or rely on Google's data integration partnerships. This gives you considerable flexibility in how you bring data into your data warehouse.
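For the local upload route, one possible approach is to serialize records as newline-delimited JSON and hand them to a load job. The sketch below assumes the `google-cloud-bigquery` package and configured credentials. The helper names `to_ndjson` and `load_records` and the `table_id` argument are hypothetical, chosen only for this example.

```python
import io
import json
from typing import Iterable, Mapping


def to_ndjson(records: Iterable[Mapping]) -> bytes:
    """Serialize records as newline-delimited JSON, one object per line."""
    text = "".join(json.dumps(record, sort_keys=True) + "\n" for record in records)
    return text.encode("utf-8")


def load_records(records: Iterable[Mapping], table_id: str):
    """Load records into a BigQuery table and wait for the job to finish.

    `table_id` is a placeholder such as "project.dataset.table".
    Requires the google-cloud-bigquery package and configured credentials.
    """
    # Imported lazily so to_ndjson can be used without the package installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,  # let BigQuery infer the schema from the data
    )
    payload = io.BytesIO(to_ndjson(records))
    job = client.load_table_from_file(payload, table_id, job_config=job_config)
    return job.result()
```

Schema autodetection keeps the sketch short. For data that is loaded repeatedly, an explicit schema is usually the safer choice.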