Table of Contents
Introduction
All github contents recently got query-able by the Google BigQuery. See the announcement from github. You can execute the BigQuery queries at the BigQuery console.
I used it to find top angular directives used on github. In first two sections I only list “ng-” directives. Even if parsing html with regexps is faulty, it is correct enough for frequency analysis.
Third section, Custom directives, uses a heuristic to find custom angular directives. You can use it to find popular directives you have not been aware of.
Revision history of this post is stored on github.
You may also look at my other posts analyzing github using BigQuery.
Unique usage per repository in top repositories with example usage
Counting unique usage is expensive and I am out of my free quota for this month. This query runs on a smaller table that only contains 10% random sample of files from top 130k top popular repositories.
It will calculate “how many unique repos use this directive” and add a link to an example usage.
SELECT REGEXP_EXTRACT(line, r".*[ <]+(ng-[a-zA-Z0-9-]+).*") AS line, COUNT(DISTINCT(sample_repo_name)) AS count_distinct_repos, CONCAT("https://github.com/", FIRST(sample_repo_name), "/blob/", REGEXP_EXTRACT(FIRST(sample_ref), r"refs/heads/(.*)$"), "/", FIRST(sample_path)) AS example_url, FROM ( SELECT SPLIT(content, '\n') line, sample_repo_name, sample_path, sample_ref FROM [bigquery-public-data:github_repos.sample_contents] WHERE (sample_path LIKE '%.html' OR sample_path LIKE '%.ng') HAVING line CONTAINS 'ng-') GROUP BY 1 ORDER BY count_distinct_repos DESC LIMIT 500;
Full results on google docs. Top 20 results:
Top usages in all repositories
Previous query was looking only at the sample of data. This query looks through all files accessible on github.
SELECT TOP(line, 500), COUNT(*) AS c FROM ( SELECT REGEXP_EXTRACT((SPLIT(contents.content, '\n')), r".*[^a-zA-Z](ng-[a-zA-Z0-9-]+).*") line, contents.id AS id FROM [bigquery-public-data:github_repos.contents] AS contents JOIN ( SELECT path, id FROM [bigquery-public-data:github_repos.files] WHERE path LIKE '%.ng' OR path LIKE '%.html') AS files ON (contents.id == files.id) HAVING line CONTAINS "ng-");
Full results on google docs. Top 20 results:
directive | count |
---|---|
ng-click | 1572920 |
ng-model | 1355222 |
ng-show | 962245 |
ng-repeat | 697010 |
ng-if | 601903 |
ng-controller | 591669 |
ng-app | 460875 |
ng-class | 452863 |
ng-bind | 283218 |
ng-hide | 217121 |
ng-disabled | 168468 |
ng-include | 125913 |
ng-init | 125508 |
ng-submit | 118507 |
ng-switch-when | 111254 |
ng-href | 109513 |
ng-src | 108365 |
ng-template | 108241 |
ng-change | 101197 |
ng-bind-html | 89604 |
Custom directives
Methodology
I tried a heuristic for finding custom directives – extract all html tags and look at relative frequency in all html files vs “probably angular html” files.
“Probably angular html” is based on the assumption that “ng-” is ubiquitous in angular html, but not that frequent otherwise. Also some angular files use the .ng extension. This method is going to obviously have some false positives and negatives. Looking through results, 2.0 ratio was optimal. Here you can see top 50 results that were right past the edge of exclusion – ratio was between 2.0 and 2.5. Very few entries are legitimate.
I am again using the sampled sample_contents, since I ran out of free quota.
BigQuery query
SELECT tag, COUNT(1) / SUM(IF(probably_angular, 1, 0)) AS html_to_angular_ratio, COUNT(DISTINCT(sample_repo_name)) AS distinct_repository_count, CONCAT("https://github.com/", FIRST(sample_repo_name), "/blob/", REGEXP_EXTRACT(FIRST(sample_ref), r"refs/heads/(.*)$"), "/", FIRST(sample_path)) AS example_url FROM ( SELECT SPLIT(REGEXP_REPLACE( REGEXP_REPLACE(content, r"['\"\\\/\$]+[a-zA-Z-]*", ""), r"[^a-zA-Z-]+", " "), " ") AS tag, (REGEXP_MATCH(content, r".*[ <]+ng-[a-zA-Z0-9-]+.*") OR (sample_path LIKE '%.ng')) AS probably_angular, sample_repo_name, sample_path, sample_ref FROM [bigquery-public-data:github_repos.sample_contents] WHERE (sample_path LIKE '%.html' OR sample_path LIKE '%.ng')) GROUP BY 1 HAVING html_to_angular_ratio < 2.0 ORDER BY distinct_repository_count DESC LIMIT 1000;
Results
All results in google docs, including ng- entries.
For example, it found directives from ionic framework, bootstrap or ng-file-upload. Top 20 results excluding the entries that start with ng- or *angular*
:
My other posts analyzing github using BigQuery
You may also take a look at my other posts: