Top angular directives on github, including custom directives

Introduction

All github contents recently got query-able by the Google BigQuery. See the announcement from github. You can execute the BigQuery queries at the BigQuery console.

I used it to find top angular directives used on github. In first two sections I only list “ng-” directives. Even if parsing html with regexps is faulty, it is correct enough for frequency analysis.

Third section, Custom directives, uses a heuristic to find custom angular directives. You can use it to find popular directives you have not been aware of.

Revision history of this post is stored on github.

You may also look at my other posts analyzing github using BigQuery.

Unique usage per repository in top repositories with example usage

Counting unique usage is expensive and I am out of my free quota for this month. This query runs on a smaller table that only contains 10% random sample of files from top 130k top popular repositories.

It will calculate “how many unique repos use this directive” and add a link to an example usage.

SELECT
  REGEXP_EXTRACT(line, r".*[ <]+(ng-[a-zA-Z0-9-]+).*") AS line,
  COUNT(DISTINCT(sample_repo_name)) AS count_distinct_repos,
  CONCAT("https://github.com/",
      FIRST(sample_repo_name),
      "/blob/",
      REGEXP_EXTRACT(FIRST(sample_ref), r"refs/heads/(.*)$"),
      "/",
      FIRST(sample_path)) AS example_url,
FROM (
  SELECT
    SPLIT(content, '\n') line,
    sample_repo_name,
    sample_path,
    sample_ref
  FROM
    [bigquery-public-data:github_repos.sample_contents]
  WHERE
    (sample_path LIKE '%.html'
      OR sample_path LIKE '%.ng')
  HAVING
    line CONTAINS 'ng-')
GROUP BY
  1
ORDER BY
  count_distinct_repos DESC
LIMIT
  500;

Full results on google docs. Top 20 results:

line count_distinct_repos example_url
ng-click 1040
ng-repeat 827
ng-model 823
ng-show 662 https://github.com/BoxUpp/boxupp/blob/master/page/templates/vmConfigurations.html
ng-class 560 https://github.com/streamdataio/streamdataio-js/blob/master/stockmarket-angular/index.html
ng-controller 559
ng-if 536 https://github.com/Groupmates-co/groupmates/blob/master/app/assets/javascripts/groupmates/mates/mates-tpl.html
ng-app 421 https://github.com/streamdataio/streamdataio-js/blob/master/stockmarket-angular/index.html
ng-hide 290 https://github.com/BoxUpp/boxupp/blob/master/page/templates/vmConfigurations.html
ng-disabled 287 https://github.com/JekyllWriter/JekyllWriter/blob/master/layout/proxy.html
ng-submit 222 https://github.com/Groupmates-co/groupmates/blob/master/app/assets/javascripts/groupmates/mates/mates-tpl.html
ng-src 211 https://github.com/Groupmates-co/groupmates/blob/master/app/assets/javascripts/groupmates/mates/mates-tpl.html
ng-include 205 https://github.com/pixelpark/ppnet/blob/master/app/views/map.html
ng-change 198 https://github.com/mmautner/github-email-thief/blob/master/app/views/search_codes.html
ng-href 187 https://github.com/asm-products/octobox/blob/master/public/views/content/file/modal.html
ng-bind 173 https://github.com/BoxUpp/boxupp/blob/master/page/templates/vmConfigurations.html
ng-bind-html 150 https://github.com/LeoLombardi/tos-laimas-compass/blob/master/tos-laimas-compass-win32-x64/resources/app/node_modules/ui-select/docs/examples/demo-select2-with-bootstrap.html
ng-init 146 https://github.com/asm-products/octobox/blob/master/public/views/content/file/modal.html
ng-view 145 https://github.com/TheWildHorse/MovieNight/blob/master/public/index.html
ng-options 142 https://github.com/mmautner/github-email-thief/blob/master/app/views/search_codes.html

Top usages in all repositories

Previous query was looking only at the sample of data. This query looks through all files accessible on github.

SELECT
  TOP(line, 500),
  COUNT(*) AS c
FROM (
  SELECT
    REGEXP_EXTRACT((SPLIT(contents.content, '\n')),
          r".*[^a-zA-Z](ng-[a-zA-Z0-9-]+).*") line,
    contents.id AS id
  FROM
    [bigquery-public-data:github_repos.contents] AS contents
  JOIN (
    SELECT
      path,
      id
    FROM
      [bigquery-public-data:github_repos.files]
    WHERE
      path LIKE '%.ng'
      OR path LIKE '%.html') AS files
  ON
    (contents.id == files.id)
  HAVING
    line CONTAINS "ng-");

Full results on google docs. Top 20 results:

directive count
ng-click 1572920
ng-model 1355222
ng-show 962245
ng-repeat 697010
ng-if 601903
ng-controller 591669
ng-app 460875
ng-class 452863
ng-bind 283218
ng-hide 217121
ng-disabled 168468
ng-include 125913
ng-init 125508
ng-submit 118507
ng-switch-when 111254
ng-href 109513
ng-src 108365
ng-template 108241
ng-change 101197
ng-bind-html 89604

Custom directives

Methodology

I tried a heuristic for finding custom directives – extract all html tags and look at relative frequency in all html files vs “probably angular html” files.

“Probably angular html” is based on the assumption that “ng-” is ubiquitous in angular html, but not that frequent otherwise. Also some angular files use the .ng extension. This method is going to obviously have some false positives and negatives. Looking through results, 2.0 ratio was optimal. Here you can see top 50 results that were right past the edge of exclusion – ratio was between 2.0 and 2.5. Very few entries are legitimate.

I am again using the sampled sample_contents, since I ran out of free quota.

BigQuery query

SELECT
  tag,
  COUNT(1) / SUM(IF(probably_angular, 1, 0)) AS html_to_angular_ratio,
  COUNT(DISTINCT(sample_repo_name)) AS distinct_repository_count,
  CONCAT("https://github.com/",
      FIRST(sample_repo_name),
      "/blob/",
      REGEXP_EXTRACT(FIRST(sample_ref), r"refs/heads/(.*)$"),
      "/",
      FIRST(sample_path)) AS example_url
FROM (
  SELECT
    SPLIT(REGEXP_REPLACE(
        REGEXP_REPLACE(content, r"['\"\\\/\$]+[a-zA-Z-]*", ""), 
        r"[^a-zA-Z-]+", " "), " ") AS tag,
    (REGEXP_MATCH(content,
         r".*[ <]+ng-[a-zA-Z0-9-]+.*")
     OR (sample_path LIKE '%.ng')) AS probably_angular,
    sample_repo_name,
    sample_path,
    sample_ref
  FROM
    [bigquery-public-data:github_repos.sample_contents]
  WHERE
    (sample_path LIKE '%.html'
      OR sample_path LIKE '%.ng'))
GROUP BY
  1
HAVING
  html_to_angular_ratio < 2.0
ORDER BY
  distinct_repository_count DESC
LIMIT 1000;

Results

All results in google docs, including ng- entries.

For example, it found directives from ionic framework, bootstrap or ng-file-upload. Top 20 results excluding the entries that start with ng- or *angular*:

tag html_to_angular_ratio distinct_repository_count example_url
translate 1.9265236377444466 729
novalidate 1.625 251 https://github.com/bonitasoft/bonita-ui-designer/blob/master/frontend/app/js/assets/asset-popup.html
ui-sref 1.1605095541401274 195 https://github.com/tatool/tatool-web/blob/master/app/views/doc/dev-executable-additional.html
orderBy 1.345821325648415 194
ui-view 1.2525252525252526 146 https://github.com/cityofasheville/simplicity-ui/blob/master/app/index.html
ion-content 1.4666666666666666 126 https://github.com/gaplo917/hkepc-ionic-reader/blob/master/www/templates/features/mypost/my.post.html
endbuild 1.6071428571428572 105 https://github.com/kwk/docker-registry-frontend/blob/v2/app/index.html
md-button 1.1246056782334384 98 https://github.com/deltaepsilon/quiver-cms/blob/master/app/views/address-dialog.html
ion-view 1.167785234899329 92 https://github.com/gaplo917/hkepc-ionic-reader/blob/master/www/templates/features/mypost/my.post.html
glyphicon-edit 1.8832684824902723 85
data-angularjs 1.736842105263158 76
md-content 1.1631578947368422 73 https://github.com/nozelrosario/Dcare/blob/master/www/views/vitals/trend.html
limitTo 1.1935483870967742 71 https://github.com/kwk/docker-registry-frontend/blob/v2/app/index.html
layout-align 1.065155807365439 70 https://github.com/nozelrosario/Dcare/blob/master/www/views/vitals/trend.html
ngIf 1.9211469534050178 69
md-icon 1.2275862068965517 61 https://github.com/radioit/radioit-desktop/blob/master/app/static/view/bangumi.detail.html
md-toolbar 1.1401869158878504 60
ion-list 1.7866666666666666 60 https://github.com/gaplo917/hkepc-ionic-reader/blob/master/www/templates/features/mypost/my.post.html

6 Comments

  1. nice!

    some comments:

    “big query” -> “bigquery” (one word please! otherwise I won’t be able to find your contributions)

    “TOP(line, 300)” -> fast, approximate results, cool

    “FROM [bigquery-public-data:github_repos.contents]” -> so you are going for the full 1.7TB archive? This query gets you out of the free tier. That’s ok, if you or someone else is willing to pay for it :).

    “WHERE path LIKE ‘%.ng’ OR path LIKE ‘%.html’) AS files” -> it might make sense to first extract this data to a new table. Then every new query can go over this table instead, and all querying will be cheaper thereafter. It’s a good investment 🙂

    Thanks, added.

    1. Adding my comment:
      > “big query” -> “bigquery”
      Fixed.

      > At this size and type of analysis we are really pushing the boundaries of BigQuery.
      Github data set size, 3TB, is tiny comparing to some data sets Dremel is used at. I thought that BigQuery uses Dremel under the hood?
      Anyway, if you would end up overwhelming one Dremel worker (what was happening in my case), you would start hitting Dremel OOMs at much smaller data set size than 3TB. “Find out if data under some join key is too big to fit into single workers memory” may be not obvious – I couldn’t find any external docs that would suggest it.

      > That’s ok, if you or someone else is willing to pay for it:).
      I am using 300 USD free trial.