Top angular directives on github, including custom directives

Introduction
Unique usage per repository in top repositories with example usage
Top usages in all repositories
Custom directives
My other posts analyzing github using BigQuery

Introduction

All github contents recently got query-able by the Google BigQuery. See the announcement from github. You can execute the BigQuery queries at the BigQuery console.

I used it to find top angular directives used on github. In first two sections I only list “ng-” directives. Even if parsing html with regexps is faulty, it is correct enough for frequency analysis.

Third section, Custom directives, uses a heuristic to find custom angular directives. You can use it to find popular directives you have not been aware of.

Revision history of this post is stored on github.

You may also look at my other posts analyzing github using BigQuery.

Unique usage per repository in top repositories with example usage

Counting unique usage is expensive and I am out of my free quota for this month. This query runs on a smaller table that only contains 10% random sample of files from top 130k top popular repositories.

It will calculate “how many unique repos use this directive” and add a link to an example usage.

SELECT
  REGEXP_EXTRACT(line, r".*[ <]+(ng-[a-zA-Z0-9-]+).*") AS line,
  COUNT(DISTINCT(sample_repo_name)) AS count_distinct_repos,
  CONCAT("https://github.com/",
      FIRST(sample_repo_name),
      "/blob/",
      REGEXP_EXTRACT(FIRST(sample_ref), r"refs/heads/(.*)$"),
      "/",
      FIRST(sample_path)) AS example_url,
FROM (
  SELECT
    SPLIT(content, '\n') line,
    sample_repo_name,
    sample_path,
    sample_ref
  FROM
    [bigquery-public-data:github_repos.sample_contents]
  WHERE
    (sample_path LIKE '%.html'
      OR sample_path LIKE '%.ng')
  HAVING
    line CONTAINS 'ng-')
GROUP BY
  1
ORDER BY
  count_distinct_repos DESC
LIMIT
  500;

Full results on google docs. Top 20 results:

line	count_distinct_repos	example_url
ng-click	1040
ng-repeat	827
ng-model	823
ng-show	662	https://github.com/BoxUpp/boxupp/blob/master/page/templates/vmConfigurations.html
ng-class	560	https://github.com/streamdataio/streamdataio-js/blob/master/stockmarket-angular/index.html
ng-controller	559
ng-if	536	https://github.com/Groupmates-co/groupmates/blob/master/app/assets/javascripts/groupmates/mates/mates-tpl.html
ng-app	421	https://github.com/streamdataio/streamdataio-js/blob/master/stockmarket-angular/index.html
ng-hide	290	https://github.com/BoxUpp/boxupp/blob/master/page/templates/vmConfigurations.html
ng-disabled	287	https://github.com/JekyllWriter/JekyllWriter/blob/master/layout/proxy.html
ng-submit	222	https://github.com/Groupmates-co/groupmates/blob/master/app/assets/javascripts/groupmates/mates/mates-tpl.html
ng-src	211	https://github.com/Groupmates-co/groupmates/blob/master/app/assets/javascripts/groupmates/mates/mates-tpl.html
ng-include	205	https://github.com/pixelpark/ppnet/blob/master/app/views/map.html
ng-change	198	https://github.com/mmautner/github-email-thief/blob/master/app/views/search_codes.html
ng-href	187	https://github.com/asm-products/octobox/blob/master/public/views/content/file/modal.html
ng-bind	173	https://github.com/BoxUpp/boxupp/blob/master/page/templates/vmConfigurations.html
ng-bind-html	150	https://github.com/LeoLombardi/tos-laimas-compass/blob/master/tos-laimas-compass-win32-x64/resources/app/node_modules/ui-select/docs/examples/demo-select2-with-bootstrap.html
ng-init	146	https://github.com/asm-products/octobox/blob/master/public/views/content/file/modal.html
ng-view	145	https://github.com/TheWildHorse/MovieNight/blob/master/public/index.html
ng-options	142	https://github.com/mmautner/github-email-thief/blob/master/app/views/search_codes.html

Top usages in all repositories

Previous query was looking only at the sample of data. This query looks through all files accessible on github.

SELECT
  TOP(line, 500),
  COUNT(*) AS c
FROM (
  SELECT
    REGEXP_EXTRACT((SPLIT(contents.content, '\n')),
          r".*[^a-zA-Z](ng-[a-zA-Z0-9-]+).*") line,
    contents.id AS id
  FROM
    [bigquery-public-data:github_repos.contents] AS contents
  JOIN (
    SELECT
      path,
      id
    FROM
      [bigquery-public-data:github_repos.files]
    WHERE
      path LIKE '%.ng'
      OR path LIKE '%.html') AS files
  ON
    (contents.id == files.id)
  HAVING
    line CONTAINS "ng-");

Full results on google docs. Top 20 results:

directive	count
ng-click	1572920
ng-model	1355222
ng-show	962245
ng-repeat	697010
ng-if	601903
ng-controller	591669
ng-app	460875
ng-class	452863
ng-bind	283218
ng-hide	217121
ng-disabled	168468
ng-include	125913
ng-init	125508
ng-submit	118507
ng-switch-when	111254
ng-href	109513
ng-src	108365
ng-template	108241
ng-change	101197
ng-bind-html	89604

Custom directives

Methodology

I tried a heuristic for finding custom directives – extract all html tags and look at relative frequency in all html files vs “probably angular html” files.

“Probably angular html” is based on the assumption that “ng-” is ubiquitous in angular html, but not that frequent otherwise. Also some angular files use the .ng extension. This method is going to obviously have some false positives and negatives. Looking through results, 2.0 ratio was optimal. Here you can see top 50 results that were right past the edge of exclusion – ratio was between 2.0 and 2.5. Very few entries are legitimate.

I am again using the sampled sample_contents, since I ran out of free quota.

BigQuery query

SELECT
  tag,
  COUNT(1) / SUM(IF(probably_angular, 1, 0)) AS html_to_angular_ratio,
  COUNT(DISTINCT(sample_repo_name)) AS distinct_repository_count,
  CONCAT("https://github.com/",
      FIRST(sample_repo_name),
      "/blob/",
      REGEXP_EXTRACT(FIRST(sample_ref), r"refs/heads/(.*)$"),
      "/",
      FIRST(sample_path)) AS example_url
FROM (
  SELECT
    SPLIT(REGEXP_REPLACE(
        REGEXP_REPLACE(content, r"['\"\\\/\$]+[a-zA-Z-]*", ""), 
        r"[^a-zA-Z-]+", " "), " ") AS tag,
    (REGEXP_MATCH(content,
         r".*[ <]+ng-[a-zA-Z0-9-]+.*")
     OR (sample_path LIKE '%.ng')) AS probably_angular,
    sample_repo_name,
    sample_path,
    sample_ref
  FROM
    [bigquery-public-data:github_repos.sample_contents]
  WHERE
    (sample_path LIKE '%.html'
      OR sample_path LIKE '%.ng'))
GROUP BY
  1
HAVING
  html_to_angular_ratio < 2.0
ORDER BY
  distinct_repository_count DESC
LIMIT 1000;

Results

All results in google docs, including ng- entries.

For example, it found directives from ionic framework, bootstrap or ng-file-upload. Top 20 results excluding the entries that start with ng- or *angular*:

tag	html_to_angular_ratio	distinct_repository_count	example_url
translate	1.9265236377444466	729
novalidate	1.625	251	https://github.com/bonitasoft/bonita-ui-designer/blob/master/frontend/app/js/assets/asset-popup.html
ui-sref	1.1605095541401274	195	https://github.com/tatool/tatool-web/blob/master/app/views/doc/dev-executable-additional.html
orderBy	1.345821325648415	194
ui-view	1.2525252525252526	146	https://github.com/cityofasheville/simplicity-ui/blob/master/app/index.html
ion-content	1.4666666666666666	126	https://github.com/gaplo917/hkepc-ionic-reader/blob/master/www/templates/features/mypost/my.post.html
endbuild	1.6071428571428572	105	https://github.com/kwk/docker-registry-frontend/blob/v2/app/index.html
md-button	1.1246056782334384	98	https://github.com/deltaepsilon/quiver-cms/blob/master/app/views/address-dialog.html
ion-view	1.167785234899329	92	https://github.com/gaplo917/hkepc-ionic-reader/blob/master/www/templates/features/mypost/my.post.html
glyphicon-edit	1.8832684824902723	85
data-angularjs	1.736842105263158	76
md-content	1.1631578947368422	73	https://github.com/nozelrosario/Dcare/blob/master/www/views/vitals/trend.html
limitTo	1.1935483870967742	71	https://github.com/kwk/docker-registry-frontend/blob/v2/app/index.html
layout-align	1.065155807365439	70	https://github.com/nozelrosario/Dcare/blob/master/www/views/vitals/trend.html
ngIf	1.9211469534050178	69
md-icon	1.2275862068965517	61	https://github.com/radioit/radioit-desktop/blob/master/app/static/view/bangumi.detail.html
md-toolbar	1.1401869158878504	60
ion-list	1.7866666666666666	60	https://github.com/gaplo917/hkepc-ionic-reader/blob/master/www/templates/features/mypost/my.post.html

My other posts analyzing github using BigQuery

You may also take a look at my other posts:

6 Comments

says:

July 1, 2016 at 7:36 am

nice!

some comments:

“big query” -> “bigquery” (one word please! otherwise I won’t be able to find your contributions)

“TOP(line, 300)” -> fast, approximate results, cool

“FROM [bigquery-public-data:github_repos.contents]” -> so you are going for the full 1.7TB archive? This query gets you out of the free tier. That’s ok, if you or someone else is willing to pay for it :).

“WHERE path LIKE ‘%.ng’ OR path LIKE ‘%.html’) AS files” -> it might make sense to first extract this data to a new table. Then every new query can go over this table instead, and all querying will be cheaper thereafter. It’s a good investment 🙂

Thanks, added.

1. kozikow says:
  
  July 1, 2016 at 8:44 am
  
  Adding my comment:
  > “big query” -> “bigquery”
  Fixed.
  
  > At this size and type of analysis we are really pushing the boundaries of BigQuery.
  Github data set size, 3TB, is tiny comparing to some data sets Dremel is used at. I thought that BigQuery uses Dremel under the hood?
  Anyway, if you would end up overwhelming one Dremel worker (what was happening in my case), you would start hitting Dremel OOMs at much smaller data set size than 3TB. “Find out if data under some join key is too big to fit into single workers memory” may be not obvious – I couldn’t find any external docs that would suggest it.
  
  > That’s ok, if you or someone else is willing to pay for it:).
  I am using 300 USD free trial.
  
Pingback: 1 – Top angular directives used in GitHub repos
Pingback: Top pandas, numpy and scipy functions used in github repos | Robert Kozikowski's blog
Pingback: More advanced github code search – Robert Kozikowski's blog
Pingback: Visualizing relationships between python packages – Robert Kozikowski's blog

	hanale10 on Using PyCharm docker integrati…
	Dodge Coates on Very powerful data analysis en…
	Dodge Coates on Very powerful data analysis en…
	Dodge Coates on Very powerful data analysis en…
	kozikow on Very powerful data analysis en…

Top angular directives on github, including custom directives

Table of Contents

Introduction

Unique usage per repository in top repositories with example usage

Top usages in all repositories

Custom directives

Methodology

BigQuery query

Results

My other posts analyzing github using BigQuery

Published by kozikow

6 Comments

Table of Contents

Introduction

Unique usage per repository in top repositories with example usage

Top usages in all repositories

Custom directives

Methodology

BigQuery query

Results

My other posts analyzing github using BigQuery

Like this:

Published by kozikow

6 Comments