More advanced GitHub code search


Documentation is not always enough, so I like to search GitHub for code examples.

For example, today I wanted to see which numpy functions people pass as the aggfunc parameter of pandas pivot_table.

When searching for something like “pivot_table aggfunc np”, the built-in GitHub search does some kind of “OR” matching with heuristic scoring, so it sometimes returns lines that match only aggfunc or only pivot_table.

I extracted a BigQuery table, kozikow_github.data_py, that makes it easier to search GitHub for code examples. Each row has a field “line” with the contents of a line and a field “url” pointing to that line on GitHub. It also has the contents of the neighbouring lines in prev_line and next_line.

BigQuery gives you 1 TB of free data scans, so you are very unlikely to run out of quota by querying my table.

The table is targeted at scientific Python, but getting a table for any other framework or language only takes a simple string replacement in the SQL query. Instructions for generating your own public table are in the section “How to generate your own public table”.
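As a rough sketch of that string replacement, the import filter at the end of the table-generating SQL could be built from a list of package names. The helper name `import_filter` is hypothetical and only illustrates the idea; it is not part of the original query.

```python
def import_filter(packages, keyword="import"):
    """Build an OR-chain of legacy SQL CONTAINS checks, one per package name."""
    clauses = ['content CONTAINS "{} {}"'.format(keyword, name) for name in packages]
    return "\n        OR ".join(clauses)


# The packages used for the scientific Python table in this post.
python_filter = import_filter(["pandas", "numpy", "scipy", "scikit", "sklearn"])
print(python_filter)
```

Swapping the list (and, for other languages, the `keyword`) is the only change needed in this part of the query under these assumptions.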


You can run the query at the BigQuery console, but you may need to register first.

SELECT line, url
FROM [wide-silo-135723:kozikow_github.data_py]
WHERE
   REGEXP_MATCH(line, r"(?:pd|pandas).*pivot_table")
   AND (REGEXP_MATCH(line, r"aggfunc.*(?:np|numpy)"))
line url
meta_df = pd.pivot_table(meta_df, values="reads", rows=index, aggfunc=np.sum)
temp = pd.pivot_table(temp, values="reads", rows=index, aggfunc=np.sum)
fp = pd.pivot_table(f, values='count', index='Entry Exit GroupID', columns='AgeEnteredBucket', aggfunc=np.sum, fill_value=0)
table =, rows=['week'], cols=['swimlane'], values='count', aggfunc=np.count_nonzero)
print grid_to_string(pd.DataFrame.pivot_table(df, values='Size', index=[field + ' Method' for field in fields], columns=['Compressor'], aggfunc=np.min))
transpositions = pd.DataFrame.pivot_table(df[df['Compressor'] == compressor], values='Size', index=[field + ' Codec'], columns=[field + ' Method'], aggfunc=np.min)
printable_df = pd.DataFrame.pivot_table(df, values='Size', index=[field + ' Codec', field + ' Method'], columns=['Compressor'], aggfunc=np.min)
printable_df = pd.DataFrame.pivot_table(df, values='Size', index=[field + ' Codec'], columns=[field + ' Method'], aggfunc=np.min)
#dfQB = pd.pivot_table(dfQB,index=['name'],aggfunc=np.sum).reset_index()
dfs.append(index_converter(pd.pivot_table(x, index=index_field, aggfunc=np.sum), 'sum_var'))
dfs.append(index_converter(pd.pivot_table(x, index=index_field, aggfunc=np.mean), 'mean_var'))
dfs.append(index_converter(pd.pivot_table(x[x>0], index=index_field, aggfunc=np.sum), 'sum_pos'))

I found a numpy function that was new to me, np.count_nonzero.
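To see why this gives “AND” rather than “OR” matching, here is a small local sketch of the same filter using Python's `re` module. It assumes REGEXP_MATCH matches anywhere in the line (search semantics); the function name `matches_query` and the sample lines are made up for illustration.

```python
import re

# The two patterns from the query; both must be found in the same line.
PIVOT = re.compile(r"(?:pd|pandas).*pivot_table")
AGGFUNC = re.compile(r"aggfunc.*(?:np|numpy)")


def matches_query(line):
    """Return True only when both patterns occur somewhere in the line."""
    return bool(PIVOT.search(line)) and bool(AGGFUNC.search(line))


# Made-up sample lines: only the first one satisfies both conditions.
samples = [
    "fp = pd.pivot_table(f, values='count', aggfunc=np.sum, fill_value=0)",
    "table = pd.pivot_table(df, values='count')",   # no aggfunc
    "total = my_aggregate(df, aggfunc=np.sum)",     # no pivot_table
]
for line in samples:
    print(matches_query(line), line)
```

A line that mentions only pivot_table or only aggfunc is rejected, which is the behaviour the built-in search does not guarantee.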

SQL generating the table

    LAG(line, 1) OVER (
        PARTITION BY file_id ORDER BY line_number) AS prev_line,
    LEAD(line, 1) OVER (
        PARTITION BY file_id ORDER BY line_number) AS next_line,
    POSITION(line) AS line_number,
            sample_path, "#L",
            STRING(POSITION(line))) AS url,
        id AS file_id,
        # Add a space after each line.
        # It is required to ensure correct line numbering.
        SPLIT(REPLACE(content, "\n", " \n"), "\n") AS line,
        content CONTAINS "import pandas"
        OR content CONTAINS "import numpy"
        OR content CONTAINS "import scipy"
        OR content CONTAINS "import scikit"
        OR content CONTAINS "import sklearn"))

It took 47 seconds to generate it.
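For readers less familiar with window functions, the effect of LAG and LEAD partitioned by file can be sketched in plain Python. This is only an illustration of the resulting rows for one file, not how BigQuery computes them; the function name `with_neighbours` is hypothetical.

```python
def with_neighbours(content):
    """Return (line_number, prev_line, line, next_line) for each line of a file.

    Line numbers start at 1. prev_line and next_line are None at the file
    edges, mirroring LAG and LEAD returning NULL at the edges of a partition.
    """
    lines = content.split("\n")
    rows = []
    for i, line in enumerate(lines):
        prev_line = lines[i - 1] if i > 0 else None
        next_line = lines[i + 1] if i + 1 < len(lines) else None
        rows.append((i + 1, prev_line, line, next_line))
    return rows


example = "import pandas as pd\n\ndf = pd.DataFrame()"
for row in with_neighbours(example):
    print(row)
```

Note that the empty second line keeps its own line number here, which is the property the comment in the SQL about correct line numbering is concerned with.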

How to generate your own public table

In the BigQuery console, after creating and selecting a project:

  • In the left sidebar, click the dash next to your project and click “Create new dataset”.
  • Click the dash next to the new dataset and click “Share Dataset”.
  • In the “Add People” section, add “All Authenticated Users” with view access.
  • In the query window, go to “Show options”.
  • Choose a Destination Table (you can create a new one) and select “Allow Large Results”.
  • Press “Run Query”.

Storage cost

The table is only 21 GB in size, so at $0.02 per GB it will cost me 42 cents per month to store. I am still using the cloud storage free trial, so for now it’s free.
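The arithmetic behind these numbers is easy to check. The sketch below uses the figures from this post and takes 1 TB as 1000 GB, which is an assumption; it also treats every query as a full-table scan, which is a worst case, since BigQuery bills only for the columns a query reads.

```python
TABLE_SIZE_GB = 21            # table size quoted in this post
STORAGE_PRICE_PER_GB = 0.02   # USD per GB per month, as quoted in this post
FREE_SCAN_QUOTA_GB = 1000     # 1 TB of free scans, taken as 1000 GB (assumption)

monthly_storage_cost = TABLE_SIZE_GB * STORAGE_PRICE_PER_GB
full_scans_per_quota = FREE_SCAN_QUOTA_GB // TABLE_SIZE_GB

print("storage: ${:.2f} per month".format(monthly_storage_cost))
print("full-table scans within the free quota:", full_scans_per_quota)
```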
