Afterquery: the json post-processor and graphing tool

What does it do?

Afterquery is a pure-client-side javascript tool that downloads jsonp-formatted data from a given URL, applies a configurable series of transformations, and then renders the result as either a data table or a Google Visualizations (gviz) chart.

Although the javascript file happens to be hosted on a server, your data never gets uploaded; your browser handles all the processing steps internally. Also, the permissions to download the jsonp data depends on the cookies in your browser, so you can safely retrieve protected content without granting authorization to an external server.

You can then embed it in an iframe to produce easy dashboards like this:

(That's actually a live graph generated by afterquery just now.)

I wrote afterquery because I found that there are lots of cool tools for extracting data from super huge databases (so-called “big data”) but those tools can take between a few seconds and a few minutes to run. Once you have a smallish dataset produced by running your “big data” query tools, it would be nice to be able to shrink, summarize, and rotate the “not big data” as efficiently as possible. That's this.

How to use

First, you need to find a service that produces data in “rows and columns” jsonp format. That is, the structure of the object is [row1,row2,...,rown] where each row is [col1,col2,col3,...,coln].

Then you construct a afterquery URL with the jsonp URL you want to retrieve from, followed by the set of transforms you want to apply. You can just type it into your web browser's URL bar while you experiment. Then, once you have the query the way you want it, paste it into a web page inside an <iframe> tag (like the graph above) or just make a hyperlink to a full-screen chart.

The basic structure is:<transform1>&<transform2>&<transform3>...

(The default with no transforms is to just show the data in a handy table view without any changes.)

Available transforms:

  • &limit=n
    Discard all data after n rows.

  • &filter=key>=value1,value2,...
    Show only rows where the column named key has a value >= value1 or value2 etc. The operator (>= in this example) can be one of =, <, >, <=, >=, <>, or !=. If you specify more than one value, they are combined in an OR configuration (ie. a row matches if any of the values match). If you provide more than one &filter= clause, they are combined successively in an AND configuration (ie. a row matches only if all the filters are true).

  • &q=value1,value2,...
    Show only rows where any of the columns contain the substring value1 or value2 etc. If more than one value is specified, they are combined in an OR configuration. If you provide more than one &q= clause, they are combined successively in an AND configuration.

  • &order=[-]key1,[-]key2
    Sort the table in order of key1 and then (if key1 is the same) by key2 etc. If a key starts with ‘-’ that means to sort that key in descending order (largest to smallest).

  • &extract_regexp=key=regexp(match)regexp
    Search for the given regular expression in each row in the column named key. If it's found, replace the column with the substring in the (match) part of the regexp. For example, &extract_regexp=ver=version-(.*) would replace a string version-25.4 in column ver with the string 25.4.

  • &group=key1,key2,...;val1,val2,...
    Summarize the table data by summing and counting. This operation works like a simplified version of SQL's “group by” clause. In the resulting output, the order of the columns will be key1,key2,...,val1,val2... and there will only be at most one row with any particular combination of key1,key2,... fields. The val columns will be summed (if they were already numerical) or counted (if they were non-numeric).

    A clause like &group=a,b;x,y (if x is a string and y is a number) would be equivalent to this in SQL: select a, b, count(x) x, sum(y) y from MyTable group by a, b

    If you apply multiple &group= clauses, it works like using multiple nested subqueries in SQL. (That is, the output of one &group= clause becomes the MyTable of the next one.)

    If you leave out the ‘;’ and the val column names, the default is to automatically include all columns other than the key columns.

    If you include the ‘;’ but leave out the val column names, that means you don't want any value columns (so only the key fields will be included, and nothing will be summed or counted at that step). So &group=a,b; (with a trailing semicolon) is equivalent to this in SQL: select a, b from MyTable group by a, b.

  • &pivot=rowkeys...;colkeys...;valkeys...
    A &pivot= clause works like a &group= clause, but creates a pivot table. Pivot tables are a bit complicated; the easiest way to learn about them is to play with an example. Here's one to start with.

    The simplest way to think of a pivot table is like this: the values originally in the columns named by rowkeys end up down the left of the output table. The values originally in the columns named by colkeys end up as headings across the top of the output table. The values originally in the columns named by valkeys end up as values in the body section of the output table. A pivot table is very handy when you have raw data in SQL-like format and you want to rearrange it to be suitable for charting (where each line in a line chart, say, is usually one column of the table).

    If the rowkeys section is empty, the output will have exactly one row (with all the value fields counted or summed into that one row). If the colkeys section is empty, the &pivot= operation is essentially equivalent to a &group=rowkeys...;valkeys... operation. If the valkeys section is empty, there are no values used to calculate the table body, so it is equivalent to an &group=rowkeys...; operation.

  • &chart=charttype
    Instead of showing a table of values, show a chart. The available charttypes are currently: stacked (a stacked area chart), line, column, bar, pie.

  • &title=title
    Add a title to the chart.

Example 1

Here is some raw data (source) produced by an analytics program:

Afterquery can render it as a table like this (link):

Or pre-sort the table for you like this (link):

Or filter it by date (link):

Or summarize the results (like a “group by” in SQL) (link):

Or summarize and display only a subset of columns (link):

Or do a pivot table (converting rows into columns) (link):

Or filter, then pivot, and then make a chart! [(link)](/?url=example2.json&order=date,state&filter=date>=2012-11-01&filter=date<2012-11-14&pivot=date;state;NumDevices&chart=stacked&title=Devices Rebooted/Upgraded by Date)

Example 2

Here's another dataset (link):

We can use two consecutive grouping operations to first get a list of serial numbers for each version, and then get the count of serial numbers per version (link):

Hmm, those version numbers are ugly because some of them have extra debug information after them. Let's trim it out using a regex (link):

And make a pivot table to easily show the pattern over time (link):

Trim out some outliers (link):

And graph it (link):

Or graph a subset of the data (link):

Or maybe show the top 4 versions (link):

Where can I get a jsonp data source?

There are lots of them out on the web. If your favourite database query or reporting engine doesn't support jsonp, ask them to add it!

Need help? Want to contribute?

Email Probably there'll be a mailing list eventually.

The complete source code is contained in render.js.