Unlock the Value of Your Data
With the recently launched version 2.0 of its dataclips feature, Heroku took the next step in its support of an ongoing theme: "Unlock the Value of Your Data". Much as GitHub Gists support sharing and collaborating on relevant snippets of code, dataclips do the same for your data.
Nowadays it is commonly held that value lies less in individual applications than in the data they consume and produce. Exposing that data publicly is a big statement in terms of openness and supports a growing number of services that mash up existing data sources to create something bigger than the sum of its parts.
The Heroku Postgres team worked for almost a year on overhauling the dataclips architecture and implementation. InfoQ took the opportunity to speak to Craig Kerstiens, the product manager of the persistence group at Heroku. This article outlines many aspects and features of dataclips that are valuable not only to developers but also to businesses and decision makers.
Craig stated:
An organization's data is its most valuable asset. Unfortunately, that data is usually trapped inside a database with few ways to access it by a privileged handful of people. Too often reports are manually generated and their results pasted into emails; dashboards get built but rapidly become outdated and never answer the right questions.
That thought led Heroku to rework the existing dataclips offering into something that is easier to set up and use and that also makes way for many future uses.
Example - Health Inspection Scores SFO
To show what a dataclip looks like, here are some that were developed by Code for America. The clips represent health inspection scores for San Francisco restaurants and food establishments. Code for America gathered and aggregated publicly available data, and dataclips are one way to share it.
One of them is the "Most recent score for business" dataclip.
Dataclips can be created directly from your account at dataclips.heroku.com or from your Postgres management page by supplying an appropriate SQL statement. The only limit is that a dataclip can return at most 30,000 rows; excess data is cut off.
Dataclips are available as embeddable iframes, as shown above, but also as standalone HTML pages. From that page one can see the SQL, version and revision history of the dataclip and share the data via email, Twitter or Google Docs. By appending one of the file extensions .csv, .xls (Excel) or .json to the URL, a dataclip can be downloaded in the corresponding format.
JSON
https://dataclips.heroku.com/aniexnddtuqpmtjhmuvdgrqprjns.json
{"fields":["name","address","city","score"],
 "values":[
  ["MINI BAR SF, LLC","837 DIVISADERO ST ","San Francisco","100"],
  ["AT&T - MAIN KITCHEN/SUITE LEVEL [145084]","24 WILLIE MAYS PLAZA 4.10.03 ","San Francisco","100"],
  ["CYBELLE'S PIZZA","719 14TH ST ","San Francisco","100"],
  ...
 ]}
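A client that consumes this format will probably want one record per row rather than parallel lists. The following is a minimal sketch in Python, assuming the payload always has the "fields" and "values" keys seen in the excerpt; the sample string is copied from that excerpt (without the elided rows) so nothing is fetched over the network, and the helper name rows_as_dicts is purely illustrative.

```python
import json

# Sample payload copied from the JSON excerpt (elided rows left out).
payload = """
{"fields": ["name", "address", "city", "score"],
 "values": [
   ["MINI BAR SF, LLC", "837 DIVISADERO ST ", "San Francisco", "100"],
   ["AT&T - MAIN KITCHEN/SUITE LEVEL [145084]", "24 WILLIE MAYS PLAZA 4.10.03 ", "San Francisco", "100"],
   ["CYBELLE'S PIZZA", "719 14TH ST ", "San Francisco", "100"]
 ]}
"""


def rows_as_dicts(document):
    """Pair each row in "values" with the column names in "fields"."""
    fields = document["fields"]
    return [dict(zip(fields, row)) for row in document["values"]]


records = rows_as_dicts(json.loads(payload))
for record in records:
    # Scores are strings in the excerpt, so convert before doing arithmetic.
    print(record["name"], int(record["score"]))
```

Note that every value, including the score, is a string in the excerpt, so numeric columns need an explicit conversion on the client side.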
CSV - comma separated values
https://dataclips.heroku.com/aniexnddtuqpmtjhmuvdgrqprjns.csv
name,address,city,score
"MINI BAR SF, LLC",837 DIVISADERO ST ,San Francisco,100
AT&T - MAIN KITCHEN/SUITE LEVEL [145084],24 WILLIE MAYS PLAZA 4.10.03 ,San Francisco,100
CYBELLE'S PIZZA,719 14TH ST ,San Francisco,100
PACIFIC UNION CLUB,1000 CALIFORNIA ST ,San Francisco,100
...
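The CSV excerpt has two details that are easy to get wrong when splitting lines by hand: the first name is quoted because it contains a comma, and the addresses carry trailing spaces. A small sketch with Python's standard csv module handles both; the sample text is taken from the excerpt, and the function name parse_clip_csv is an assumption made for illustration.

```python
import csv
import io

# First rows of the CSV excerpt, including the quoted name with a comma.
sample = """name,address,city,score
"MINI BAR SF, LLC",837 DIVISADERO ST ,San Francisco,100
AT&T - MAIN KITCHEN/SUITE LEVEL [145084],24 WILLIE MAYS PLAZA 4.10.03 ,San Francisco,100
CYBELLE'S PIZZA,719 14TH ST ,San Francisco,100
"""


def parse_clip_csv(text):
    """Read CSV text into dicts keyed by the header row, trimming whitespace."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: value.strip() for key, value in row.items()}
        for row in reader
    ]


rows = parse_clip_csv(sample)
for row in rows:
    print(row["name"], "|", row["address"], "|", row["score"])
```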
Other formats, such as XML and YAML, are potentially on the roadmap. Something like the Google data table format would be great for integration with the variety of Google charts and diagrams. Currently the CSV format can be imported into a Google Docs spreadsheet using the ImportData(URL) function, which is refreshed hourly.
Revisions and Versions
Like gists, dataclips are versioned and have revisions. Every time the query is changed, a new revision of the dataclip is created (also listed on the dataclip page); for every changed result (due to changing data), a new version is issued. Different revisions and versions can be accessed by appending query parameters such as ?revision=1&version=5 to the URL. Much like gists, dataclips can also be forked and then worked on independently.
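As a rough illustration of the URL scheme described here, the sketch below builds a dataclip URL from a clip identifier, an optional format extension and optional revision and version numbers. The helper dataclip_url is hypothetical, and combining a file extension with the query parameters in one URL is an assumption rather than something the article confirms.

```python
from urllib.parse import urlencode

BASE_URL = "https://dataclips.heroku.com"


def dataclip_url(clip_id, fmt=None, revision=None, version=None):
    """Build a dataclip URL; fmt is an extension such as "csv" or "json"."""
    url = f"{BASE_URL}/{clip_id}"
    if fmt:
        url += f".{fmt}"
    # Only include the query parameters that were actually requested.
    params = {}
    if revision is not None:
        params["revision"] = revision
    if version is not None:
        params["version"] = version
    return f"{url}?{urlencode(params)}" if params else url


latest = dataclip_url("aniexnddtuqpmtjhmuvdgrqprjns", fmt="json")
pinned = dataclip_url("aniexnddtuqpmtjhmuvdgrqprjns", revision=1, version=5)
print(latest)
print(pinned)
```

Pinning a revision and version in this way gives a consumer a result that stays the same even if the query or the underlying data change later.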
Implementation
The implementation of dataclips is quite straightforward. Dataclips are served by Ruby applications running on Heroku infrastructure, which run the configured query in a read-only transaction at regular intervals (currently around once a minute) and capture the results. The script monitors runtime and query impact to dynamically adjust the frequency. Results of the query are stored in a versioned table for quick access independent of the original database. The additional formats are created and cached in S3. The HTML views are not yet cached but rendered on the fly.
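The refresh cycle described in this section can be sketched in a few lines. This is not Heroku's code (which is written in Ruby and not shown in the article) but a simplified Python illustration under stated assumptions: the class ClipCache, the run_query callable, the use of a content hash to detect changed results and the max_load ratio used to slow down expensive queries are all hypothetical choices.

```python
import hashlib
import json
import time


class ClipCache:
    """Keeps every distinct result of one query as a numbered version."""

    def __init__(self, run_query, min_interval=60.0, max_load=0.05):
        self.run_query = run_query        # hypothetical callable returning rows
        self.min_interval = min_interval  # never poll more often than this (seconds)
        self.max_load = max_load          # assumed cap on runtime / interval
        self.interval = min_interval      # current delay between refreshes
        self.versions = []                # (digest, rows) tuples, oldest first

    def refresh(self):
        """Run the query once and return the current version number."""
        started = time.monotonic()
        rows = self.run_query()
        runtime = time.monotonic() - started

        # A new version is recorded only when the result actually changed.
        digest = hashlib.sha256(
            json.dumps(rows, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if not self.versions or self.versions[-1][0] != digest:
            self.versions.append((digest, rows))

        # Slow queries are polled less often so their impact stays bounded.
        self.interval = max(self.min_interval, runtime / self.max_load)
        return len(self.versions)


# Usage with a fake query whose result changes on the third run.
fake_results = iter([[["a", 1]], [["a", 1]], [["a", 2]]])
cache = ClipCache(lambda: next(fake_results))
seen = [cache.refresh() for _ in range(3)]
print(seen, cache.interval)
```

A real service would call refresh from a scheduler that waits for the current interval between runs, and would persist the versions in a table instead of keeping them in memory.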
Usage and Feedback
Asked how dataclips are meant to be used, Craig pointed out a number of use cases. For one, dataclips are self-updating and versioned information radiators for decision makers. For developers, it becomes easy to share data. Dataclips can be used as stable APIs to write prototypes or even full applications against. In general they offer the ability to open up data for mash-ups and unexpected uses. Another interesting use case is to let people learn SQL with dataclips: just fork one and edit the SQL to return something else.
The roadmap for dataclips has many dimensions. There are obvious additions like supporting more formats or offering styling for the HTML views. Enabling other relational databases would be easy; offering dataclips integration to the NoSQL add-on providers could create a really interesting movement but is of course more involved. Another interesting direction would be to add social features to dataclips, like comments and ratings, which could eventually lead to a marketplace for dataclips (both free and paid). Custom URLs are not yet planned. As dataclips come to be used more frequently, there will also be an API to create them programmatically.
The feedback from users so far has been very positive. Some interesting use cases for dataclips have been the visualization of fraud detection and user signups, as well as usage as data sources for dashboards.
For commercial plans of the Heroku Postgres offering, additional features are available, like securing a dataclip to a Heroku user account. Security at the free level is provided by unguessable URLs.