Interactive Dashboards with Data Studio & BigQuery

The US Federal Election Commission (FEC) has been publishing political campaign finance data for years, but it hasn’t made that data easy to explore. Luckily, Google Data Studio and BigQuery make it easier than ever to explore large datasets like the ones published by the FEC.

In this article, you will learn how to use Google’s latest tools to create your own dashboards, using an example I created to visualize 2016 Election Cycle Donations, exploring public FEC data curated by OpenSecrets.org.

Since there are already a bunch of tutorials for creating charts and filters in Data Studio, we are going to take a small shortcut by making a copy of the original 2016 Election Cycle Donations dashboard, but we are not totally cheating because we have a fair amount of other stuff to do. Here’s a quick overview of the game plan:

  1. Create a BigQuery account (3 Minutes)
  2. Add the public FEC dataset to your BigQuery account (30 Seconds)
  3. Make a copy of the 2016 Election Cycle Donations Dashboard (30 Seconds)
  4. Create Custom Dimensions and Metrics (5 Minutes)
  5. Fix Broken Dashboard Widgets (3 Minutes)
  6. Explore the data!

1. Create a BigQuery account

Skip to step 2 if you already have a BigQuery account.

You can create a BigQuery account by clicking here. This link will guide you through the process of creating a BigQuery account, and it will automatically add the FEC data you will need in the next step. Once you create your new BigQuery account, you can skip to step 3.

2. Add the public FEC dataset to your BigQuery account

Skip to step 3 if you just created a BigQuery account using the link in step 1.

If you already have a BigQuery account, you can click this link to add the public FEC dataset to your account. Once the fh-bigquery dataset has been added successfully, you will see it in the sidebar navigation menu.

BigQuery sidebar
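
Optionally, you can confirm the data is reachable before building the report by running a quick query in the BigQuery console. Here is a minimal check in legacy SQL, assuming the fh-bigquery:opensecrets.indivis16v2 table you will connect to in the next step:

# Preview a few rows to confirm the table is accessible
SELECT *
FROM [fh-bigquery:opensecrets.indivis16v2]
LIMIT 10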

3. Make a copy of the 2016 Election Cycle Donations Dashboard

Return to the 2016 Election Cycle Donations dashboard and click File > Make a Copy > Create New Data Source.

Data Studio new data source

Then click Create a New Data Source and select BigQuery > Shared Projects > enter fh-bigquery under Shared project name > opensecrets > indivis16v2 > your BigQuery project of choice > Connect.

BigQuery data source

Click the Add to report button on the next screen (we will update the columns later).

Connect data source

The indivis16v2 table is now connected, so you can click Create Report on the next screen.

Data source connection

4. Create Custom Dimensions and Metrics

The copied version of the 2016 Election Cycle Donations dashboard will show many broken widgets at first. This is because the original version used calculated fields that Data Studio wasn’t able to copy. We’re going to re-create the fields used in the original dashboard so we can fix the broken widgets. This part involves a lot of steps, but it is quicker and easier than it looks: all we are really doing is adding new columns and then pasting in the name and/or formula that defines each one. In some cases, we will just rename existing columns instead of adding new ones.

Calculated fields

Create new Custom Dimension for Political Affiliation

  • Name: Political Affiliation
  • Formula: CASE WHEN party = 'D' THEN 'Democrat' WHEN party = 'R' THEN 'Republican' WHEN party = 'L' THEN 'Labor Union' WHEN party = 'I' THEN 'Individual' WHEN party = 'C' THEN 'Corporation' WHEN party = '3' THEN 'Other' ELSE 'Not Provided' END

Create new Custom Dimension for Donor Gender

  • Name: Donor Gender
  • Formula: CASE WHEN REGEXP_MATCH(gender, "M|m") THEN "Men" WHEN REGEXP_MATCH(gender, "f|F") THEN "Women" ELSE "Not defined" END

Create new Custom Dimension for Donor Occupation

  • Name: Donor Occupation
  • Formula: UPPER(occupation)

Create new Custom Metric for Total Donations

  • Name: Total Donations
  • Formula: COUNT(fectransid)

Create new Custom Metric for Total Donors

  • Name: Total Donors
  • Formula: COUNT_DISTINCT(contribid)

Create new Custom Metric for Average Donations

This metric uses the custom metrics Total Donations and Total Donors, so those must be created first.

  • Name: Avg. Donations
  • Formula: Total Donations / Total Donors

Create new Custom Metric for Avg. Amount

  • Name: Avg. Amount
  • Formula: amount / Total Donations
  • Create column and change to currency format

Create new Custom Metric for Total Donation Recipients

  • Name: Total Donation Recipients
  • Formula: COUNT_DISTINCT(recipid)

Renaming default columns for clarity

The following steps do not require custom formulas; we are just going to update the names of a few columns.

  • Rename “contrib” to “Donor”
  • Rename “recipients” to “Donation Recipients”
  • Rename “orgname” to “Donor Employer”
  • Rename “state” to “Region” and change format to Geo > Region
  • Rename “city” to “City” and change format to Geo > City
  • Rename “amount” to “Total Amount” and change to currency format
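
If you are curious how these calculated fields translate into plain SQL, the query below is a rough equivalent you could run directly in BigQuery (legacy SQL, assuming the fh-bigquery:opensecrets.indivis16v2 table and the column names listed above). It is only an optional sanity check; Data Studio performs these aggregations for you.

# Approximate the custom metrics, grouped by the raw party code
SELECT
  party,
  COUNT(fectransid) AS total_donations,
  COUNT(DISTINCT contribid) AS total_donors,
  COUNT(DISTINCT recipid) AS total_donation_recipients,
  SUM(amount) AS total_amount,
  SUM(amount) / COUNT(fectransid) AS avg_amount,
  COUNT(fectransid) / COUNT(DISTINCT contribid) AS avg_donations
FROM [fh-bigquery:opensecrets.indivis16v2]
GROUP BY party
ORDER BY total_amount DESC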

5. Fix Broken Dashboard Widgets

Now that we’ve re-created the custom columns that were used in the original dashboard, we just need to add them to the broken widgets and we are done. Luckily, each Report Level widget needs only a single update to work across every page, but there are a few Page Level widgets that need to be updated on each page.

Fix Filters (Report Level widgets)

Let’s assign the following dimensions to each filter so the filters’ names match the layout of the original Dashboard. Use “Total Amount” as the metric for each filter.

Data Studio filters

  • Filter 1: Donation Recipients
  • Filter 2: Political Affiliation
  • Filter 3: Donor Gender
  • Filter 4: Donor Occupation
  • Filter 5: Donor Employer
  • Filter 6: Region

Fix Charts (Report Level widgets)

Update the 3 charts listed below by assigning the following dimensions and metrics to each corresponding widget.

Data Studio charts

Stacked bar chart

  • Add Political Affiliation as first dimension
  • Add Donor Gender as secondary dimension
  • Use “Total Amount” for metric

Time Series chart

  • Add “Total Donations” as first metric
  • Add “Total Donors” as second metric
  • Add “Total Amount” as third metric

Region Map Chart

  • Add “Region” Dimension

Fix Scorecards (Report Level widgets)

Replace the invalid metric for each scorecard with the following metrics.

Data Studio scorecards

  • Total Donation Recipients
  • Total Donors
  • Total Donations
  • Avg. Donations
  • Avg. Amount
  • Total Amount

Fix Table Widgets

Unlike the report level widgets we just fixed, the Table widgets on each page contain a unique set of columns. The dimensions and metrics for each Table are listed below.

Data Studio tables

Table on Page 1: Recipients

  • Dimensions: Donation Recipients, Political Affiliation
  • Metrics: Total Donors, Total Donations, Avg. Donations, Avg. Amount, Total Amount

Table on Page 2: Donors

  • Dimensions: Donor, Donor Occupation, Donor Employer, Political Affiliation, Donation Recipients
  • Metrics: Total Donations, Avg. Amount, Total Amount

Table on Page 3: Political Affiliation

  • Dimensions: Political Affiliation
  • Metrics: Total Donors, Total Donations, Avg. Donations, Avg. Amount, Total Amount

Table on Page 4: Gender

  • Dimensions: Donor Gender
  • Metrics: Total Donors, Total Donations, Avg. Donations, Avg. Amount, Total Amount

Table on Page 5: Occupation

  • Dimensions: Donor Occupation
  • Metrics: Total Donors, Total Donations, Avg. Donations, Avg. Amount, Total Amount

Table on Page 6: Employer

  • Dimensions: Donor Employer, Political Affiliation
  • Metrics: Total Donors, Total Donations, Avg. Donations, Avg. Amount, Total Amount

Table on Page 7: Region

  • Dimensions: Region
  • Metrics: Total Donors, Total Donations, Avg. Donations, Avg. Amount, Total Amount

Table on Page 8: City

  • Dimensions: City
  • Metrics: Total Donors, Total Donations, Avg. Donations, Avg. Amount, Total Amount

6. Explore the data!

That’s it! All of the widgets across all of the pages should work now, so you are free to explore or to continue creating new pages and finding new insights.

Data Exploration

When you think about it, we did not have to write any code to build this dashboard (the queries shown above were purely optional sanity checks), we did not have to learn SQL, and we did not have to index a database or figure out how to allocate computational resources… none of the typical scary stuff that usually comes with large-scale data analysis! Thanks to Data Studio and BigQuery, almost anyone can now create fully interactive dashboards to explore datasets of virtually any size, and it can be done in just minutes.

By the way, you may have noticed a bunch of other public datasets in BigQuery while we were adding the fh-bigquery dataset. Well guess what, you can actually create Data Studio dashboards with any of those datasets too. You can even combine multiple datasets by creating a Data Source from a query that joins multiple tables together.
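
For example, the custom query behind such a Data Source might look like the sketch below (legacy SQL). The state_population table here is purely hypothetical, used only to illustrate the join; replace it with whichever second table you actually want to combine with the donations data.

# Join 2016 donations to a hypothetical state-level lookup table
# (use JOIN EACH if both tables are large)
SELECT
  a.state AS state,
  b.population AS population,
  SUM(a.amount) AS total_amount
FROM [fh-bigquery:opensecrets.indivis16v2] a
JOIN [your_project:your_dataset.state_population] b
  ON a.state = b.state
GROUP BY state, population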

As some parting inspiration, I recommend checking out this post by Felipe Hoffa, a Google Developer Advocate. His video about visualizing big money in politics with Big Data has some particularly great nuggets that you can use to take your dashboards to the next level!


Funnel Analysis with Google Analytics Data in BigQuery

Conversion funnels are a basic concept in web analytics, and if you’ve worked with them enough, you may have gotten to a point where you want to perform a deeper analysis than your tools will allow.

“Which steps in my funnel are being skipped? What was going on in this funnel before I defined it? Which user-characteristics determine sets of segments across which progression through my funnel differs?” These questions can be answered using the solution described in this article. In particular, I’m going to talk about how to use BigQuery (BQ) to analyze Google Analytics (GA) page-hit data, though the principles could be applied to any page-hit data stored in a relational database.

The Google Analytics Funnel Visualization report (see below) makes certain abstractions and has certain limitations, and advanced users can benefit from using Google BigQuery – an infrastructure-as-a-service offering that allows SQL-like queries over massive datasets.

Funnel Analysis

In this article, we’ll discuss the benefits of using BigQuery for funnel analysis as opposed to the Google Analytics user interface. In order to make the solution clear, I will go over the basic structure of an SQL query for funnel analysis and explain how to use Funneler, a simple Windows application that automates query-writing. The source code of Funneler is also provided as a Python 3 script. Please note that in order to use the specific examples provided here, you will need a Google Analytics Premium account linked to BigQuery (learn more about the BigQuery Export feature).

Funnel Analysis – Google Analytics UI vs. BigQuery

The solution I propose below works as follows: using a Windows application (or Python script), you generate a BigQuery-dialect SQL query that tracks user sessions through a set of funnel pages, optionally segmenting and/or filtering the sessions based on session characteristics. BigQuery’s output is a table with two columns per funnel stage: one for session counts, and one for exit counts.

Below is a list of the most significant differences between GA Funnel Visualization and the solution I will be discussing.

  1. Loopbacks: If a user goes from steps 1 -> 2 -> 1, GA will register the session at both steps – one count at step 1, one count at step 2 – plus an exit from step 2 back to step 1. Our query will only count one session in the above instance: a session which goes from step 1 to step 2. Furthermore, since progress through the funnel is measured by the “deepest” page reached, the above scenario will not be distinguished from a session which simply goes from step 1 -> 2.
  2. Backfilling funnel steps: GA will backfill any skipped steps between the entrance and the exit. This solution will only register actual page-hits, so you’ll get real numbers of page-hits.
  3. Historical Information: GA Funnels cannot show historical data on a new funnel, whereas this workflow can be used on any date range during which GA was tracking page-hits on the selected funnel-stage pages.
  4. Advanced Segmentation: GA Funnels don’t support advanced segmentation, whereas with Group By clauses in BigQuery, you can segment the funnel on any column.
  5. Sampling: GA Funnel Visualization shows up to 50,000 unique paths, whereas BQ will contain all the page-hits that GA recorded, and allow you to query them all.

The Query

For Google Analytics data, the basis of a funnel query is a list of URLs or Regular Expressions (regex), each representing a stage in the conversion funnel.

If you have a pre-existing funnel in GA, follow the steps below to find your funnel settings:

  1. Go to Admin in GA
  2. Select the correct Account, Property, and View
  3. Go to Goals
  4. Select a Goal
  5. Click Goal Details

In this screen you will find a regex or URL for each step of the funnel. They may look like this: “/job/apply/”.

The basic process of writing the query, given the list of regexes or URLs, is as follows:

1. Create a base-level subquery for each regex

For each row which has a regex-satisfying value in the URL column, pull out fullVisitorId and visitId (together, these work as a unique session ID), and the smallest hit-number. The smallest hit-number just serves as a non-null value which will be counted later. The result sets of these subqueries have one row per session.

SELECT
  fullVisitorId,
  visitId,
  MIN(hits.hitNumber) AS firstHit
FROM
  TABLE_DATE_RANGE([<id>.ga_sessions_], TIMESTAMP('YYYY-MM-DD'),
    TIMESTAMP('YYYY-MM-DD'))
WHERE
  REGEXP_MATCH(hits.page.pagePath, '<regex or URL>')
  AND totals.visits = 1
GROUP BY
  fullVisitorId,
  visitId

2. Join the first subquery to the second on session ID

Select session ID, hit-number from the first subquery, and hit-number from the second subquery. When we use full outer joins, we’re saying sessions can enter the funnel at any step. If you want to count a session at a given stage only when it has also hit a previous stage, use a left join instead.


SELECT
  s0.fullVisitorId,
  s0.visitId,
  s0.firstHit,
  s1.firstHit
FROM (
  # Begin Subquery #1 aka s0
  SELECT
    fullVisitorId,
    visitId,
    MIN(hits.hitNumber) AS firstHit
  FROM
    TABLE_DATE_RANGE([<id>.ga_sessions_], TIMESTAMP('2015-11-01'),
      TIMESTAMP('2015-11-04'))
  WHERE
    REGEXP_MATCH(hits.page.pagePath, '<regex or URL>')
    AND totals.visits = 1
  GROUP BY
    fullVisitorId,
    visitId) s0
  # End Subquery #1 aka s0
FULL OUTER JOIN EACH (
  # Begin Subquery #2 aka s1
  SELECT
    fullVisitorId,
    visitId,
    MIN(hits.hitNumber) AS firstHit
  FROM
    TABLE_DATE_RANGE([<id>.ga_sessions_], TIMESTAMP('2015-11-01'),
      TIMESTAMP('2015-11-04'))
  WHERE
    REGEXP_MATCH(hits.page.pagePath, '<regex or URL>')
    AND totals.visits = 1
  GROUP BY
    fullVisitorId,
    visitId) s1
  # End Subquery #2 aka s1
ON
  s0.fullVisitorId = s1.fullVisitorId
  AND s0.visitId = s1.visitId

3. Join the third subquery to the result of the above join on session ID

Select session ID, hit-number from the first subquery, hit-number from the second subquery, and hit-number from the third subquery.

4. Join the fourth subquery to the result of the above join on session ID

Select session ID, hit-number from the first subquery, hit-number from the second subquery, hit-number from the third subquery, and hit-number from the fourth subquery.

5. Continue until all subqueries are joined in this way
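
To make steps 3 through 5 concrete, here is a hedged sketch of the query after a third subquery (s2) has been joined to the result of the step-2 join. The inner join is wrapped as a derived table and its hit-number columns are aliased so the outer query can reference them unambiguously; the SQL that Funneler actually generates may name and arrange things differently.

SELECT
  j01.fullVisitorId,
  j01.visitId,
  j01.s0_firstHit,
  j01.s1_firstHit,
  s2.firstHit AS s2_firstHit
FROM (
  # Result of step 2: s0 joined to s1, with hit-numbers aliased per step.
  # (In a production query you would likely also coalesce the session IDs,
  # since a FULL OUTER JOIN can leave s0's IDs null for late-entering sessions.)
  SELECT
    s0.fullVisitorId AS fullVisitorId,
    s0.visitId AS visitId,
    s0.firstHit AS s0_firstHit,
    s1.firstHit AS s1_firstHit
  FROM (
    # Subquery #1 aka s0 (step-1 regex)
    SELECT
      fullVisitorId,
      visitId,
      MIN(hits.hitNumber) AS firstHit
    FROM
      TABLE_DATE_RANGE([<id>.ga_sessions_], TIMESTAMP('2015-11-01'),
        TIMESTAMP('2015-11-04'))
    WHERE
      REGEXP_MATCH(hits.page.pagePath, '<regex or URL for step 1>')
      AND totals.visits = 1
    GROUP BY
      fullVisitorId,
      visitId) s0
  FULL OUTER JOIN EACH (
    # Subquery #2 aka s1 (step-2 regex)
    SELECT
      fullVisitorId,
      visitId,
      MIN(hits.hitNumber) AS firstHit
    FROM
      TABLE_DATE_RANGE([<id>.ga_sessions_], TIMESTAMP('2015-11-01'),
        TIMESTAMP('2015-11-04'))
    WHERE
      REGEXP_MATCH(hits.page.pagePath, '<regex or URL for step 2>')
      AND totals.visits = 1
    GROUP BY
      fullVisitorId,
      visitId) s1
  ON
    s0.fullVisitorId = s1.fullVisitorId
    AND s0.visitId = s1.visitId) j01
FULL OUTER JOIN EACH (
  # Subquery #3 aka s2 (step-3 regex)
  SELECT
    fullVisitorId,
    visitId,
    MIN(hits.hitNumber) AS firstHit
  FROM
    TABLE_DATE_RANGE([<id>.ga_sessions_], TIMESTAMP('2015-11-01'),
      TIMESTAMP('2015-11-04'))
  WHERE
    REGEXP_MATCH(hits.page.pagePath, '<regex or URL for step 3>')
    AND totals.visits = 1
  GROUP BY
    fullVisitorId,
    visitId) s2
ON
  j01.fullVisitorId = s2.fullVisitorId
  AND j01.visitId = s2.visitId

The aggregation in step 6 would then count three hit-number columns instead of two, and the same wrapping pattern repeats for each additional funnel step.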

6. Aggregate results

Instead of a row for each session, we want one row with counts of non-null hit-numbers per funnel-step. Take the query so far, and wrap it with this:

SELECT
  COUNT(s0.firstHit) AS _job_details_,
  COUNT(s1.firstHit) AS _job_apply_
FROM (
  (query from 2. goes here if the funnel has two steps))

The query has a recursive structure, which means that we could use a recursive program to generate the query mechanically. This is a major advantage, because for longer funnels, the query can grow quite large (500+ lines for a 13-step funnel). By automating the process, we can save lots of development time. We’ll now go over how to use Funneler to generate the query.

Funneler

Funneler is an executable built from a Python script (no need to have Python installed) which, when fed a JSON file containing a list of regexes or URLs, generates the SQL query, in the BigQuery dialect, that builds that funnel. It manipulates and combines strings of SQL code recursively. It extends the functionality of the query described in the previous section by allowing sessions to be segmented and filtered on any column in the BigQuery table.

Funneler and funneler.py can be found on my GitHub page (https://github.com/douug).

The input to Funneler is a json document with the following name/value pairs:

  • Table name, with the following format: [(Dataset ID).ga_sessions_]
  • Start date: ‘YYYY-MM-DD’
  • End date: ‘YYYY-MM-DD’
  • List of regexes: one regex per funnel-step
  • Segmode: True for segmenting, False otherwise
  • Segment: The column to segment on
  • Filtermode: True for filtering, False otherwise
  • Filtercol: The column to filter on
  • Filterval: The value to filter on in the above-mentioned column

Here is an example of an input json:


{
  "table": "[123456789.ga_sessions_]",
  "start": "'2015-11-01'",
  "end": "'2015-11-04'",
  "regex_list": ["'/job/details/'",
        "'/job/apply/'",
        "'/job/apply/upload-resume/'",
        "'/job/apply/basic-profile/'",
        "'/job/apply/full-profile/'",
        "'/job/apply/(assessment/external|thank-you)'"],
  "segmode": "True",
  "segment": "device.deviceCategory",
  "filtermode": "False",
  "filtercol" : "hits.customDimensions.index",
  "filterval" : "23"
}

Please note the quoted quotes (e.g. in the elements of the value of the key "regex_list" above). These are included because, after the JSON is ingested into a Python dictionary, the Python strings may contain SQL strings, which themselves require quotes. However, the value of the key "filterval" has no inner quotes, because 23 is an integer in SQL and wouldn’t be quoted.

To run Funneler, go to \dist_funneler\data. Open input.json and modify the contents, then go back to \dist_funneler and run funneler.exe. Three files should appear – std_error.log, std_out.log (which contains feedback about whether Segmode or Filtermode are engaged, and where the generated query can be found), and your query. Copy and paste your query into BigQuery. Try starting with a short funnel, as it may take a few tries to format the input correctly.

Alternatively, if you are running funneler.py, it can be executed from the command line with the following:

python funneler.py input.json

In this case, the contents of the above-mentioned std_error.log and std_out.log files will appear in the console. The generated query can then be copied into your BQ instance. The resulting table should have two columns per regex/funnel-step – one for hits, and one for exits – and a single row. If segmode is set to True, there will instead be one row per value in the segment column.

Hopefully these tools help you quickly create complex queries, meet your analysis objectives, and perform deeper analysis of GA page-hit data.
