DATA HAWK
  • portfolio
  • Projects
    • National Day
    • 12 Days
  • blog
  • favorites
  • about

Automatic Data Retrieval and Archiving with Google Sheets

1/12/2018

3 Comments

 
Background: ​I recently had the idea that I wanted to track the Sixers likelihood to make the playoffs. The original plan was to create a model by testing factors such as strength of schedule (SOS) played, SOS remaining, current record, and others keeping those which were significant.  After doing some research I found BasketballReference.com has completed a similar exercise and created an award-winning model. This sparked a new question: could I automatically retrieve what BasketballReference is predicating each day and track how the Sixers’ odds change?
I determined there were three main things that I needed to do:
  1. Fetch the data
  2. Archive the data
  3. Automate the process

1. Fetching the Data

via GIPHY


​To create something hands-off I knew I would be calling on Google Sheets. I have used the Google’s IMPORTHTML() function in the past so this part was not too bad. Since the table is relatively fixed (there will not be new teams—rows – added) I was able to simply use the function twice to pull the data from the Eastern Conference table and the Western Conference table.
=IMPORTHTML("https://www.basketball-reference.com/friv/playoff_prob.html","table",1)
Picture
Table 1 from BasketballReference
 
​=IMPORTHTML("https://www.basketball-reference.com/friv/playoff_prob.html","table",2)
Picture
Table 2 from BasketballReference

I then added a few additional columns after the table. The one that will be important later was the date and time field. By using the function NOW(), when the data is retrieved there will also be a time stamp.  
Picture
Data from BasketballReference in Google Sheet using ImportHTML function
 
2. Archiving the Data

Now that I had the data and a timestamp when the data was pulled, I wanted to move this data to a new sheet so that I could add the results each day. Like Excel with macros, Google Sheets has a feature called Script Editor. I was able find a template to get me started by Ben Collins. In this example, he has a single cell returning a value from the web and created a script to copy the value to the bottom of a running list on the same sheet.

The basic premise is to use Google Sheets' variable (var) function to define the cells you wish to copy and use the appendRow() function to add the rows to the bottom of a table.
​
The important change I made to the original example was having the result write to a separate sheet than the original data was found. This is achieved by creating two variables, one defining where to source the data (in this case it is called "Sheet") and one to write the final data to (“Final”).
Picture
Next simply define which cells need to be copied form the original sheet using the getValue() function. Each new variable will be used in the append rows function to tell Google to copy the value in the reference cell. In the code below, the "sheet" after the equals sign is a reference to the variable we defined above, which allows it to know to get data from the sheet called "Raw" in the workbook.
Picture
​And with the cells defined, list all your variables from above in the appendRow() function. Note, the final.appendRows is referencing the variable we created above to indicate the data show be placed in the sheet called "Final":
Picture
When you run the script, Google Sheets will copy all the values from the specified cells into the next available row in your Final sheet. If you have multiple rows in your source data, like I did, you will need to copy the entire function and change the row number in your script to reference the next row in the Google Sheet (for example change Raw!A2 to Raw!A3. Code Snapshot for one row:
Picture
3. Automating the Process

​Now let’s make it run by itself. This is fairly easy. We first need to automate the IMPORTHTML function. By creating the following code in the script editor, Google will simply place the what is defined as "cellFunction" in the cells specified (in this case A1 and A18). By placing the function in the cell, it will cause the function to refresh with new data from the site.
Picture
​Still in your Script Editor, under the Edit Menu select Current Project Triggers. Here you can tell Google Sheets to run a function automatically based on a "tigger", for example, a certain time (monthly, weekly, daily, hourly, etc.). Note, I made sure to set my getData function to run before the saveData function. 
Picture
​The result? Each day I will have a nice data table that ran overnight with the updated NBA playoff picture. 
Picture
​Time to connect this worksheet to Tableau and get vizzing! I look forward to seeing how you use this function!

Thanks for reading. Questions? Feel free to reach out on Twitter @coreyj34.
​
Corey
3 Comments
William Guicheney
1/16/2018 03:01:05 pm

Hi Corey!

I wanted to stat by thanking you for this great guide, I've been looking for a way to store some data online for free to bring straight into Tableau. This is great!

The only issue I am having is that Google prompts with a "saveData needs your permission to use your data", which then leads me a google error saying the app isn't verified. As I only plan to use this for myself, and this isn't an app, I can't go through the verification process.

How did you get around this?

Cheers,
Will

Reply
Corey
1/16/2018 08:08:32 pm

Hi Will.

Thanks for reading and the kind words. I believe I got the same message and was able to continue after saying allow the app. Is there an option to expand the error message and then select allow or trust? Let me know if you still continue to have issues and I will try to recreate it.

Thanks,
Corey

Reply
Saurabh Jain
11/12/2019 04:55:51 pm

Hi Corey Can you provide the link for the entire script. That way I can check for any errors that I have made in the script




Leave a Reply.

    Author

    Corey Jones  | @coreyj34
    Data Viz Enthusiast 
    Philadelphia Area
    ​coreyjones012@gmail.com


    Archives

    December 2018
    November 2018
    October 2018
    January 2018
    June 2017
    April 2017
    December 2016
    November 2016
    September 2016
    July 2016

    Categories

    All
    Instructional

    RSS Feed

Powered by Create your own unique website with customizable templates.
  • portfolio
  • Projects
    • National Day
    • 12 Days
  • blog
  • favorites
  • about