At 3pm on Friday, Cailen and I were discussing how difficult it is to obtain an API key for Instagram and whether it was possible to work around the restrictions. On a whim, I had a go at scraping the information, which proved to be easy enough. After talking about whether the information would be useful for the Big Data group, we realised that it might be fairly easy to develop a scraper to store all the data. We also remembered that GovHack was starting in two hours, and so our entry was formed.
After convincing Cailen that it would be "fun", I press-ganged Matthew Bourgeois and Steffi Tan into joining us, and we challenged ourselves to hack some useful information out of social media data in 48 hours. We ported my original scraping server from PHP to Node and started storing all the scraped data in a Mongo database. With the scraping done, we came to the fun part: what did we want to do from here?
TourAlytics works on a few major principles:
1. Initially collect data with the hashtag #goldcoast
2. From here, start looking at where users are coming from and discard posts that are from locals of the City of Gold Coast
3. Collect all the hashtags and journey information from the remaining posts and store them in our local database
4. While this is happening, identify the top 10 hashtags from the past 24 hours in the hashtag collection and repeat steps 1-3 for each hashtag
5. Repeat steps 1-4 while our server is on
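The loop above can be sketched roughly as follows. This is an illustrative Python sketch, not the Node code we wrote during the hack: the post fields (`user_home`, `hashtags`, `posted_at`), the `LOCAL_HOME` label, the `fetch_posts` callback and the in-memory `store` list are all assumptions standing in for the real scraper and the Mongo database.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed label for a user whose home location is the City of Gold Coast.
LOCAL_HOME = "Gold Coast"


def is_tourist(post):
    """Keep posts whose author does not appear to be a Gold Coast local."""
    return post["user_home"] != LOCAL_HOME


def top_hashtags(posts, now, limit=10, window=timedelta(hours=24)):
    """Return the most common hashtags among posts made within `window` of `now`."""
    counts = Counter()
    for post in posts:
        if now - post["posted_at"] <= window:
            counts.update(tag.lower() for tag in post["hashtags"])
    return [tag for tag, _ in counts.most_common(limit)]


def crawl_round(fetch_posts, seeds, store, now):
    """One pass: fetch posts per seed hashtag, drop locals, store the rest,
    then return the hashtags to use as seeds for the next pass."""
    for tag in seeds:
        for post in fetch_posts(tag):  # hypothetical scraper callback
            if is_tourist(post):
                store.append(post)
    return top_hashtags(store, now)
```

Calling `crawl_round` repeatedly, starting with `["goldcoast"]` and feeding each result back in as the next `seeds`, gives the "repeat while the server is on" behaviour. A real version would also need to de-duplicate posts and rate-limit the scraping.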
This essentially allows TourAlytics to identify trends in social media posts and events and adapt so we can continue collecting relevant data. Using this data, we could extract the following information:
- Which hashtags are commonly used by tourists to the City of Gold Coast
- Where users are posting from and the contents of the post
- The overall journey of a tourist from when they arrive to when they leave
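As a rough illustration of the last point, a journey can be thought of as one user's posts ordered by time. The sketch below assumes each stored post has hypothetical `user`, `location` and `posted_at` fields; it is not the schema we actually used.

```python
from collections import defaultdict
from datetime import datetime


def build_journeys(posts):
    """Group posts by user and order each user's stops chronologically."""
    journeys = defaultdict(list)
    for post in sorted(posts, key=lambda p: p["posted_at"]):
        journeys[post["user"]].append((post["posted_at"], post["location"]))
    return dict(journeys)
```

The first and last entries for each user then approximate when that tourist arrived and when they left.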
The data also creates further opportunities for sentiment analysis and image analysis to better understand what people are doing and their sentiments about their activities. While we weren't able to completely implement this during our 48 hrs, we do see value in pursuing this with future development.
By overlaying the venue data from the Commonwealth Games, we can derive relationships between the movement of people, their sentiments and their activities during the Games. This could provide valuable insight and prediction for crowd management during the event.
One use case is public bathrooms, which are often crowded and quickly deteriorate in quality. Event staff often only find out that there's a problem at the end of the day, but by monitoring publicly available social media sentiment, a system like TourAlytics can quickly notify event managers of potential issues before they escalate.
I cannot thank the following people enough:
- Cailen for his ever valuable software development skills
- Steffi for implementation and design of our interactive charts and analytics
- Matthew for his Google Maps implementations
- Steffi for the footage, editing and script writing so that we had something to put online
Our GovHack 2017 video entry is at:
You can have a look at the hack over at TourAlytics.
This year, our goal wasn't to win any awards but to focus on making something using the experience and knowledge we have right now and to challenge ourselves to grow. We were able to do just that and came up with something cool to show for it.
Congratulations and many thanks again to everyone who tagged along on my whimsical GovHack entry.