The other day in the SQL Community Slack channels, we started chatting about general usage of Slack – which channels saw the most activity, what topics were discussed, and so on. One thing led to another and I wondered if I could put that into a report for analysis. While I realized that you can’t get any sort of “top user” based on the extract I used, it was still an interesting exercise.
I went through a couple of steps to get to a usable set of data:
Import the Slack JSON data
Replace the user ids with usernames – mostly for better word clouds
Import the data into PowerBI and play with some visualizations
Here are the final results for those who might be curious:
Importing the Data
This was one of the trickier parts, but I got some help from Constantine Kokkinos ( b | t )with a PS script to get me started. I tweaked that a bit to use OPENBULK and OPENJSON commands to parse out the fields I wanted from the JSON files. That went well until I realized I also needed the Date of the file in order to have any useful analysis because Slack Timestamps don’t contain the date, just the time. I also realized that OPENJSON and the “WITH” command require you to use the field’s case as defined in JSON. After a bunch of playing with the PS and SQL code, I tweaked Constantine’s code to something that worked and let me pull in the message text for word analysis.
Massaging the Data
It’s all well to get this into a table, but at that point, I realized all of the times were still in the Slack TimeStamp format and the Message Text used Slack’s UserID anytime someone was mentioned instead of their username. That makes sense as usernames can change or even be duplicated within a group, but made for word clouds that looked a little strange.
The following script creates the objects I used to store the Slack data, creates a full text index on the MessageText column to make finding user ids faster, imports some basic User info, and has the script to update the userids after the messages are imported.
Slack Usage Analysis
Now that this was all imported and somewhat cleansed, I was ready to do some basic analysis. I started a new PowerBI Desktop file, imported the data in the Messages table, and threw up some visualizations. I had to install the Word Cloud extension from the Marketplace, then exclude several noise words beyond the defaults. This included words such as “joined” and “channel”, “
`” (three backticks – used to denote a text block in Slack), and a handful of integers that occur regularly. I left the usernames intact as they only show when someone is mentioned directly in the channel.
Visualization isn’t my strong suit, but I’ve obtained permission from the admins to share the files used for this in case someone wants to play around with the data directly. You can download the files here. Have fun playing around and let me know what you think.