Analyzing Slack Traffic with PowerBI
The other day in the SQL Community Slack channels, we started chatting about general usage of Slack – which channels saw the most activity, what topics were discussed, and so on. One thing led to another and I wondered if I could put that into a report for analysis. While I realized that you can’t get any sort of “top user” based on the extract I used, it was still an interesting exercise.
I went through a couple of steps to get to a usable set of data:
- Import the Slack JSON data
- Massage it
- Replace the user ids with usernames – mostly for better word clouds
- Import the data into PowerBI and play with some visualizations
Here are the final results for those who might be curious:
Importing the Data
This was one of the trickier parts, but I got some help from Constantine Kokkinos ( b | t ) with a PS script to get me started. I tweaked that a bit to use the OPENBULK and OPENJSON commands to parse out the fields I wanted from the JSON files. That went well until I realized I also needed the date of each file in order to have any useful analysis, because Slack timestamps don’t contain the date, just the time. I also realized that OPENJSON and its WITH clause require you to use the field’s case as defined in the JSON. After a bunch of playing with the PS and SQL code, I tweaked Constantine’s code into something that worked and let me pull in the message text for word analysis.
# Load each channel's daily JSON export into the dbo.Posts table.
# Each subfolder under the export is a channel; each file inside is one day of messages (yyyy-MM-dd.json).
$database = "SQLSlack"
$server = "localhost"
$parent_folder = "C:\SlackExport"

Import-Module SqlServer

function Invoke-Sql {
    param ($string)
    # Skip empty or trivially short batches
    if ($null -ne $string -and $string.ToString().Length -gt 5)
    {
        $s = $string.ToString()
        Write-Debug $s
        Invoke-Sqlcmd -ServerInstance $server -Database $database -Query $s | Out-Null
    }
}

$ErrorActionPreference = "Stop"

foreach ($folder in (Get-ChildItem $parent_folder -Directory))
{
    foreach ($jsonFile in (Get-ChildItem $folder.FullName -Filter *.json))
    {
        $string = @"
DECLARE @JSON NVARCHAR(MAX), @Channel NVARCHAR(255), @Date DATE
-- The channel comes from the folder name, the date from the file name (yyyy-MM-dd.json)
SELECT @Channel = '$($folder.Name)', @JSON = BulkColumn, @Date = '$($jsonFile.Name.Replace(".json", "").Replace("-", ""))'
FROM OPENROWSET(BULK '$($jsonFile.FullName)', SINGLE_CLOB) AS oro
INSERT dbo.Posts ( Channel, Username, Type, Subtype, TS, MessageText, MessageDate )
-- ts is expressed in seconds; keep just the seconds past midnight and add them to the file's date
SELECT @Channel, [user], type, subtype, ts, text, DATEADD(SECOND, CAST(ts AS NUMERIC(19,0)) % 86400, @Date)
FROM OPENJSON(@JSON)
WITH ([user] nvarchar(255), type nvarchar(255), subtype nvarchar(255), ts nvarchar(255), text nvarchar(max)) AS oj
"@
        Invoke-Sql -string $string
    }
}
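One side note on the case-sensitivity issue I mentioned above: OPENJSON matches the column names in the WITH clause to the JSON keys case-sensitively unless you give it an explicit JSON path. This little standalone example (not part of the import script, just an illustration) shows the difference:

--Standalone illustration of OPENJSON case sensitivity, not part of the import script
DECLARE @j NVARCHAR(MAX) = N'[{"user":"U024BE7LH","text":"hello"}]'
SELECT *
FROM OPENJSON(@j)
WITH (
	[User] nvarchar(255),              --comes back NULL: "User" doesn't match the lowercase "user" key
	[PostUser] nvarchar(255) '$.user', --works: explicit JSON path with the correct case
	[text] nvarchar(max)
)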
Massaging the Data
It’s all well and good to get this into a table, but at that point I realized the times were still in Slack’s timestamp format and the message text used Slack’s UserID any time someone was mentioned, rather than their username. That makes sense, since usernames can change or even be duplicated within a group, but it made for word clouds that looked a little strange.
The following script creates the objects I used to store the Slack data, creates a full-text index on the MessageText column to make finding user ids faster, imports some basic user info, and ends with the script that updates the user ids after the messages are imported.
--Create table to store posts
CREATE TABLE [dbo].[Posts](
	[Username] [nvarchar](255) NULL,
	[Type] [nvarchar](255) NULL,
	[Subtype] [nvarchar](255) NULL,
	[TS] [nvarchar](255) NULL,
	[Channel] [nvarchar](255) NOT NULL,
	[MessageText] [nvarchar](max) NULL,
	[MessageDate] [datetime2](7) NULL,
	[PostID] [bigint] IDENTITY(1,1) NOT NULL,
	CONSTRAINT [PK_Posts] PRIMARY KEY CLUSTERED
	(
		[PostID] ASC
	)
)
GO
--Create table to store users for more meaningful text analysis
CREATE TABLE Users(UserID NVARCHAR(50), UserName NVARCHAR(100), RealName NVARCHAR(255))
GO
--Import the users separately
--Expects users.json in the given folder
DECLARE @JSON NVARCHAR(MAX)
SELECT @JSON = BulkColumn
FROM OPENROWSET(BULK 'C:\SlackExport\users.json', SINGLE_CLOB) AS oro
INSERT dbo.Users ( UserID, UserName, RealName )
SELECT DISTINCT id, name, real_name
FROM OPENJSON(@JSON)
WITH (id nvarchar(50), name nvarchar(100), real_name nvarchar(255)) AS oj
GO
--Create a full-text index on the MessageText column to allow for a faster update when replacing user ids with user names
CREATE FULLTEXT CATALOG [Messages]
WITH ACCENT_SENSITIVITY = ON
GO
CREATE FULLTEXT INDEX ON [dbo].[Posts] (
	MessageText LANGUAGE 1033
)
KEY INDEX PK_Posts
ON [Messages]
WITH CHANGE_TRACKING AUTO
GO
/*
Populate the Posts table using the PowerShell script
*/
--Run the following after populating the Posts table to replace the Slack UserID with the UserName
DECLARE UserCursor CURSOR FAST_FORWARD READ_ONLY
FOR
SELECT UserID, UserName
FROM dbo.Users AS u

DECLARE @UserID NVARCHAR(50), @UserName NVARCHAR(100)

OPEN UserCursor
FETCH NEXT FROM UserCursor INTO @UserID, @UserName
WHILE @@FETCH_STATUS = 0
BEGIN
	UPDATE dbo.Posts SET MessageText = REPLACE(MessageText, @UserID, @UserName)
	WHERE CONTAINS(MessageText, @UserID)

	FETCH NEXT FROM UserCursor INTO @UserID, @UserName
END
CLOSE UserCursor
DEALLOCATE UserCursor
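One sanity check I’d suggest after the cursor finishes (this query is my own addition, not part of the script above) is to look for messages that still contain a raw user id. Ideally it returns nothing; anything left over was missed by the CONTAINS lookup:

--My addition: find any messages that still contain a raw Slack user id after the replacement pass
SELECT u.UserID, COUNT(*) AS RemainingMentions
FROM dbo.Users AS u
JOIN dbo.Posts AS p
	ON p.MessageText LIKE '%' + u.UserID + '%'
GROUP BY u.UserID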
Slack Usage Analysis
Now that this was all imported and somewhat cleansed, I was ready to do some basic analysis. I started a new PowerBI Desktop file, imported the data from the Posts table, and threw up some visualizations. I had to install the Word Cloud extension from the Marketplace, then exclude several noise words beyond the defaults. This included words such as “joined” and “channel”, the “```” sequence (three backticks, used to denote a text block in Slack), and a handful of integers that occur regularly. I left the usernames intact as they only show when someone is mentioned directly in the channel.
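If you’d rather scout out noise-word candidates before building the word cloud, a rough word count in SQL gets you most of the way there. This is just a sketch of my own (it assumes SQL Server 2016+ for STRING_SPLIT and splits on spaces only, so punctuation stays attached to words):

--My addition: rough word frequency count over the imported messages to spot likely noise words
SELECT TOP (50) LOWER(value) AS Word, COUNT(*) AS Occurrences
FROM dbo.Posts
CROSS APPLY STRING_SPLIT(MessageText, ' ')
WHERE LEN(value) > 2
GROUP BY LOWER(value)
ORDER BY COUNT(*) DESC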
Visualization isn’t my strong suit, but I’ve obtained permission from the admins to share the files used for this in case someone wants to play around with the data directly. You can download the files here. Have fun playing around and let me know what you think.
Where’s OpenBulk in the ps script?
The OPENROWSET (BULK) call in the first block of SQL in that PS script pulls the contents of the JSON file into the @JSON variable, which is then consumed by the OPENJSON command.
Great stuff…and at this point, way over my head 😉 At least all things outside of Power BI.
So I’ve been asked to do something similar…using an existing Power BI report I’ve built that monitors/reports on our Office365 licenses, I’ve been asked to also include Slack.
Rather than a CSV export, I was trying to find a way to use the Slack API to pull the user information needed. Essentially my managers want to see all our users and which apps they’ve been assigned and are actually using, costs associated, etc. I’ve set up additional filters for their use as well (Departments, software app, etc).
This is a bit over my head as my experience with SQL is minimal and even more so with JSON. I simply needed all of our users (with their emails) to determine who’s using it. The extra work you did with word clouds I think they’d like to see also, but down the road.
Any additional direction to help me get up to speed is appreciated…thanks!
I had to do a bit of digging and asking for help with my particular case. There should be ways to pull in the data directly from Slack with their JSON files, but I’d recommend looking at the Slack forums for more help there. The first part of this post dealt with importing the raw data into SQL tables. That might give you enough to analyze the users. Check for where a user id shows up in the posts and you should be able to tell whether someone is actually using the system to post or not; a rough example of that kind of query is below.
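If it helps as a starting point, against the tables from this post that check could look something like this (my sketch; the Username column in Posts holds the raw Slack user id, so it joins back to Users.UserID):

--My sketch: count posts per user to see who is actually posting and when they last did
SELECT u.UserName, u.RealName, COUNT(p.PostID) AS Posts, MAX(p.MessageDate) AS LastPost
FROM dbo.Users AS u
LEFT JOIN dbo.Posts AS p
	ON p.Username = u.UserID
GROUP BY u.UserName, u.RealName
ORDER BY Posts DESC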