Analyzing Slack Traffic with PowerBI

The other day in the SQL Community Slack channels, we started chatting about general usage of Slack – which channels saw the most activity, what topics were discussed, and so on. One thing led to another and I wondered if I could put that into a report for analysis. While I realized that you can’t get any sort of “top user” based on the extract I used, it was still an interesting exercise.

I went through a couple of steps to get to a usable set of data:

  1. Import the Slack JSON data
  2. Massage it
  3. Replace the user ids with usernames – mostly for better word clouds
  4. Import the data into PowerBI and play with some visualizations

Here are the final results for those who might be curious:

Importing the Data

This was one of the trickier parts, but I got some help from Constantine Kokkinos ( b | t )with a PS script to get me started. I tweaked that a bit to use OPENBULK and OPENJSON commands to parse out the fields I wanted from the JSON files. That went well until I realized I also needed the Date of the file in order to have any useful analysis because Slack Timestamps don’t contain the date, just the time. I also realized that OPENJSON and the “WITH” command require you to use the field’s case as defined in JSON. After a bunch of playing with the PS and SQL code, I tweaked Constantine’s code to something that worked and let me pull in the message text for word analysis.


# lets just bucket all the comments from all users and then print it out
# for each file, read the file sequentially and add it to a hash table count
$database = "SQLSlack"
$server = "localhost"
$parent_folder = "C:\SlackExport"
Import-Module SqlServer
function Invoke-Sql {
param ($string)
if ($null -ne $string.ToString() -and $string.ToString().Length -gt 5) # I have no memory why I did this
{
$string.Insert(0,$prefix) | out-null
$s = $string.ToString()
$s = $s.Substring(0,$s.Length-1)
Write-Debug $s
Invoke-Sqlcmd -ServerInstance $server -Database $database -Query $s | Out-Null
}
}
$ErrorActionPreference = "Stop"
foreach($folder in (Get-ChildItem $parent_folder))
{
foreach($jsonFile in (Get-ChildItem $folder.FullName))
{
$string = @"
DECLARE @JSON NVARCHAR(max), @Channel NVARCHAR(255), @Date DATE
SELECT @Channel = '$($Folder.Name)', @JSON = BulkColumn, @Date = '$($($jsonFile.name).replace(".json", "").replace("-", ""))'
FROM OPENROWSET(BULK '$($jsonFile.FullName)', SINGLE_CLOB) AS oro
INSERT dbo.Posts ( Channel, Username, Type, Subtype, TS, MessageText, MessageDate )
SELECT @Channel, [user], type, subtype, ts, text, DATEADD(MILLISECOND, CAST(TS AS NUMERIC(19,0)) % (1000*60*60*24), @Date)
FROM OPENJSON(@JSON)
WITH ([user] nvarchar(255), type nvarchar(255), subtype nvarchar(255), ts nvarchar(255), text nvarchar(max)) AS oj
"@
Invoke-Sql -string $string
}
}

view raw

ImportSlack.ps1

hosted with ❤ by GitHub

 

Massaging the Data

It’s all well to get this into a table, but at that point, I realized all of the times were still in the Slack TimeStamp format and the Message Text used Slack’s UserID anytime someone was mentioned instead of their username. That makes sense as usernames can change or even be duplicated within a group, but made for word clouds that looked a little strange.

The following script creates the objects I used to store the Slack data, creates a full text index on the MessageText column to make finding user ids faster, imports some basic User info, and has the script to update the userids after the messages are imported.


–Create table to store posts
CREATE TABLE [dbo].[Posts](
[Username] [nvarchar](255) NULL,
[Type] [nvarchar](255) NULL,
[Subtype] [nvarchar](255) NULL,
[TS] [nvarchar](255) NULL,
[Channel] [nvarchar](255) NOT NULL,
[MessageText] [nvarchar](max) NULL,
[MessageDate] [datetime2](7) NULL,
[PostID] [bigint] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_Posts] PRIMARY KEY CLUSTERED
(
[PostID] ASC
)
)
GO
–Create table to store users for more meaningful text analysis
CREATE TABLE Users(UserID NVARCHAR(50), UserName NVARCHAR(100), RealName NVARCHAR(255))
GO
–Import the users separately
–expects the users.json in the given folder
DECLARE @JSON NVARCHAR(max)
SELECT @JSON = BulkColumn
FROM OPENROWSET(BULK 'C:\SlackExport\users.json', SINGLE_CLOB) AS oro
INSERT dbo.Users ( UserID, UserName, RealName )
SELECT DISTINCT id, name, real_name
FROM OPENJSON(@JSON)
WITH (id nvarchar(50), name nvarchar(100), real_name nvarchar(255)) AS oj
–Create a Full-Text index on the MessageText table to allow for faster update to replace user id w/ user name
CREATE FULLTEXT CATALOG [Messages]
WITH ACCENT_SENSITIVITY = ON
GO
CREATE FULLTEXT INDEX ON [dbo].[Posts] (
MessageText Language 1033
)
KEY INDEX PK_Posts
ON [Messages]
WITH CHANGE_TRACKING AUTO
GO
/*
Populate the Posts table using the Powershell script
*/
–Run the following after populating the Posts table to replace the Slack UserID w/ the User Name
DECLARE UserCursor CURSOR FAST_FORWARD READ_ONLY
FOR
SELECT UserId, UserName
FROM dbo.Users AS u
DECLARE @UserID NVARCHAR(50), @UserName NVARCHAR(100)
OPEN UserCursor
FETCH NEXT FROM UserCursor INTO @UserID, @UserName
WHILE @@FETCH_STATUS = 0
BEGIN
UPDATE Posts SET MessageText = REPLACE(MessageText, @UserID, @UserName)
WHERE CONTAINS(MessageText, @UserID)
FETCH NEXT FROM UserCursor INTO @UserID, @UserName
END
CLOSE UserCursor
DEALLOCATE UserCursor

view raw

Slack.sql

hosted with ❤ by GitHub

Slack Usage Analysis

Now that this was all imported and somewhat cleansed, I was ready to do some basic analysis. I started a new PowerBI Desktop file, imported the data in the Messages table, and threw up some visualizations. I had to install the Word Cloud extension from the Marketplace, then exclude several noise words beyond the defaults. This included words such as “joined” and “channel”, ““`” (three backticks – used to denote a text block in Slack), and a handful of integers that occur regularly.  I left the usernames intact as they only show when someone is mentioned directly in the channel.

Visualization isn’t my strong suit, but I’ve obtained permission from the admins to share the files used for this in case someone wants to play around with the data directly. You can download the files here. Have fun playing around and let me know what you think.

Comments
  1. Giorgio

    Where’s OpenBulk in the ps script?

    • paschott

      The openrowset command in the first block of SQL in that PS script pulls the contents of the json file into the variable to be consumed by the openjson command.

  2. Joshua Brock

    Great stuff…and at this point, way over my head 😉 At least all things outside of Power BI.

    So I’ve been asked to do something similar…using an existing Power BI report I’ve built that monitors/reports on our Office365 licenses, I’ve been asked to also include Slack.

    Rather than a CSV export, I was trying to find a way to use the Slack API to pull the user information needed. Essentially my managers want to see all our users and which apps they’ve been assigned and are actually using, costs associated, etc. I’ve set up additional filters for their use as well (Departments, software app, etc).

    This is a bit over my head as my experience with SQL is minimal and even more so with JSON. I simply needed all of our users (with their emails) to determine who’s using it. The extra work you did with word clouds I think they’d like to see also, but down the road.

    Any additional direction to help get me get up to speed is appreciated…thanks!

    • paschott

      I had to do a bit of digging and asking for help with my particular case. There should be ways to pull in the data directly from Slack with their JSON files, but I’d recommend looking at the Slack forums for more help there. The first part of this post dealt with importing the raw data into SQL tables. That might give you enough to analyze the users. Do a check for where a user id is in the comments and you should know if someone is using the system to post or not.

Leave a Reply

Your email address will not be published. Required fields are marked *