Powershell: Writing Text Files
I was recently trying to generate a bunch of data that I needed to pass through one of our in-house DLLs in order to use some custom encryption algorithms. Thanks to the wonderful Powershell community, I found several examples on using custom DLL methods. However, I ultimately wanted this data to end up in SQL Server. My first attempt used an “Invoke-SQLCmd” Cmdlet from the SQLPS snap-in. However, generating several million rows took a while to insert. I knew that I didn’t want to queue all of this data in memory, so was inserting one row at a time. I stopped this process after a while because it was too time-consuming.
My next attempt was to push these values to a text file using a variable pointing to a file and then using add-content to write each line to the file. This took longer than doing the SQL inserts one at a time.
I remembered coming across an article by Linchi Shea titled Add-Content and Out-File are not for performance. After finding the link to that article, I found that he recommends using System.IO.StreamWriter to write out files in the most performant manner. I definitely recommend that you check out his article for his summary and examples. For my part, I can say that it took me the better part of 6 hours to generate and populate 10 million rows. I just generated 20 million into a file in < 15 minutes!
I’m including a mockup of my code below in case anyone is interested. I’m definitely not a Powershell expert so as always, be careful if you copy any of this code. My first attempt at directly inserting into SQL Server worked, but threw an error at about 3.5 million rows. As the time was already pretty lengthy by then, I stopped the process and moved on to writing the text file, resuming where I’d left off. That ran for hours until I finally killed it and looked for a better way. The uncommented code is what finally ended up working in a timely manner.
I started by loading my DLL. I called it’s “EncryptValue” function (against just the integer – that’s a longer story) and populated both out to a table. There are several reasons for doing this, but this gave me a good working set for a test project and definitely lays some groundwork for the future. If anyone has any ideas on how I can speed this up, feel free to leave me some comments. If I find some, I’ll post an update.
[Reflection.Assembly]::LoadFile("C:\MyDLL.dll")
# First attempt – insert directly into SQL Server.
#foreach ($i IN 1..10000000) { invoke-sqlcmd –query '
#("insert dbo.MyTable VALUES(" + "$i" + ",'" + [MyDLLNamespace]::EncryptValue("$i") + "')") -serverinstance "localhost" }
#Attempt #2 – Write to file with add-content
#$file = New-Item -type File "C:\Values.txt"
#foreach ($i IN 1..10000000) { add-content $file ("$i" + "," + [MyDLLNamespace]::EncryptValue("$i") ) }
$file = New-Object System.IO.StreamWriter "E:\Values.txt";
foreach ($i IN 10000001..30000000) { $file.Writeline("$i" + "," + [MyDLLNamespace]::EncryptValue("$i") ) }
$file.close();