Tales of a Lazy DBA–MySQL, SSIS, and “0” Dates
We’ve recently been tasked with converting a lot of MySQL Data into our system from a multi-tenant MySQL Database. We don’t have a fixed schedule to actually execute the imports because they’ll be on a “per customer” basis. Well, that sounded like a great task for SSIS. I set up the ODBC driver, connected, set the initial “sql_mode” options for our connections to (‘mssql, allow_invalid_dates’) and started to work.
First problem we ran into with an ADO.NET connection to MySQL and writing a SELECT * from schema.table was when we hit a MySQL “Date” column containing a value of ‘0000-00-00’. SSIS threw an error, not sure what to do. Thanks to some others who have solved this problem, I realized that within the MySQL Select statement, we could do something like:
CASE date_created WHEN ‘0000-00-00’ THEN NULL else date_created END as date_created
to pass those dates over as NULL. That solves the implicit conversion to datetime (SQL 2005) and avoids the invalid dates. I ran something similar for a “Time” column to:
CAST(Time_Created as char(8) ) as Time_Created
So that solved one particular table export with about 20 or so CAST and CASE statements. Needless to say, I wasn’t looking forward to doing this for another 500 tables with a total of almost 6000 columns.
I finally set up a really basic query to generate most of the SELECT statements we would need to pull our MySQL data across without too much pain. Admittedly, a small step and I’d still need to copy/paste when I set up each new ADO.NET source, but it worked reasonably well. I’m adding the code snippet for MySQL here in case anyone else has a similar problem.
select
CONCAT(CASE WHEN ordinal_position = 1 THEN 'SELECT ' ELSE ', ' END,
CASE Data_Type WHEN 'date'
THEN CONCAT('CASE ',column_name,' WHEN ''0000-00-00'' THEN NULL else '
,column_name,' END as ',column_name)
WHEN 'time' THEN CONCAT('CAST(',column_name,' AS CHAR(8) ) as ',column_name)
ELSE column_name END, CASE WHEN Ordinal_Position MaxOrd THEN ''
ELSE CONCAT('
FROM MySchema.', c.table_name) END
) as Select_Column_Name
from information_schema.columns as c
JOIN (select table_name, MAX(ordinal_position) as MaxOrd
from information_schema.columns
WHERE Table_Schema = 'MySchema'
GROUP BY Table_Name) as t1
ON c.Table_Name = t1.Table_Name
where table_schema = 'MySchema'
order by c.table_name, ordinal_position LIMIT 0, 50000;
I’d love to hear other ideas if anyone has encountered this before and come up with a more elegant solution for translating “0” Date or Time data from MySQL into SQL Server.
Did you consider redirecting the invalid date records to error log?
In this case, no. We want the data to come over and that all zero date format pretty much meant “No Date” or NULL for that column.
I had to do some other scrubbing around the date column as well for bad dates, but that was usually handled in the WHERE clause to exclude them completely for certain tables. In some cases I still had to tweak this code to get around it because of birth dates with bad years or something along those lines, but most of the time I could just NULL out the date and move on.
For our purposes, the rest of the data in the row was valid, just not the numerous date columns that _could_ be used in some circumstances. Redirecting the entire row to an error flow of some sort would have resulted in a lot more manual scrubbing and processing to end up with the same results. It would be the same as if someone used '18991231' for their Unknown date. It's something that we'd catch as “remove this date because it's invalid” instead of porting in to our system.