by Klaus Graefensteiner
24. June 2010 13:31
While I was working on my 50-States blog post I needed a way to convert a CSV file to an XML file. I was hoping that Microsoft’s Excel 2010 would do that, but I couldn’t figure out how. This feature is not discoverable and the help still seems to be worked on. It is completely useless.
PowerShell to the rescue:
Figure 1: One Ring or PowerShell to rule them all
The Script
The scripts assumes that the first line in the CSV file contains the header names of the columns and re-uses the names as attributes. The conversion script needs the names for a single entity and the collection as additional parameters besides the CSV file name. One feature I needed to add was to recombine fields that had a comma in the field name and where enclosed by quotation marks. My script will take care of this situation.
Warning: The script can be used as a good example for a bad coding style.
Set-DebugMode
$CSVPath = "C:\Users\Klaus\SVN\PS\DownloadWeatherHistory\States.csv"
if(Test-Path -PathType leaf -Path $CSVPath)
{
$CSVLines = Get-content -Path $CSVPath -Encoding "UTF8"
}
else
{
Write-Error "File $CSVPath not found"
}
$FieldNames = New-Object -TypeName "System.Collections.ArrayList"
$FieldValues = New-Object -TypeName "System.Collections.ArrayList"
if ($CSVLines -ne $null -and $CSVLines.count -gt1)
{
for($i = 0; $i -lt $CSVLines.count; $i++)
{
$values = New-Object -TypeName "System.Collections.ArrayList"
$fields = ($CSVLines[$i]).Split(",");
Write-Debug "$fields"
$recombining = $false
$combinedValue = [String]::Empty
if($fields -ne $null -and $fields.count -gt 0)
{
for($f = 0; $f -lt $fields.count; $f++)
{
$value = [String]::Empty
if($i-eq 0)
{
$slug = ($fields[$f]).replace(" ", "-");
Write-Debug $slug
$null = $FieldNames.Add($slug)
}
else
{
if(($fields[$f]).startsWith('"') -and -not ($fields[$f]).endsWith('"'))
{
$recombining = $true;
$combinedValue = [String]::Empty
}
if($recombining)
{
$combinedValue += ($fields[$f]).replace('"', '')
}
if(($fields[$f]).endsWith('"') -and -not ($fields[$f]).startsWith('"'))
{
$recombining = $false;
write-Debug $combinedValue
}
if(-not $recombining)
{
if($fields[$f] -eq $null)
{
$value = [String]::Empty
}
elseif($combinedValue -ne [String]::Empty)
{
$value = $combinedValue
$combinedValue = [String]::empty
}
else
{
$value = ($fields[$f]).trim()
}
$null = $values.Add($value)
}
}
}
}
else
{
Write-Error "Insufficient row format"
}
if($values.count -gt 0)
{
$null = $FieldValues.Add($values);
}
}
}
else
{
Write-Error "Insufficient csv file format"
}
$FieldNames
$FieldValues.GetType().FullName
$FieldValues.Count
function Write-XML([System.Collections.ArrayList] $Names, [System.Collections.ArrayList] $ValueSets, [Switch] $AsElements, [String] $CollectionName, [String] $EntityName, [String] $Path)
{
$Lines = New-Object -TypeName "System.Collections.ArrayList"
$Lines.Add('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>') | out-null
$Lines.Add("<$CollectionName>")
#Foreach item in values
for( $i = 0; $i -lt $ValueSets.count; $i++)
{
$ValueSet = $ValueSets[$i]
$Line = " <$EntityName ";
if($ValueSet.Count -ne $Names.Count)
{
Write-Error "Datasets are inconsistent"
}
else
{
for($u=0; $u -lt $Names.Count; $u++)
{
$Line += $Names[$u] + '="' + $ValueSet[$u] + '" '
}
}
$Line = $Line + "/>`n";
$Lines.Add($Line) | out-null
$Line = [String]::Empty
}
$Lines.Add("</$CollectionName>")
$Lines | set-content -Encoding "UTF8" -Path $Path
}
Write-XML -Names $FieldNames -Values $FieldValues -CollectionName "states" -EntityName "state" -Path "C:\Users\Klaus\SVN\PS\DownloadWeatherHistory\States.xml"
#CALIFORNIA
#CA
#Sacramento
#Los Angeles
#"36
#961
#664"
#"163
#707"
#PT (UTC-8)
#YES
#<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
#<states>
# <state name="ALABAMA" abbreviation="AL"/>
# <state name="ALASKA" abbreviation="AK"/>
# <state name="WISCONSIN" abbreviation="WI"/>
# <state name="WYOMING" abbreviation="WY"/>
#</states>
Download
The script can be downloaded here: Convert-CsvToXML.zip
Ausblick
You gotta love dynamic languages. PowerShell makes you extremely productive when working with the Windows platform. I hope it will be an object oriented language some day.