Use PowerShell to split large files by month/year for data loads into FDMEE on PBCS
If you are using PBCS, you may run into some challenges with large files being passed through FDMEE. Whether performance is an issue or you just want to parse a file my month/year, this script might save you some time.
I recently had the need to break apart a file. The source provided one large text file that included 2 years of data that was needed to populate the history of an employee metrics application. The current process loaded files by month and we wanted to be able to piggy back off the existing scripts to load and process data in FDMEE and the monthly Planning data pushes to the ASO reporting cube. So, the need break the data file into seperate files by month and year was required. The file was delimited and formatted like the following.
The goal was to have a file for every unique month and year combination that included only the lines of the relevant time periods. The header of the file also had to exist in each of the smaller files. Since we were working on a Windows machine, we used PowerShell to script the solution.
Powershell Script Directions
The script is pretty simple to use and understand. Update the script as follows.
- Create a new text file with a ps1 extension and paste the following into that file.
- Update the srcFile variable to point to the file to be parsed.
- Update the startYear to the first year in the file to be extracted.
- Update the currentYear variable to the last year in the file to be extracted.
- Update the ProcessName to a meaningful word or phrase that will be used to create the file name.
- Save the file and execute it like any other PowerShell script.
This will produce 12 files for each year with the header line and the data related to the month and year that represents the year and month in the file name.
I welcome feedback on improving performance and will give credit to anybody that can improve on this. I am NOT an expert in PowerShell and I am sure there are faster ways to accomplish this. This created 12 files (1 year / 12 months) from a file that includes 7.8 million records and completed in 24 minutes. So, this is pretty reasonable for one-off requests, but might need attention if it was a repeatable need.
This was developed using PowerShell 5 and some functions do not work in earlier adoptions of the software.
Hopefully this will benefit the community. As I create more scripts like this, I plan to share them.