←back to thread

36 points dataflowmapper | 1 comments | | HN request time: 0.411s | source

Hi all,

I’m curious how folks handle the prep work for data imports/uploads into systems like Salesforce, Workday, NetSuite, or really any app that uses template based import for data loading, migration, or implementation.

Specifically: - How do you manage conversions/transformations like formatting dates, getting everything aligned with the templates, mapping old codes to new ones, etc.

- Are you primarily using Excel, custom scripts, Power Query or something else?

- What are the most tedious/painful parts of this process and what have you found that works?

Really appreciate any insights and am curious to learn from everyone's experience.

1. hyperman1 ◴[] No.43993666[source]
Some of my experiences:

Step 1 is always storing a pristine unmodified copy. Try to build a chain of steps that can always be redone starting at that copy

If you have any control over what comes in, try to make some baseline requirements:. A stable identifier for each record, a parseable number format with clarity about US vs world conventies like , vs . and a stable date format like ISO 8601. Also a real structured format like json, xml, ... works better than csv, which is better than xls(x)

From there, it depends.

If there is a lot of data and you need a quick estimate, the POSIX toolset is a good choice: cut, grep, sed, jq, ...

If there is a lot of data, reasonably structured, load it minimally parsed in an sql table as (line number,source id,line of text) and parse from there. The database will auto parallelize.

If the data is hard to parse, grab a real programming language. You probably want something that can deal with character data quickly, e.g. Go Java Rust C# .... Python or R work if the amount of data is low or you can lean heavily on things like numpy. PHP, VBA,... tend to be slow and error prone.

My experience with ETL tooling is it's eternally 90% there. It should theoretically solve this, but I always seem to fall back on programming languages and fight the tools after a while.