←back to thread

36 points dataflowmapper | 1 comments | | HN request time: 0.201s | source

Hi all,

I’m curious how folks handle the prep work for data imports/uploads into systems like Salesforce, Workday, NetSuite, or really any app that uses template based import for data loading, migration, or implementation.

Specifically: - How do you manage conversions/transformations like formatting dates, getting everything aligned with the templates, mapping old codes to new ones, etc.

- Are you primarily using Excel, custom scripts, Power Query or something else?

- What are the most tedious/painful parts of this process and what have you found that works?

Really appreciate any insights and am curious to learn from everyone's experience.

1. irridiance ◴[] No.43993595[source]
I transform hundreds of tabular sources. For the cleaning / transformation, I found that a very small number of transformations is required, and that we need to review them as a team including business owners. So, I wrote a simple grammar that is very English-like; that gets translated into Polars operations under the covers in Python. It covers 98% + of my ingestion needs, and means that we focus on the needs of the logical data transformations as a team. Business users can easily make changes for sources they manage.

One of the concepts is a “map”, for old values to new values. Those we keep in Excel in Git, so that business users can edit / maintain them. Being Excel, we’re careful to validate the import of those rules when we do a run, mainly to indicate where there’s been a lot of change to identify where there might be an unintended change. Excel makes me nervous in data processing work in general (exploration with Pivots is great, though I’ve moved to Visidata as my first tool of choice). But for years of running in this way we’ve worked around Excel lax approach to data, such as interpreting numerical ID fields as numbers rather than strings.

For output “rendering”, because everything is in Polars, we can most frequently simply output to CSV. We use Jinja for some funky cases.