Newbie trying to implement a data warehouse

Hi guys, I’m new here. Currently, we have a main Postgres database VM on fly.io containing live data for our website. I’ve been given the task of making a data warehouse to give access to others to make reports without impacting the production live processes. This data warehouse would be updated as close to real-time as possible with the data from the main Postgres VM.

I have the 2 following options:
A-adding a second fly.io Postgres database VM and replicating the data there
B-replicating the data into GCP ideally into Big Query directly

Here are my questions:
1-For the A option, I would like to use the logical replication and was wondering if it impacts the automated Fly.io backups when I set “wal_level must be set to logical” on the main Postgres VM so it becomes a publisher. I’m wondering if setting the replication mode to logical would impact the automatic backups from Fly.io?
2-For the B option, I’m curious if anybody already did something like that and what tools did you use? Any examples or hints are more than welcome.

1 Like

I think that my questions were vague. For the first question, I think I did find my answer. Fly.io keeps a “volume backup” each day for 7 days. To my understanding, that’s a backup of the VM.
Basically I won’t impact Fly.io process by changing the replication to logical. At least that’s my understanding. Feel free to correct me.

For the second question, I did find GCP Stream that can use logical replication and back the data into GCP BigQuery. From there I found different ways to do reporting on the data.

Any comments or hints are welcome.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.