[Architecture] Data loaders for loading data into CouchDB

Li, Cindy cli at ocadu.ca
Tue Aug 9 15:33:48 UTC 2016

Thanks for bringing up the security concern, Benjamin. I’m thinking that in a real production deployment, data loaders probably should not be deployed at all, just as tests are not included in library release builds.

Since GPII, including its production mode, is still under development, these data loaders are mainly for loading test data into the production-mode servers, to give people a baseline to experiment with.

The prefs data are all *.json files at https://github.com/GPII/universal/tree/master/testData/preferences/
The auth server data is at https://github.com/GPII/universal/blob/master/testData/security/TestOAuth2DataStore.js#L28-L105

It seems to me that it doesn’t make sense for this data to go into a real production deployment. There, everything may start out empty, with the actual data coming in through certain interfaces.

I’m not sure whether the real production use case, and where its initial data would come from, has ever been discussed, but knowing more about that will certainly help determine how to deal with data loaders.

On Aug 9, 2016, at 10:44 AM, Benjamin Stokes <ben.stokes at us.ibm.com> wrote:

Data loading as a feature sounds useful for the reasons mentioned, but is there a use case for production? I would suggest disabling the data loaders by default and only enabling them outside of production (maybe using the NODE_ENV environment variable). I think in production we wouldn't want external resources to load preference or auth server data.
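
A minimal sketch of such a gate, assuming the loaders are off unless NODE_ENV is explicitly set to a non-production value. The function name `data_loaders_enabled` is hypothetical and not part of GPII. Python is used only for illustration, and the same check would be a few lines in Node.

```python
import os


def data_loaders_enabled(env=None):
    """Return True only when NODE_ENV names a non-production environment.

    Hypothetical helper: the name and the exact policy are assumptions.
    """
    env = os.environ if env is None else env
    node_env = env.get("NODE_ENV", "").strip().lower()
    # Disabled by default: an unset or empty NODE_ENV is treated like production.
    return node_env not in ("", "production")
```

With this policy, forgetting to set NODE_ENV on a server leaves the loaders disabled rather than enabled.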

----- Original message -----
From: "Harnum, Alan" <aharnum at ocadu.ca>
Sent by: "Architecture" <architecture-bounces at lists.gpii.net>
To: "Li, Cindy" <cli at ocadu.ca>, "architecture at lists.gpii.net Architecture" <architecture at lists.gpii.net>
Subject: Re: [Architecture] Data loaders for loading data into CouchDB
Date: Tue, Aug 9, 2016 9:53 AM

+1. Having created the data loader container to replace a marginally more fragile shell script approach for deployment, it would be great to have a "GPII native" approach to deployment needs like loading data, etc.

From: Architecture <architecture-bounces at lists.gpii.net> on behalf of "Li, Cindy" <cli at ocadu.ca>
Date: Tuesday, August 9, 2016 at 9:39 AM
To: "architecture at lists.gpii.net Architecture" <architecture at lists.gpii.net>
Subject: [Architecture] Data loaders for loading data into CouchDB


I’m working on using CouchDB for auth server data persistence when GPII is running in production mode. Avtar, Simon and I were going through the deployment of the preferences server VM, which is already backed by CouchDB when running in production mode.

One thing we noticed is that the data loading process for importing the initial prefs data into CouchDB, including the preprocessing step that converts the prefs data into a CouchDB-compatible data structure, is currently performed by the Docker image - https://github.com/gpii-ops/docker-preferences-server-data-loader

More specifically, by 2 Ansible roles that are called by the Docker image:


This means the scripts in the Ansible roles need to know where the data and the GPII production config file are located in the GPII universal repo, what the data structure is, how to convert it into a structure CouchDB can use, and so on. All of this is currently hardcoded in the Ansible scripts, so any change to this information could break the data loading process. One existing breakage is https://issues.gpii.net/browse/GPII-1884, which was caused by renaming the GPII production config file.

To solve this issue, Avtar, Simon and I would like to propose creating data loaders in the GPII universal repo. These data loaders can be executed by external resources, the Docker image in our case, to load data into CouchDB. This would keep universal-specific information within the universal repo, reduce dependencies between the Docker VM and GPII universal, and reduce the amount of work required in the Docker image.

At the moment, 2 data loaders are needed:

1. To convert and load prefs data into CouchDB for the preferences server;
2. To load data and views into CouchDB for the auth server.
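
As a rough illustration of the first loader's conversion step, the sketch below turns a directory of *.json prefs files into a request body for CouchDB's `_bulk_docs` endpoint. The function name and the document shape (file name as `_id`, file contents under `value`) are assumptions made for this example, not the structure the preferences server actually expects. Python is used only for illustration.

```python
import json
from pathlib import Path


def prefs_files_to_bulk_docs(prefs_dir):
    """Build a CouchDB _bulk_docs request body from *.json prefs files.

    Hypothetical helper: the document shape below is an assumption.
    """
    docs = []
    # Sort so the output order does not depend on directory listing order.
    for path in sorted(Path(prefs_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            prefs = json.load(handle)
        # Assumed shape: file name without ".json" as _id, contents under "value".
        docs.append({"_id": path.stem, "value": prefs})
    # _bulk_docs accepts a JSON object with a "docs" array.
    return {"docs": docs}
```

The returned dictionary could then be serialized and sent as the body of a POST to the `_bulk_docs` endpoint of the target database. Keeping the conversion separate from the HTTP call would let the conversion be tested without a running CouchDB.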

Your ideas and suggestions are appreciated. We can also discuss this in tomorrow’s arch meeting.


