column names

We have data that comes in from, possibly, multiple data sources.  This data is processed and chopped into our derived tables, from which we are going to work.  The process to combine and manage the data is done by several different user stored procedures  – which are maintained by multiple people.  In order to automate what needs to be done with new data refreshes, there is a single table that holds information about what tables are officially maintained.  On refresh, there is another usp that runs against this table, and uses the stored names to run the necessary usp to create each table.

Thing can change and versions of the usps or even the structure of the table can be modified and not announced.  Of course, this problem is eased by making the central program only necessary to coordinate the sequential running of each of the usps that actually create the tables, but still there can be confusion.

Our approach is to also make each usp (and the author) responsible for undating a central table of metadata with each of the names of the table, the names of the column, and the client database from which it was derived.  This CAN be done manually.  Sure.  If we are not certain where the column comes from, we can get it from the usp code, and put it into the table.  We can also update the metadataTable each time the usp runs to make a new derived table.  Code like this will work to first, clean up the table to remove old entries, then to insert the new values:

 delete from [dbo].[metadataTables] where tableName='table1' 
insert into [dbo].[metadataTables] (tableName, fieldName, 
systemName) values
 ( 'table1','EMR_Provider_id','S1')
,( 'table1','suffix','S1')
,( 'table1','first_name','S1')
,( 'table1','middle_name','S1')
,( 'table1','last_name','S1')
,( 'table1','pm_provider_id','S2')

Of course, this can be made better in cases (well, all cases, but most usefully) when most of the columns come from one source.  Then we can put them all in, and change the ones not correct:

delete from metadataTables where tableName = 'table1'

insert into metadataTables
select 'table1', column_name, 'S1' 
from information_schema.columns 
where table_name = 'table1' order by ordinal_position

update PDS_metadataTables set systemname = 'S2' 
where tableName = 'table1' and fieldName in ('pm_provider_id')

Delete old entries, get all the column names for table1, put them into metadataTables as from SystemName S1, then from your own knowledge, update just ‘pm_provider_id’ to the correct systemName.  Repeat Step three until everything is lined up.

Advertisements