Skip to content

Uploading

Scott Kirkland edited this page Mar 11, 2021 · 6 revisions

Uploading Results

Once we have processed CSV files, we would like to upload the CSV data to our production database (PostgreSQL).

Assumption: Each CSV file is for a given county and year. ex: Lassen_2020.csv.

The following python3 script will extract the county & year from the given csv file and check our destination database to ensure that matching data has not already been uploaded. It will then proceed to use the psycopg2 library to call the pg COPY command to efficiently transfer data.

import psycopg2
import sys

if len(sys.argv) != 2:
    print('Useage: python script.py processed-file.csv')
    sys.exit()

path = sys.argv[1]

# first, let's read our file to determine which year and county we are dealing with
f = open(path, 'r')
# throw away header
f.readline()

row1 = f.readline()
cols = row1.split(',')

year = cols[2]
county = cols[13]

f.close()

#connect to the database
conn = psycopg2.connect(host='169.237.124.86',
                       dbname='cecdss',
                       user='cecdss',
                       password='uRB4vOfFJjNQ1y4cNBCDb8D1',
                       port='5432')  
#create a cursor object 
#cursor object is used to interact with the database
cur = conn.cursor()

# now let's check that we don't already have a matching record in the db
existance_sql = "select year, county_name from treatedclustersfarm where year=%s and county_name=%s limit 1;"

cur.execute(existance_sql, (year,county))
results = cur.fetchone()

if results:
    print(f'There is already data present for {county} {year}. Exiting without making changes.')
else:
    print(f'Beginning data copy for {county} {year}.')
    copy_sql = "COPY treatedclustersfarm FROM STDIN DELIMITER ',' CSV HEADER"

    with open(path, 'r') as f:
        cur.copy_expert(copy_sql, f)
        conn.commit()
    print('Copy complete')
    
cur.close()
conn.close()
Clone this wiki locally