-
Notifications
You must be signed in to change notification settings - Fork 0
Uploading
Once we have processed CSV files, we would like to upload the CSV data to our production database (PostgreSQL).
Assumption: Each CSV file is for a given county and year. ex: Lassen_2020.csv.
The following python3 script will extract the county & year from the given csv file and check our destination database to ensure that matching data has not already been uploaded. It will then proceed to use the psycopg2 library to call the pg COPY command to efficiently transfer data. Finally, it will copy the provided file to a folder indicating the process is complete.
import psycopg2
import sys
import shutil
if len(sys.argv) != 2:
print('Useage: python script.py processed-file.csv')
sys.exit()
path = sys.argv[1]
# first, let's read our file to determine which year and county we are dealing with
f = open(path, 'r')
# throw away header
f.readline()
row1 = f.readline()
cols = row1.split(',')
year = cols[2]
county = cols[13]
f.close()
#connect to the database
conn = psycopg2.connect(host='hostname',
dbname='cecdss',
user='cecdss',
password='pass',
port='5432')
#create a cursor object
#cursor object is used to interact with the database
cur = conn.cursor()
# now let's check that we don't already have a matching record in the db
existance_sql = "select year, county_name from treatedclustersfarm where year=%s and county_name=%s limit 1;"
cur.execute(existance_sql, (year,county))
results = cur.fetchone()
if results:
print(f'There is already data present for {county} {year}. Exiting without making changes.')
else:
print(f'Beginning data copy for {county} {year}.')
copy_sql = "COPY treatedclustersfarm FROM STDIN DELIMITER ',' CSV HEADER"
with open(path, 'r') as f:
cur.copy_expert(copy_sql, f)
conn.commit()
print('Copy complete')
cur.close()
conn.close()
print('Data upload completed, moving file to completed directory')
shutil.move(path, f'/home/postit/pixels/uploaded/{county}-{year}.csv')
print('Moving complete, exiting')
Should be able to use *.csv
instead of file name to upload all CSV files.
curl -1 -v --disable-epsv --ftp-skip-pasv-ip -u [email protected] --ftp-ssl --upload-file filename.csv "ftps://ftp.box.com:990/CEC DSS Project/Project Tasks/Task 2 - Spatial Analysis to Locate the Woody Feedstock/Sierra_Processed_Clusters/"
Alternately, you can FTP in ftp ftp.box.com
and then type passive on
and optionally prompt
(to remove confirmation prompts). Then you can use mput *.csv
to upload.