Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

redshiftdata/statement state not read correctly #5076

Open
MeTimesThree opened this issue Jan 10, 2025 · 3 comments
Open

redshiftdata/statement state not read correctly #5076

MeTimesThree opened this issue Jan 10, 2025 · 3 comments
Labels
awaiting-upstream The issue cannot be resolved without action in another repository (may be owned by Pulumi). kind/bug Some behavior is incorrect or out of spec

Comments

@MeTimesThree
Copy link

Describe what happened

When we execute a redshiftdata-statement performing a copy, the iam_role of the command seems to dissapear on the next pulumi up (without changing anything). However, the statefile is correct.

Correct Statement:
copy dwh_poc_db.public.Kundenstamm from 's3://data-poc-dp/kernbank/Kundenstammdaten_fixed.csv' iam_role 'arn:aws:iam::381492292231:role/Redshift-poc-dpServRole' delimiter ';' IGNOREHEADER 1 DATEFORMAT AS 'DD.MM.YYYY';

Statement from pulumi up:
copy dwh_poc_db.public.Kundenstamm from 's3://data-poc-dp/kernbank/Kundenstammdaten_fixed.csv' iam_role '' delimiter ';' IGNOREHEADER 1 DATEFORMAT AS 'DD.MM.YYYY';

In the log we see both occurrences, the correct one as oldInputs and the wrong one as oldOutputs:

I0110 13:32:45.560208 47033 rpc.go:77] Marshaling property for RPC[Provider[aws, 0xc002927680].Diff(urn:pulumi:datenkatalog::datenkatalog::aws:redshiftdata/statement:Statement::redshiftkernbank_init,59d15a64-f274-47df-8f27-b3560c02319a).oldInputs]: sql={copy dwh_poc_db.public.Kundenstamm from 's3://data-poc-dp/kernbank/Kundenstammdaten_fixed.csv' iam_role 'arn:aws:iam::381492292231:role/Redshift-poc-dpServRole' delimiter ';' IGNOREHEADER 1 DATEFORMAT AS 'DD.MM.YYYY';}
I0110 13:32:45.560254 47033 rpc.go:77] Marshaling property for RPC[Provider[aws, 0xc002927680].Diff(urn:pulumi:datenkatalog::datenkatalog::aws:redshiftdata/statement:Statement::redshiftkernbank_init,59d15a64-f274-47df-8f27-b3560c02319a).oldOutputs]: sql={copy dwh_poc_db.public.Kundenstamm from 's3://data-poc-dp/kernbank/Kundenstammdaten_fixed.csv' iam_role '' delimiter ';' IGNOREHEADER 1 DATEFORMAT AS 'DD.MM.YYYY';}

Sample program

The file to load into redshift:
Kontostammdaten.csv

The main.py
main.py

A needed helper-class:
sft_security_group.py

All three files in the same folder should get you going.

Log output

pulumi.log

Affected Resource(s)

redshiftdata/statement: urn:pulumi:datenkatalog::datenkatalog::aws:redshiftdata/statement:Statement::redshiftkernbank_init

Output of pulumi about

Logged in to fedora.fritz.box as u000451 (file://~)
CLI
Version 3.145.0
Go Version go1.23.4
Go Compiler gc

Plugins
KIND NAME VERSION
resource aws 6.66.2
resource aws-native 1.22.0
resource command 1.0.1
language python 3.145.0
resource std 1.6.2
resource str 1.0.0

Host
OS fedora
Version 40
Arch x86_64

This project is written in python: executable='/home/u000451/repos/sft-bi-poc/pulumi/datenkatalog/venv/bin/python' version='3.12.8'

Current Stack: organization/datenkatalog/datenkatalog

TYPE URN
pulumi:pulumi:Stack urn:pulumi:datenkatalog::datenkatalog::pulumi:pulumi:Stack::datenkatalog-datenkatalog
pulumi:providers:aws urn:pulumi:datenkatalog::datenkatalog::pulumi:providers:aws::default_6_66_2
pulumi:providers:std urn:pulumi:datenkatalog::datenkatalog::pulumi:providers:std::default_1_6_2
aws:ec2/vpc:Vpc urn:pulumi:datenkatalog::datenkatalog::aws:ec2/vpc:Vpc::vpc-poc-dp
aws:ec2/eip:Eip urn:pulumi:datenkatalog::datenkatalog::aws:ec2/eip:Eip::public_IP-poc-dp
aws:s3/bucketV2:BucketV2 urn:pulumi:datenkatalog::datenkatalog::aws:s3/bucketV2:BucketV2::testdaten
aws:iam/role:Role urn:pulumi:datenkatalog::datenkatalog::aws:iam/role:Role::Redshift-poc-dpServRole
aws:ec2/subnet:Subnet urn:pulumi:datenkatalog::datenkatalog::aws:ec2/subnet:Subnet::subnet_private_2-poc-dp
aws:ec2/subnet:Subnet urn:pulumi:datenkatalog::datenkatalog::aws:ec2/subnet:Subnet::subnet_public-poc-dp
components:index:SftSecurityGroup urn:pulumi:datenkatalog::datenkatalog::components:index:SftSecurityGroup::sftSecurityGroupRedshift
aws:ec2/subnet:Subnet urn:pulumi:datenkatalog::datenkatalog::aws:ec2/subnet:Subnet::subnet_private_1-poc-dp
aws:ec2/securityGroup:SecurityGroup urn:pulumi:datenkatalog::datenkatalog::aws:ec2/securityGroup:SecurityGroup::sftSecurityGroupRedshift-sft_security_group
aws:ec2/subnet:Subnet urn:pulumi:datenkatalog::datenkatalog::aws:ec2/subnet:Subnet::subnet_public_2-poc-dp
aws:ec2/internetGateway:InternetGateway urn:pulumi:datenkatalog::datenkatalog::aws:ec2/internetGateway:InternetGateway::InternetGateway-poc-dp
aws:s3/bucketObjectv2:BucketObjectv2 urn:pulumi:datenkatalog::datenkatalog::aws:s3/bucketObjectv2:BucketObjectv2::-poc-dp-kernbank-Kontostammdaten.csv
aws:iam/role:Role urn:pulumi:datenkatalog::datenkatalog::aws:iam/role:Role::redshift-sched-act-poc-dpServRole
aws:ec2/natGateway:NatGateway urn:pulumi:datenkatalog::datenkatalog::aws:ec2/natGateway:NatGateway::natGateway
aws:redshift/subnetGroup:SubnetGroup urn:pulumi:datenkatalog::datenkatalog::aws:redshift/subnetGroup:SubnetGroup::sub_group_redshift-poc-dp
aws:vpc/securityGroupEgressRule:SecurityGroupEgressRule urn:pulumi:datenkatalog::datenkatalog::aws:vpc/securityGroupEgressRule:SecurityGroupEgressRule::sftSecurityGroupRedshift-sft_security_group_all_outgoing
aws:vpc/securityGroupIngressRule:SecurityGroupIngressRule urn:pulumi:datenkatalog::datenkatalog::aws:vpc/securityGroupIngressRule:SecurityGroupIngressRule::sftSecurityGroupRedshift-sft_security_group_self_referincing
aws:ec2/routeTable:RouteTable urn:pulumi:datenkatalog::datenkatalog::aws:ec2/routeTable:RouteTable::routeTab_public-poc-dp
aws:ec2/routeTable:RouteTable urn:pulumi:datenkatalog::datenkatalog::aws:ec2/routeTable:RouteTable::routeTab_private-poc-dp
aws:redshift/cluster:Cluster urn:pulumi:datenkatalog::datenkatalog::aws:redshift/cluster:Cluster::redshift_kernbank-poc-dp
aws:ec2/routeTableAssociation:RouteTableAssociation urn:pulumi:datenkatalog::datenkatalog::aws:ec2/routeTableAssociation:RouteTableAssociation::rt_assoc_pub_1_net
aws:ec2/routeTableAssociation:RouteTableAssociation urn:pulumi:datenkatalog::datenkatalog::aws:ec2/routeTableAssociation:RouteTableAssociation::rt_assoc_pub_2_net
aws:ec2/routeTableAssociation:RouteTableAssociation urn:pulumi:datenkatalog::datenkatalog::aws:ec2/routeTableAssociation:RouteTableAssociation::rt_assoc_priv_1_net
aws:ec2/routeTableAssociation:RouteTableAssociation urn:pulumi:datenkatalog::datenkatalog::aws:ec2/routeTableAssociation:RouteTableAssociation::rt_assoc_priv_2_net
aws:ec2/vpcEndpoint:VpcEndpoint urn:pulumi:datenkatalog::datenkatalog::aws:ec2/vpcEndpoint:VpcEndpoint::s3-poc-dp
aws:redshiftdata/statement:Statement urn:pulumi:datenkatalog::datenkatalog::aws:redshiftdata/statement:Statement::redshiftkernbank_init

Found no pending operations associated with datenkatalog

Backend
Name fedora.fritz.box
URL file://~
User u000451
Organizations
Token type personal

Dependencies:
NAME VERSION
boto3 1.35.96
pandas 2.2.3
pulumi_aws 6.66.2
pulumi_aws_native 1.22.0
pulumi_command 1.0.1
pulumi_std 1.6.2
pulumi_str 1.0.0
setuptools 75.2.0
wheel 0.44.0

Pulumi locates its logs in /tmp by default

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@MeTimesThree MeTimesThree added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Jan 10, 2025
@flostadler
Copy link
Contributor

Hey @MeTimesThree, this seems to be a bug in the upstream Terraform provider: hashicorp/terraform-provider-aws#25775

Users are complaining about this exact problem in the comments of this issue. My current hunch is that the IAM role is not included in the DescribeStatement API response from the AWS service.

Could you try exporting the ID of the Redshift statement and then execute: aws redshift-data describe-statement --id ID_FROM_PULUMI_STACK
I'm curious whether AWS returns the correct sql (QueryString property).

@flostadler flostadler added awaiting-feedback Blocked on input from the author and removed needs-triage Needs attention from the triage team labels Jan 13, 2025
@MeTimesThree
Copy link
Author

Hi @flostadler ,

apologies for not looking into the terraform provider as the culprit.

You are correct, the returned statement indeed is missing the arn in the first and all subsequent calls!

`
Initial Rollout:

redshift_init_kernbank.id: "819cde20-095b-4042-b79c-c7c3003c34b5"

aws redshift-data describe-statement --id 819cde20-095b-4042-b79c-c7c3003c34b5

{
"ClusterIdentifier": "redshift-poc-dp",
"CreatedAt": "2025-01-13T17:08:20.763000+01:00",
"Database": "dwh_poc_db",
"DbUser": "admin_user",
"Duration": 46103662603,
"HasResultSet": false,
"Id": "819cde20-095b-4042-b79c-c7c3003c34b5",
"QueryString": "\n\ndrop TABLE IF EXISTS "dwh_poc_db"."public".Kontostamm;\n\nCREATE TABLE "dwh_poc_db"."public".Kontostamm (\n Kunden_ID bigint NOT NULL,\n Konto_ID bigint NOT NULL,\n Produkt varchar(150) NOT NULL ) ;\n\ncopy "dwh_poc_db"."public".Kontostamm from 's3://data-poc-dp/kernbank/Kontostammdaten.csv' iam_role '' delimiter ';' IGNOREHEADER 1 DATEFORMAT AS 'DD.MM.YYYY';\n\n ",
"RedshiftPid": 1073823971,
"RedshiftQueryId": 1554,
"ResultRows": 0,
"ResultSize": 0,
"Status": "FINISHED",
"UpdatedAt": "2025-01-13T17:09:14.255000+01:00"
}

First refresh:
redshift_init_kernbank.id: "819cde20-095b-4042-b79c-c7c3003c34b5" => "c51b96ff-478b-4297-a1da-ad76152010f6"

aws redshift-data describe-statement --id c51b96ff-478b-4297-a1da-ad76152010f6

{
"ClusterIdentifier": "redshift-poc-dp",
"CreatedAt": "2025-01-13T17:10:54.839000+01:00",
"Database": "dwh_poc_db",
"DbUser": "admin_user",
"Duration": 485662178,
"HasResultSet": false,
"Id": "c51b96ff-478b-4297-a1da-ad76152010f6",
"QueryString": "\n\ndrop TABLE IF EXISTS "dwh_poc_db"."public".Kontostamm;\n\nCREATE TABLE "dwh_poc_db"."public".Kontostamm (\n Kunden_ID bigint NOT NULL,\n Konto_ID bigint NOT NULL,\n Produkt varchar(150) NOT NULL ) ;\n\ncopy "dwh_poc_db"."public".Kontostamm from 's3://data-poc-dp/kernbank/Kontostammdaten.csv' iam_role '' delimiter ';' IGNOREHEADER 1 DATEFORMAT AS 'DD.MM.YYYY';\n\n ",
"RedshiftPid": 1073742041,
"RedshiftQueryId": 1599,
"ResultRows": 0,
"ResultSize": 0,
"Status": "FINISHED",
"UpdatedAt": "2025-01-13T17:10:55.830000+01:00"
}
`

@pulumi-bot pulumi-bot added needs-triage Needs attention from the triage team and removed awaiting-feedback Blocked on input from the author labels Jan 13, 2025
@flostadler
Copy link
Contributor

Thanks a lot for confirming my suspicion!
I'm thinking that we should be able to fix this upstream by not committing the value AWS returns to state. The query of an executed statement can't change after all.

For now, you could try ignoring changes to the sql property by using the ignoreChanges resource option: https://www.pulumi.com/docs/iac/concepts/options/ignorechanges/.

@flostadler flostadler added awaiting-upstream The issue cannot be resolved without action in another repository (may be owned by Pulumi). and removed needs-triage Needs attention from the triage team labels Jan 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-upstream The issue cannot be resolved without action in another repository (may be owned by Pulumi). kind/bug Some behavior is incorrect or out of spec
Projects
None yet
Development

No branches or pull requests

3 participants