What happens?
Hi,
I'm running splink 4.0.16 on a SageMaker Studio JupyterLab instance on AWS.
I am able to connect to Athena via the API. However, I cannot connect to the database (that exists!) that I prefer to use. This is the relevant snippet:
import boto3
from splink.backends.athena import AthenaAPI

REGION = "us-west-2"
S3_OUTPUT_LOCATION = "MYS3OUTPUTLOCATION"

boto3_session = boto3.Session(region_name=REGION)
aws_filepath = S3_OUTPUT_LOCATION

# bucket, database and filepath are set earlier in the notebook cell (not shown)
db_api = AthenaAPI(
    boto3_session,
    output_bucket=bucket,
    output_database=database,
    output_filepath=filepath,
)
But upon execution, I get the following traceback:
InvalidAWSBucketOrDatabase Traceback (most recent call last)
Cell In[4], line 24
22 boto3_session = boto3.Session(region_name="us-west-2")
23 aws_filepath = S3_OUTPUT_LOCATION
---> 24 db_api = AthenaAPI(
25 boto3_session,
26 output_bucket=bucket,
27 output_database=database,
28 output_filepath=filepath,
29 )
30 import numpy as np
File /opt/conda/lib/python3.12/site-packages/splink/internals/athena/database_api.py:41, in AthenaAPI.__init__(self, boto3_session, output_database, output_bucket, output_filepath)
37 raise ValueError("Please enter a valid boto3 session object.")
39 self.sql_dialect = "presto"
---> 41 _verify_athena_inputs(output_database, output_bucket, boto3_session)
42 self.boto3_session = boto3_session
43 self.output_schema = output_database
File /opt/conda/lib/python3.12/site-packages/splink/internals/athena/athena_helpers/athena_utils.py:31, in _verify_athena_inputs(database, bucket, boto3_session)
29 database_bucket_txt = " and ".join(errors)
30 do_does_grammar = ["does", "it"] if len(errors) == 1 else ["do", "them"]
---> 31 raise InvalidAWSBucketOrDatabase(
32 athena_warning_text(database_bucket_txt, do_does_grammar)
33 )
InvalidAWSBucketOrDatabase:
The supplied database '[database]' that you have requested to write to does not currently exist.
Create it either directly from within AWS, or by using 'awswrangler.athena.create_athena_bucket' for buckets or 'awswrangler.catalog.create_database' for databases using the awswrangler API.
It looks like the code checks for the db availability here
When I manually run
wr.catalog.databases(boto3_session=boto3_session).values
I see 100 databases, but not the one I want. When I change it to
wr.catalog.databases(limit=200, boto3_session=boto3_session).values
I do see the database. So the check appears to have a bug: it seems to rely on the default limit of 100 results, so any database beyond the first 100 is reported as missing.
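For illustration, here is a minimal sketch of an existence check that does not depend on a result limit. It is not Splink's actual code; the helper names `iter_database_names` and `database_exists` are hypothetical, and it assumes a boto3 Glue client is passed in (for example `boto3_session.client("glue")`). It walks every page of the Glue `get_databases` paginator instead of reading only the first batch of results.

```python
from typing import Any, Iterator


def iter_database_names(glue_client: Any) -> Iterator[str]:
    """Yield every database name in the Glue catalog, following pagination."""
    # The boto3 Glue client provides a paginator for the get_databases operation.
    paginator = glue_client.get_paginator("get_databases")
    for page in paginator.paginate():
        for database in page.get("DatabaseList", []):
            yield database["Name"]


def database_exists(glue_client: Any, name: str) -> bool:
    """Return True if `name` appears anywhere in the catalog, not just the first page."""
    return any(db_name == name for db_name in iter_database_names(glue_client))


# Hypothetical usage, assuming boto3_session and database are defined as in the snippet above:
#   glue = boto3_session.client("glue")
#   database_exists(glue, database)
```

With more than 100 databases in the catalog, a check along these lines should still find the one requested, because it stops only when the name is found or the pages run out.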
To Reproduce
See the description above.
OS:
AWS SageMaker Studio instance
Splink version:
4.0.16
Have you tried this on the latest master branch?
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?