Fix BigQuery field description limits for input layer #1323
Open
seamus-mckinsey wants to merge 3 commits into tuva-health:main from
✅ Deploy Preview for thetuvaproject canceled.
b92182d to e924e98
Collaborator
@seamus-mckinsey I hadn't encountered this issue before, and we run BigQuery on CI. After researching a bit, it sounds like this is driven by the dbt config persist_docs.columns: true. When this is set, dbt writes the column descriptions into the table's schema in the warehouse, and BigQuery applies its 1024-character limit to each one. This config is false by default. I assume you're using it now and plan to keep using it? In general I'd prefer not to enforce a limit on column descriptions, but if it's important to your usage, we can.
Contributor
Author
@aneiderhiser Thanks for looking into it. We do need to persist column descriptions so that the documentation shows up in BigQuery and in the downstream systems that consume it.
Summary
Shorten all input-layer column descriptions that exceed BigQuery's 1024-character field description limit.
- input_layer__pharmacy_claim.ndc_code
- input_layer__medical_claim.claim_type
- input_layer__eligibility.person_id
- input_layer__medical_claim.place_of_service_code
- input_layer__medical_claim.hcpcs_code
- input_layer__medical_claim.rendering_npi
- input_layer__medical_claim.billing_npi
- input_layer__medical_claim.facility_npi
- input_layer__medical_claim.diagnosis_code_type
- input_layer__medical_claim.procedure_code_type

The edits keep the operational guidance while bringing every affected description under BigQuery's cap.
Why
BigQuery rejects schema updates when a field description is longer than 1024 characters. The current Tuva input-layer YAML contains several column descriptions over that limit. This breaks builds for users running these models on BigQuery even when their own project YAML is fine, because the overlong metadata comes from the package itself.
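A check for this limit can be sketched as a small walk over parsed dbt properties files. This is a minimal sketch, not the script used for this PR: the function name find_overlong_column_descriptions is hypothetical, and it assumes each YAML file has already been loaded into Python objects (for example with PyYAML's yaml.safe_load) in the usual models / columns / description shape. It checks only column descriptions, since the 1024-character cap discussed here applies to BigQuery field descriptions.

```python
from typing import Any

# BigQuery's cap on a column (field) description, in characters.
BIGQUERY_FIELD_DESCRIPTION_LIMIT = 1024


def find_overlong_column_descriptions(
    properties: dict[str, Any],
    limit: int = BIGQUERY_FIELD_DESCRIPTION_LIMIT,
) -> list[tuple[str, int]]:
    """Return (model.column, length) for each column description longer than `limit`.

    Hypothetical helper. `properties` is assumed to be a dbt properties file
    already parsed into Python objects, shaped like:
        {"models": [{"name": ..., "columns": [{"name": ..., "description": ...}]}]}
    """
    overlong: list[tuple[str, int]] = []
    for model in properties.get("models") or []:
        model_name = model.get("name", "<unnamed>")
        for column in model.get("columns") or []:
            # A missing or null description is treated as empty.
            description = str(column.get("description") or "")
            if len(description) > limit:
                overlong.append((f"{model_name}.{column.get('name')}", len(description)))
    return overlong
```

A description written as a doc() reference is measured here by its raw YAML text, not its rendered length, so a project that relies on doc blocks would need to check the rendered text instead.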
Validation
Confirmed that the repo now has no YAML description values longer than BigQuery's 1024-character limit.
Follow-up
#1324 adds a GitHub Action that checks description lengths, to keep this issue from regressing in the future.
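A CI gate of this kind could also read dbt's parsed manifest rather than the raw YAML. The sketch below is hypothetical and is not the #1324 implementation: it assumes target/manifest.json was produced by dbt parse, and that model nodes carry a columns mapping with description strings plus a config.persist_docs dictionary. The only_persisted flag mirrors the persist_docs discussion in this thread: the limit only bites when persist_docs.columns is true, but defaulting to False checks every model, so the package stays safe for any downstream project that turns persistence on.

```python
import json
import sys
from typing import Any

LIMIT = 1024  # BigQuery's maximum column (field) description length, in characters


def persists_column_docs(node: dict[str, Any]) -> bool:
    """True when the node's config contains persist_docs: {columns: true}."""
    persist_docs = (node.get("config") or {}).get("persist_docs") or {}
    return bool(persist_docs.get("columns", False))


def overlong_columns(
    manifest: dict[str, Any], limit: int = LIMIT, only_persisted: bool = False
) -> list[str]:
    """List '<unique_id>.<column> (<n> chars)' for column descriptions over `limit`."""
    problems: list[str] = []
    for unique_id, node in sorted((manifest.get("nodes") or {}).items()):
        if node.get("resource_type") != "model":
            continue  # only models are materialized as BigQuery tables here
        if only_persisted and not persists_column_docs(node):
            continue
        for column_name, column in sorted((node.get("columns") or {}).items()):
            description = str(column.get("description") or "")
            if len(description) > limit:
                problems.append(f"{unique_id}.{column_name} ({len(description)} chars)")
    return problems


def main(manifest_path: str) -> int:
    """Return a CI exit status: 0 if every description fits, 1 otherwise."""
    with open(manifest_path, encoding="utf-8") as handle:
        manifest = json.load(handle)
    problems = overlong_columns(manifest)
    for problem in problems:
        print(f"Description over {LIMIT} characters: {problem}", file=sys.stderr)
    return 1 if problems else 0
```

In a workflow step, running dbt parse and then calling sys.exit(main("target/manifest.json")) would fail the job whenever any model column description is too long.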