Fix BigQuery field description limits for input layer #1323

Open
seamus-mckinsey wants to merge 3 commits into tuva-health:main from seamus-mckinsey:fix/bigquery-input-descriptions

Conversation

@seamus-mckinsey
Contributor

@seamus-mckinsey seamus-mckinsey commented Apr 24, 2026

Summary

Shorten all input-layer column descriptions that exceed BigQuery's 1024-character field description limit.

  • input_layer__pharmacy_claim.ndc_code
  • input_layer__medical_claim.claim_type
  • input_layer__eligibility.person_id
  • input_layer__medical_claim.place_of_service_code
  • input_layer__medical_claim.hcpcs_code
  • input_layer__medical_claim.rendering_npi
  • input_layer__medical_claim.billing_npi
  • input_layer__medical_claim.facility_npi
  • input_layer__medical_claim.diagnosis_code_type
  • input_layer__medical_claim.procedure_code_type

The edits keep the operational guidance while bringing every affected description under BigQuery's cap.

Why

BigQuery rejects schema updates when a field description is longer than 1024 characters. The current Tuva input-layer YAML contains multiple descriptions over that limit. Because the overlong metadata ships with the package itself, it breaks users running these models on BigQuery even when their own project YAML is fine.

Validation

Confirmed the repo now has zero YAML description: values above BigQuery's 1024-character limit.
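
For illustration, a check along these lines could reproduce that confirmation. It is a minimal sketch, not the script used for this PR: the function name `overlong_descriptions` is hypothetical, it assumes the YAML has already been parsed into plain dicts and lists (for example with a YAML loader), and it assumes descriptions are inline strings rather than doc-block references. It counts characters with `len()`, which is assumed to match how BigQuery measures the limit.

```python
from typing import Any, Iterator, Tuple

# Limit cited in this PR for BigQuery field descriptions.
BIGQUERY_DESCRIPTION_LIMIT = 1024


def overlong_descriptions(
    node: Any, path: str = "", limit: int = BIGQUERY_DESCRIPTION_LIMIT
) -> Iterator[Tuple[str, int]]:
    """Yield (path, length) for every `description` string longer than `limit`."""
    if isinstance(node, dict):
        # Use the entry's `name` (if present) to build a readable dotted path.
        name = node.get("name")
        here = f"{path}.{name}" if path and name else (name or path)
        desc = node.get("description")
        if isinstance(desc, str) and len(desc) > limit:
            yield here, len(desc)
        # Recurse into nested structures such as `columns`.
        for value in node.values():
            yield from overlong_descriptions(value, here, limit)
    elif isinstance(node, list):
        for item in node:
            yield from overlong_descriptions(item, path, limit)


# Hypothetical parsed schema with one description just over the limit.
schema = {
    "models": [
        {
            "name": "input_layer__medical_claim",
            "columns": [
                {"name": "claim_type", "description": "x" * 1025},
                {"name": "hcpcs_code", "description": "short"},
            ],
        }
    ]
}

violations = list(overlong_descriptions(schema))
print(violations)  # [('input_layer__medical_claim.claim_type', 1025)]
```

An empty result for every schema file corresponds to the "zero values above the limit" outcome described above.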

Follow-up

#1324 adds a GitHub Action that checks description lengths to prevent this issue from regressing.

@netlify

netlify Bot commented Apr 24, 2026

Deploy Preview for thetuvaproject canceled.

Name Link
🔨 Latest commit fb451c1
🔍 Latest deploy log https://app.netlify.com/projects/thetuvaproject/deploys/69f2197e8062ca000807dcc0

@seamus-mckinsey seamus-mckinsey force-pushed the fix/bigquery-input-descriptions branch from b92182d to e924e98 Compare April 27, 2026 14:47
@aneiderhiser
Collaborator

@seamus-mckinsey i hadn't encountered this issue before and we run bigquery on CI. but after researching a bit it sounds like this is driven by the dbt config persist_docs.columns: true. when this is set, bigquery writes the column descriptions to a table in the warehouse and applies the 1024-character limit. this config is false by default.

i assume you're using this config and plan to in the future as well? in general i would prefer not to enforce a limit on column descriptions but if it's important to your usage then we can.

@seamus-mckinsey
Contributor Author

@aneiderhiser thanks for looking into it. We do need to persist columns in order for documentation to show up in BigQuery and in the downstream systems that consume it.

Projects

Status: 👀 Ready for Review
