diff --git a/models/input_layer/input_layer__eligibility.yml b/models/input_layer/input_layer__eligibility.yml index 98c9a67d3..de4e3ab34 100644 --- a/models/input_layer/input_layer__eligibility.yml +++ b/models/input_layer/input_layer__eligibility.yml @@ -73,7 +73,7 @@ models: materialized: view columns: - name: person_id - description: 'Unique identifier for each person in the dataset. person_id is a required (string) field that ideally contains a person-level UUID (Universally Unique Identifier), if available. This can be populated from the Tuva EMPI Engine or with your own Master Patient Index identifier. If you dont have a UUID, we recommend mapping the source patient identifier to this field (member_id for claims and patient_id for clinical). The primary key for the eligibility table is person_id, member_id, enrollment_start_date, enrollment_end_date, and data_source. There are two commonly used data formats for eligibility (also known as enrollment) data: the eligibility span format and the member month format. The eligibility span format has one record per member eligibility span. An eligibility span is a time period when a member was enrolled with and therefore had insurance coverage by a health plan. An eligibility span has a start date and an end date. A person can have multiple eligibility spans. The member month format has one record per member per month of enrollment. For example, a person with a single eligibility span from 1/1/2020 through 3/31/2020 would have a single eligibility span record, but 3 member month records, one for each month. The eligibility table follows the eligibility span format.' + description: 'Unique identifier for each person in the dataset. This required string should ideally be a person-level UUID, whether from the Tuva EMPI Engine or your own master patient index. If no UUID is available, map the source patient identifier to person_id, such as member_id in claims or patient_id in clinical data. 
The eligibility table follows the eligibility span format, so a person may appear on multiple rows, because the table key also includes member_id, enrollment dates, and data_source. The person_id itself should remain stable across coverage periods, plans, payers, and source files for the same individual.' mapping_instructions: >- Map this field to a unique person-level identifier. The same person should not have multiple `person_id` values across payers, data diff --git a/models/input_layer/input_layer__medical_claim.yml b/models/input_layer/input_layer__medical_claim.yml index e4bc1748f..ef46d0fdf 100644 --- a/models/input_layer/input_layer__medical_claim.yml +++ b/models/input_layer/input_layer__medical_claim.yml @@ -148,7 +148,7 @@ models: data_type: integer is_primary_key: true - name: claim_type - description: 'Indicates whether the claim is professional (CMS-1500), institutional (UB-04), or undetermined. This field is a string that describes the type of claim and must have one of the following 3 values: institutional, professional, or undetermined. This is a header-level field, so its value must be the same for all lines in a given claim. This field should be populated for every row in the medical_claim table. The logic to populate this field is as follows: - A claim is said to be institutional if it has any of these fields populated: bill_type_code, drg_code, admit_type_code, admit_source_code, discharge_disposition_code, revenue_center_code. - A claim is said to be professional if none of those institutional indicators are populated and it has at least one populated place_of_service_code. - If neither of the above two bullets is the case, the claim is said to be undetermined. 
Making the claim_type determination at the header level might happen in a CTE that looks like this: with claim_types as ( select claim_id , max( bill_type_code is not null or drg_code is not null or admit_type_code is not null or admit_source_code is not null or discharge_disposition_code is not null or revenue_center_code is not null ) as is_institutional , max( bill_type_code is null and drg_code is null and admit_type_code is null and admit_source_code is null and discharge_disposition_code is null and revenue_center_code is null and place_of_service_code is not null ) as is_professional from mapped_claims_data group by claim_id ) Then, later, the claim type determination can be made in a case statement, like this: ... , case when is_institutional then institutional when is_professional and not is_institutional then professional when not is_professional and not is_institutional then undetermined end as claim_type' + description: 'Header-level classification of the medical claim as institutional, professional, or undetermined. The value should be consistent across all lines for the same claim_id and should be populated for every row in the medical_claim table. A claim is institutional if one or more institutional indicators are populated, including bill_type_code, drg_code, admit_type_code, admit_source_code, discharge_disposition_code, or revenue_center_code. A claim is professional if none of those institutional indicators are populated and at least one place_of_service_code is populated. If neither condition is met, map the claim as undetermined. This field is used downstream in Tuva logic, so the classification should reflect the claim at the header level rather than vary by line.' mapping_instructions: 'Map every claim to institutional, professional, or undetermined. This is a header-level field and must be identical across all lines for the same claim_id. 
Professional claims should have place_of_service_code and should not carry institutional-only indicators such as bill_type_code, admit_type_code, admit_source_code, discharge_disposition_code, drg_code, or revenue_center_code. Institutional claims should have bill_type_code and revenue_center_code and should not carry place_of_service_code. If the source does not cleanly support either pattern, map the claim to undetermined until the classification can be resolved. This field is critical because downstream service category grouping and encounter grouping depend on it.' required_for_data_marts: ['cms_hccs', 'encounters', 'semantic_layer', 'service_categories'] tests: @@ -801,7 +801,7 @@ models: meta: data_type: varchar - name: place_of_service_code - description: 'Place of service for the claim (professional claims only). This field is a two-character string that represents one of the standard place_of_service_code values, which represent a specific location where a medical service was provided. This field should be populated for professional claims and is a line-level field, so its value may be different for different lines in a given claim. DQI checks that the value of this field is a two-character string, but it does not check whether the value is valid (i.e. that this field matches one of the place_of_service_code values in terminology). If your raw data has invalid values, DQI will identify them downstream of the input layer. DQI raises a warning if a professional claim has null place_of_service_code values. In the case that place_of_service_codes are null or not populated for some claim lines in source data, these values may be backfilled with 99, which corresponds to "Other Place of Service." Note that place_of_service_code values may have leading zeroes. Often, these leading zeroes are missing in the source data. 
This issue should be corrected during the mapping process, and one way to handle this could be the following: ```sql lpad(place_of_service_code, 2, 0) as place_of_service_code ```' + description: 'Place of service code for professional claims. This is a line-level, two-character code that identifies the location where the medical service was provided, so values may differ across lines within the same claim. Professional claim lines should have a place_of_service_code; institutional claims should not use this field. Tuva checks that the value is two characters long, but downstream terminology validation determines whether it is a valid standard place_of_service_code. If source data omits this value for some professional claim lines, some organizations backfill 99 for Other Place of Service. Preserve leading zeroes when they exist in the standard code set and normalize missing leading zeroes during mapping.' mapping_instructions: 'Map the place of service code at the claim-line level for professional claims. Every professional claim line should have a place_of_service_code, and institutional claims should not use this field. The mapped value should conform to Tuva terminology for place_of_service_code and should preserve leading zeroes when they exist in the standard code set.' required_for_data_marts: ['encounters', 'quality_measures', 'semantic_layer', 'service_categories'] tests: @@ -1089,7 +1089,7 @@ models: meta: data_type: number - name: hcpcs_code - description: 'The CPT or HCPCS code representing the procedure or service provided. These codes are used to describe medical, surgical, and diagnostic services. This field is a string that represents procedures, services and supplies rendered by providers to patients. These codes exist at the line level, and there can be many HCPCS codes on a single claim. There are thousands of HCPCS codes spread across two levels: * Level 1 codes, also called CPT codes, are maintained by the American Medical Association (AMA). 
The Tuva Project does not have terminology for Level 1 codes for licensing reasons. * Level 2 codes, which are maintained by CMS. The Tuva Project has terminology for these codes. DQI checks that hcpcs_code values are not null on professional claims and ensures that mapped codes are HCPCS Level 2 codes. When this is the case, strategies for handling these values can be use case-specific. Organizations may opt to backfill null hcpcs_code values with 99499, a code used to report unlisted Evaluation and Management services when there is no other code that sufficiently corresponds to the services provided. The way HCPCS codes show up in claims data can vary: weve seen some carriers append a suffix to HCPCS codes, which makes them more than 5 characters. Like many of the other fields in your raw data, HCPCS codes may need some manipulation (e.g. stripping away a suffix) as you map them to the input layer.' + description: 'HCPCS or CPT code for the procedure, service, or supply provided on the claim line. This line-level field may vary across lines within the same claim. Level 1 HCPCS codes are CPT codes maintained by the American Medical Association, while Level 2 HCPCS codes are maintained by CMS; Tuva terminology covers Level 2 only. Professional claims are generally expected to have hcpcs_code values, and Tuva validates Level 2 mappings when applicable. Source values sometimes include suffixes or formatting differences, so normalization such as trimming suffixes may be needed during mapping. If no specific code is available, some organizations backfill 99499 for unlisted evaluation and management services, but that choice is use-case dependent.' mapping_instructions: 'Map the CPT or HCPCS code at the claim-line level. Every professional claim line should have one and only one hcpcs_code. Institutional claims may also carry HCPCS codes, but they are less consistently required. 
If the source appends suffixes or other nonstandard characters, normalize the value before mapping so the resulting code is the standard claim-line procedure code.' required_for_data_marts: ['cms_hccs', 'encounters', 'provider_attribution', 'quality_measures', 'semantic_layer', 'service_categories'] tests: @@ -1215,7 +1215,7 @@ models: meta: data_type: varchar - name: rendering_npi - description: 'Rendering NPI for the claim (typically represents the physician or entity providing services). This field is a string that contains NPI (National Provider Identifier) values. rendering_npi represents the practitioner who performed or rendered the specific service. This value can be populated in either institutional or professional claims and can be different across claim lines. NPIs are composed of numbers and are ten characters in length. DQI ensures that this field matches the expected length and character pattern. Source data may only include a single NPI field without specifying whether the provided identifier corresponds to a rendering, billing, or facility NPI. In that case, look for the NPI in Tuvas provider data file to determine whether it corresponds to a person or place. * If its a person, then the NPI should be mapped to rendering_npi. * If its a person and also a professional claim, then also map to billing_npi. * If its a location and the claim type is institutional, then map to facility_npi That logic could look like this: ```sql select ... , case when p.entity_type_code = 1 then npi else null end as rendering_npi , case when p.entity_type_code = 1 and claim_type = professional then p.npi else null end as billing_npi , case when p.entity_type_code = 2 and claim_type = institutional then p.npi else null end as facility_npi from source_data as sd left join provider_data__provider as p on p.npi = sd.npi ```' + description: 'Rendering NPI for the claim, typically representing the practitioner or entity that performed the service. 
This line-level field may be populated on either institutional or professional claims and may vary across claim lines. An NPI is a 10-digit numeric identifier, and Tuva validates the expected length and character pattern. Some source systems provide only a single NPI field without distinguishing rendering, billing, and facility roles. In that case, use provider data to determine whether the identifier belongs to a person or a place: if it belongs to a person, map it to rendering_npi; if it belongs to a place on an institutional claim, map it to facility_npi instead.' mapping_instructions: 'Map the NPI for the practitioner who rendered the service on the claim line. Every claim line should have a rendering_npi. If the source only provides a single NPI field, use provider reference data to determine whether that identifier is a person-level NPI, and map person-level NPIs to rendering_npi. If the source only provides facility or organization NPIs, the mapping should be improved upstream because logical DQ treats missing rendering_npi as a failure.' required_for_data_marts: ['cms_hccs', 'encounters', 'hcc_recapture', 'provider_attribution', 'semantic_layer', 'service_categories'] tests: @@ -1277,7 +1277,7 @@ models: meta: data_type: varchar - name: billing_npi - description: 'Billing NPI for the claim (typically represents organization billing the claim). This field is a string that contains NPI (National Provider Identifier) values. billing_npi typically represents the entity (organization or individual) responsible for billing and receiving payment for healthcare services. NPIs are composed of numbers and are ten characters in length. DQI ensures that this field matches the expected length and character pattern. Source data may only include a single NPI field without specifying whether the provided identifier corresponds to a rendering, billing, or facility NPI. 
In that case, look for the NPI in Tuvas provider data file to determine whether it corresponds to a person or place. * If its a person, then the NPI should be mapped to rendering_npi. * If its a person and also a professional claim, then also map to billing_npi. * If its a location and the claim type is institutional, then map to facility_npi That logic could look like this: ```sql select ... , case when p.entity_type_code = 1 then npi else null end as rendering_npi , case when p.entity_type_code = 1 and claim_type = professional then p.npi end as billing_npi , case when p.entity_type_code = 2 and claim_type = institutional then p.npi end as facility_npi from source_data as sd left join provider_data__provider as p on p.npi = sd.npi ```' + description: 'Billing NPI for the claim, typically representing the organization or individual responsible for billing and receiving payment for the service. An NPI is a 10-digit numeric identifier, and Tuva validates the expected length and character pattern. Some source systems provide only a single NPI field without distinguishing rendering, billing, and facility roles. In that case, use provider data to determine whether the identifier belongs to a person or a place: if it belongs to a person, map it to rendering_npi and, on professional claims, also consider mapping it to billing_npi; if it belongs to a place on an institutional claim, map it to facility_npi instead.' mapping_instructions: 'Map the NPI for the billing entity on the claim. Every claim should have one and only one billing_npi, and the value should be identical across all lines for the same claim_id. If the source only provides a single NPI field, use provider reference data to determine whether the identifier belongs to a billing provider organization or individual.' 
required_for_data_marts: ['encounters', 'semantic_layer'] tests: @@ -1339,7 +1339,7 @@ models: meta: data_type: varchar - name: facility_npi - description: 'Facility NPI for the claim (typically represents the facility where services were performed). This field is a string that contains NPI (National Provider Identifier) values. facility_npi typically represents the location where specific services were delivered. NPIs are composed of numbers and are ten characters in length. DQI ensures that this field matches the expected length and character pattern. Source data may only include a single NPI field without specifying whether the provided identifier corresponds to a rendering, billing, or facility NPI. In that case, look for the NPI in Tuvas provider data file to determine whether it corresponds to a person or place. * If its a person, then the NPI should be mapped to rendering_npi. * If its a person and also a professional claim, then also map to billing_npi. * If its a location and the claim type is institutional, then map to facility_npi That logic could look like this: ```sql select ... , case when p.entity_type_code = 1 then npi else null end as rendering_npi , case when p.entity_type_code = 1 and claim_type = professional then p.npi end as billing_npi , case when p.entity_type_code = 2 and claim_type = institutional then p.npi end as facility_npi from source_data as sd left join provider_data__provider as p on p.npi = sd.npi ```' + description: 'Facility NPI for the claim, typically representing the location where services were delivered. An NPI is a 10-digit numeric identifier, and Tuva validates the expected length and character pattern. Some source systems provide only a single NPI field without distinguishing rendering, billing, and facility roles. 
In that case, use provider data to determine whether the identifier belongs to a person or a place: if it belongs to a place on an institutional claim, map it to facility_npi; if it belongs to a person, map it to rendering_npi instead and, on professional claims, consider whether it also belongs in billing_npi.' mapping_instructions: 'Map the NPI for the facility where services were delivered. Inpatient institutional claims should have a facility_npi. If the source only provides a single NPI field, use provider reference data to determine whether the identifier is an organization or facility-level NPI before mapping it to facility_npi.' required_for_data_marts: ['ahrq_measures', 'ed_classification', 'encounters', 'readmissions', 'semantic_layer', 'service_categories'] tests: @@ -1681,7 +1681,7 @@ models: meta: data_type: float - name: diagnosis_code_type - description: 'The coding system used for the diagnosis code (e.g., ICD-10-CM, ICD-9-CM). This field is a string that describes the type of ICD diagnosis codes used on this claim. It must have one of the following two values: icd-9-cm or icd-10-cm. This is a header-level field, so its value must be the same for all lines in a given claim. This field should be populated for every row in the medical_claim table that has diagnosis codes. Claims data sources may not contain information about the diagnosis_code_type. On October 1, 2015, healthcare in the U.S. switched from ICD-9 to ICD-10. If there is no information about diagnosis_code_type in the source data, the switch-over date from ICD-9 to ICD-10 may be used: ```sql case when claim_end_date < 2015-10-01 then icd-9-cm else icd-10-cm end as diagnosis_code_type ``` DQI checks that claims with at least one populated diagnosis code have a populated diagnosis_code_type from one of the accepted values (icd-9-cm, icd-10-cm) and that the value of this field is consistent across all lines for the claim.' + description: 'Coding system used for diagnosis codes on the claim. 
This header-level field must be consistent across all lines for the same claim and should be populated whenever diagnosis codes are present. Allowed values are icd-9-cm and icd-10-cm. If the source does not explicitly provide diagnosis_code_type, a common approach is to infer it from service date: use icd-9-cm for dates before 2015-10-01 and icd-10-cm for dates on or after 2015-10-01. If the source spans the transition period and omits the code type, infer the value using both the date of service and the diagnosis code pattern.' mapping_instructions: 'Populate diagnosis_code_type whenever any diagnosis_code_1 through diagnosis_code_25 is populated. The allowed values are icd-9-cm and icd-10-cm. If the source does not explicitly provide the code type, it can be inferred using the ICD-9 to ICD-10 transition date when appropriate. This is a header-level field and must be identical across all lines for the same claim_id.' required_for_data_marts: ['ccsr', 'cms_chronic_conditions', 'cms_hccs', 'encounters', 'tuva_chronic_conditions'] tests: @@ -3334,7 +3334,7 @@ models: meta: data_type: varchar - name: procedure_code_type - description: 'Indicates the type of procedure code (e.g. ICD-10-PCS). This field is a string that describes the type of ICD procedure codes used on this claim. It must have one of the following two values: icd-9-pcs or icd-10-pcs. This is a header-level field on inpatient institutional claims, so its value must be the same for all lines in a given claim. This field should be populated for every row in the medical_claim table that has ICD procedure codes. Claims data sources may not contain information about the procedure_code_type. On October 1, 2015, healthcare in the U.S. switched from ICD-9 to ICD-10. 
If there is no information about procedure_code_type in the source data, the switch-over date from ICD-9 to ICD-10 may be used: ```sql case when claim_end_date < 2015-10-01 then icd-9-pcs else icd-10-pcs end as procedure_code_type ``` DQI checks that claims with at least one populated procedure code have a populated procedure_code_type from one of the accepted values (icd-9-pcs, icd-10-pcs) and that the value of this field is consistent across all lines for the claim.' + description: 'Coding system used for ICD procedure codes on the claim. This header-level field on inpatient institutional claims must be consistent across all lines for the same claim and should be populated whenever ICD procedure codes are present. Allowed values are icd-9-pcs and icd-10-pcs. If the source does not explicitly provide procedure_code_type, a common approach is to infer it from service date: use icd-9-pcs for dates before 2015-10-01 and icd-10-pcs for dates on or after 2015-10-01. If the source spans the transition period and omits the code type, infer the value using both the date of service and the procedure code pattern.' mapping_instructions: 'Populate procedure_code_type whenever any procedure_code_1 through procedure_code_25 is populated. The allowed values are icd-9-pcs and icd-10-pcs. If the source does not explicitly provide the code type, it can be inferred using the ICD-9 to ICD-10 transition date when appropriate. This is a header-level field for inpatient institutional claims and must be identical across all lines for the same claim_id.' required_for_data_marts: [] tests: diff --git a/models/input_layer/input_layer__pharmacy_claim.yml b/models/input_layer/input_layer__pharmacy_claim.yml index 00f01e489..812c83319 100644 --- a/models/input_layer/input_layer__pharmacy_claim.yml +++ b/models/input_layer/input_layer__pharmacy_claim.yml @@ -347,7 +347,7 @@ models: meta: data_type: date - name: ndc_code - description: 'National drug code associated with the medication. 
This field represents the National Drug Code (NDC) for the actual drug being dispensed. Each line on a pharmacy claim represents a drug that was dispensed, so each line must have an ndc_code. NDC codes are written as a 10-digit number on drug packaging, but an additional digit is usually added when billing an NDC on a healthcare claim, making the NDC have 11 digits on pharmacy claims. If your raw data has 10-digit NDC codes, you must add a 0 to the code to make it 11 digits when mapping to the pharmacy_claim input layer table. The 11-digit number follows a 5-4-2 format, i.e. 5 digits in the first segment, 4 digits in the second segment, and 2 digits in the third segment. The rules for which segment the additional digit is added to are as follows: - 4-4-2 becomes 5-4-2 - 5-3-2 becomes 5-4-2 - 5-4-1 becomes 5-4-2 Essentially you add a leading zero to whichever segment needs it. If your 10-digit codes are not separated into segments by dashes, it is impossible to know where to add the extra 0 and so you cannot accurately turn your code into an 11-digit code and can therefore you cannot map it to the pharmacy_claim input layer table. Whether your raw data has 11-digit NDC codes or 10-digit codes that you may successfully convert to 11-digit codes, you must remove the dashes in the code when mapping to the pharmacy_claim input layer table. The ndc_code field should should always be populated with 11-character strings. DOI checks that the ndc_code field is always populated. DQI does not check whether the value of this field is a valid value from terminology. If your raw data has invalid values, you will map them to the input layer and Tuvas data quality intelligence flag invalid values downstream from the input layer.' + description: 'National Drug Code (NDC) for the medication dispensed on the pharmacy claim line. Each pharmacy claim line should have an ndc_code. 
NDCs are often shown on packaging as 10-digit codes, but pharmacy claims typically require the 11-digit billing format. If your source data contains 10-digit NDCs, convert them to 11 digits before mapping. The 11-digit format follows a 5-4-2 pattern; when converting from a dashed 10-digit format, add a leading zero to the segment that is short. If the source stores a 10-digit NDC without dashes or segment boundaries, you generally cannot reliably determine where the extra zero belongs. Remove dashes before mapping. Tuva expects this field to be populated with an 11-character string, but downstream validation determines whether the mapped value is a valid terminology code.' mapping_instructions: 'Map the NDC for the medication that was dispensed. Every pharmacy claim line should have an ndc_code. Normalize 10-digit source NDCs to the 11-digit billing format, remove dashes, and make sure the final mapped value is valid against Tuva terminology.' required_for_data_marts: ['cms_chronic_conditions', 'hcc_suspecting', 'pharmacy', 'quality_measures', 'semantic_layer'] tests: