Considerations for enhancing the credibility of health economic models: a review

Xuanqian Xie; Alexis K. Schaink; Olga Gajic‐Veljanoski; Kamilla Guliyeva; Wendy J. Ungar

doi:10.21037/jhmhp-25-81

Review Article

Xuanqian Xie¹, Alexis K. Schaink¹, Olga Gajic‑Veljanoski¹, Kamilla Guliyeva¹, Wendy J. Ungar²,³

¹Acute and Hospital-Based Care, Ontario Health, Toronto, ON, Canada; ²Program of Child Health Evaluative Sciences, the Hospital for Sick Children Research Institute, Toronto, ON, Canada; ³The Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, ON, Canada

Contributions: (I) Conception and design: All authors; (II) Administrative support: X Xie; (III) Provision of study materials or patients: X Xie, AK Schaink, O Gajic‑Veljanoski; (IV) Collection and assembly of data: X Xie, AK Schaink, O Gajic‑Veljanoski; (V) Data analysis and interpretation: All authors; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Xuanqian Xie, MSc. Acute and Hospital-Based Care, Ontario Health, 525 University Avenue, 5th Floor, Toronto, ON M5G 2L3, Canada. Email: shawn.xie@ontariohealth.ca.

Abstract: Health economic evaluations address resource allocation problems and have been widely used in health technology assessment for adopting new interventions. Numerous countries and international societies have developed economic evaluation guidelines. There is consensus among the guidelines on several key principles including the type of analysis [cost-utility analysis (CUA) preferred], time horizon of the analysis (long enough required), health outcome [the quality-adjusted life-year (QALY) preferred], and sensitivity analysis (various analyses recommended). Long-term CUAs, which aim to capture all intended effects and unintended side effects of interventions, ideally over a lifetime horizon, are generally recommended. However, strict application of these recommendations could introduce greater uncertainty in economic evaluation results. We examine various guidelines to distill the commonalities and suggest more nuanced approaches for improving the presentation of the credibility of economic model results (i.e., the likelihood of cost-effectiveness results being trusted). We propose several considerations for enhancing the credibility of economic evaluation conclusions. First, when it is challenging to make a reliable projection of long-term outcomes, health economists may limit the time horizon to a shorter time supported by the available data. Second, the economic evaluation can be conducted using natural units of health when high-quality utility data are not available. Third, health economists need to understand whether the available data can support complex disease models so they can choose a proper level of model complexity. Fourth, health economists need to specify factors that are associated with health benefits, and control for the remaining factors (e.g., background mortality and baseline characteristics). Fifth, instead of focusing on selecting fitted parameters from real-world data, economic evaluations should use the best quality clinical evidence for their key model inputs. In conclusion, economic evaluations should aim to produce the most credible cost-effectiveness estimates by adhering to rigorous methodological standards and by adapting recommendations to suit the specific context. Health economists should clearly identify and report potential biases, assumptions, the quality of evidence informing key model parameters, and sources of uncertainty to ensure the credibility of the cost-effectiveness results and their use in decision and policy making.

Keywords: Health economic evaluation; methodological guidelines; credibility; time horizon; quality-adjusted life-year (QALY)

Received: 20 August 2025; Accepted: 22 December 2025; Published online: 02 February 2026.

Introduction

Economic evaluations address resource allocation problems and have been widely used in health technology assessment (HTA) for adopting new interventions in decision making. Health economic evaluations aim to compare the costs and consequences of alternative health interventions including policies, services, treatments and tests (1). Based on the measurement of health interventions’ costs and consequences, economic evaluations can be categorized into four types (1):

Cost analysis [also called cost-minimization analysis (CMA)]: costs are valued using monetary units, while the health benefits in natural units or quality-adjusted life-years (QALYs) are not captured.
Cost-effectiveness analysis (CEA): both costs and health outcomes are considered in the analysis. Natural units, such as life-years and the number of cancer recurrences, are used as the measure of health outcomes.
Cost-utility analysis (CUA): type of economic evaluation where both costs and health outcomes are considered and health outcomes are expressed using a universal preference-based measure such as the QALY which combines the quality and quantity of life. CUA enables policy makers to compare various interventions for different diseases.
Cost-benefit analysis (CBA): both cost and health outcomes are included but single or multiple health consequences are translated into monetary units.

Sharma et al. reviewed 31 national economic evaluation guidelines (2). Of these, 15 guidelines (49%) recommended CUAs, 10 guidelines (32%) recommended CUAs or CEAs, and the remaining 6 (19%) recommended any justifiable type of analysis (2). This review found a substantial consensus among the guidelines on several key principles, including the type of analysis (CUA preferred), time horizon of the analysis (long enough), health outcome (measured in QALYs), and sensitivity analysis (various analyses recommended). Long-term CUAs, ideally lifetime, are generally recommended (1-4). A lifetime CUA measures costs and health consequences in QALYs over an entire lifetime, and aims to capture the full intended effects and unintended adverse effects of interventions (1). Recommendations about the perspectives of analysis, cost components included (e.g., indirect costs), utility instruments, and discount rates varied across the guidelines (2).

Presently, most economic evaluations adhere to widely accepted methodological guidelines. Sometimes these recommendations are used to evaluate the methodological quality of economic evaluations, whereby following the recommendations is considered an indicator of good quality. However, rigid application of methodological recommendations across diverse contexts could inadvertently introduce greater uncertainty and potentially result in less trustworthy results.

The availability of published high-quality health utility data and other inputs for long-term disease modeling provides the optimal basis for conducting a lifetime CUA. However, very often, modelers are faced with a lack of high-quality data to support the development of such economic models. It is not uncommon to identify cost-effectiveness findings that are based on unknown or uncertain clinical benefits; also, low-quality clinical data can be associated with more favorable cost-effectiveness results. For instance, one economic evaluation showed that a hydrophilic-coated single-use catheter is highly cost-effective compared with an uncoated single-use catheter for people with spinal cord injury (5). However, the key clinical parameters in this study, such as the complications associated with catheter use (e.g., urinary tract infection), were based on estimates from an expert panel (5). Another example arises from a review of 350 treatment comparisons in oncology, which found that observational studies based on population-based registries for cancer research in the United States are significantly more likely to demonstrate survival benefits than randomized controlled trials (RCTs) (6). Thus, economic evaluations informed by real-world evidence studies may lead to more favorable cost-effectiveness results.

In some cases, there is a lack of reliable data to support a long-term time horizon for a CUA model. To adhere to the recommendations, health economists must use poor-quality data (including expert opinion in many cases) and often make unverifiable assumptions that introduce considerable uncertainty in the model results and study conclusions. Conversely, if health economists interpret the recommendations as general guidance and pursue context-appropriate deviations, such as using a shorter time horizon or a clinical event as the main effectiveness measure, model-based economic evaluations may lead to more credible results. In this study, we aim to address perspectives related to the credibility of model findings from health economic evaluations. Credibility addresses the extent to which the study accurately answers the question it is intended to answer (7). For the present study, credibility refers to the likelihood of cost-effectiveness results being trusted. This means that a credible economic evaluation provides results with considerable certainty, approaching true values.

To distill the commonalities of published health economic evaluation guidelines, we examined the review of the 31 guidelines by Sharma et al. (2) and other guidelines recommended by the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) (8-10). We further reviewed in detail the national guidelines from Canada, the Netherlands and the United Kingdom (3,4,11), which were included in the review by Sharma et al. (2). We suggest more nuanced considerations for improving the presentation of credibility of economic models and cost-effectiveness results. In Table 1, we present common challenges and approaches and elaborate on them in the main text. Additionally, we highlight some considerations regarding the likelihood of bias in economic evaluations when the recommendations are not followed.

Table 1. Summary of current methodological practice and new perspectives

Methodological consideration	Recommended practice	New perspective
Time horizon	Time horizon should be based on the natural history of disease and anticipated effects of the intervention (2)	It may be challenging to make a reliable projection of the long-term cost-effectiveness results [e.g., strong assumptions, possible increase in bias† (12)]
Time horizon	A long-term time horizon, ideally lifetime, is often recommended (1-4)	If health economists limit the model to a shorter time horizon supported by the observed data, the results could be more credible
QALYs vs. natural units of health	While many guidelines typically recommend the use of CUA including measuring health outcomes in QALYs (2), some guidelines do not explicitly advocate for any particular outcome measure (2)	A CUA ought to be conducted when medium- or high-quality utility data for the main health states are available
QALYs vs. natural units of health	Health economists make efforts to get QALY estimates, even when credible data such as health utilities are not available	Otherwise, the economic evaluation can be conducted using the natural unit of health to represent the value of examined health technology
Appropriate complexity level of the model	Economic models are recommended to be consistent with the natural history of disease and clinical pathways (2)	Health economists need to understand the main clinical evidence (e.g., treatment efficacy, adverse effects, and the natural history of the disease) before developing a model
Appropriate complexity level of the model	The conceptualization of the model should be determined by the decision question(s), as opposed to the availability of data (9), but sometimes there are no appropriate data to populate the model	A complex model is often associated with additional assumptions, a lack of data for model parameters and technical challenges. We suggest health economists choose the appropriate level of model complexity in relation to the availability of the data that would inform the model structure and all parameter assumptions
Factors to be controlled	Various approaches are used in practice, but there are no specific recommendations	Health economists may control for some factors in economic modelling (e.g., assigning the same risk of the secondary clinical outcomes and the same baseline characteristics for both intervention and comparator)
Factors to be controlled		The cost-effectiveness results should reflect the primary health outcomes of interest
Data sources for the key clinical parameters	The guidelines in the Netherlands strongly recommend using evidence from RCTs to determine the effectiveness of model parameters (4), while many other guidelines focus on selecting the fitted parameters and transparently reporting the data sources (3,13,14)	For debated treatment areas with inconsistent clinical evidence, a systematic review of the clinical evidence for a given intervention should be conducted to inform an economic analysis
Data sources for the key clinical parameters	In practice, health economists select appropriate data sources and values for model parameters based on their judgement. It has become more popular to use parameters from RWD	Instead of focusing on selecting the fitted parameters, economic evaluations should use the best quality (or least biased) clinical evidence for key model inputs and supplement it with data from other sources (e.g., RWD) if absolutely needed

†A simulation study demonstrated that when the evidence is weak (e.g., from a study with a small sample size), the bias (systematic error between the model results and the true value) increases dramatically as the time horizon increases in Markov models (12). CUA, cost-utility analysis; QALY, quality-adjusted life-year; RCT, randomized controlled trial; RWD, real-world data.

Time horizon: long-term or short-term

Time horizon is often one of the key factors impacting the magnitude of QALY gains and cost-effectiveness results. A review of cost-utility studies published in 2010 showed that QALY gains increased with a longer model time horizon (mean QALY gains of 0.04 for a model of less than 1 year vs. 0.17 for 1 to 5 years, and 0.43 for over 5 years) (15). Conceptually, the time horizon needs to be determined based on the natural history of the disease and the anticipated effects of the intervention (2,3). A clinical trial of a highly efficacious intervention often does not cover the entire period of potentially beneficial impact of the intervention, so short-term economic modelling based on the trial period might underestimate the health benefits. A model with a long-term time horizon that incorporates the full trajectory of the disease progression, effects of interventions, and healthcare resource use is more appropriate. In a more complex scenario, such as vaccination programs, the time horizon may extend beyond the lifetime of a single cohort and should relate to the expected lifetime of future patients. Consequently, long-term models are generally recommended (1-4).

However, extrapolating short-term outcomes over the long term may increase uncertainty and potentially introduce bias, especially when there is a lack of data or high-quality clinical evidence (e.g., emerging interventions) to inform the long-term model development. Although methods to extrapolate short-term estimates are available, it is unclear whether these methods can produce trustworthy estimates (16). For disease areas with distinct and heterogeneous clinical courses, modeling a future course of the disease that is different from the past may not be possible without the data observed for the distinct and heterogeneous phases. For example, a pooled analysis of 1,861 patients with advanced melanoma showed that the median overall survival was 11.4 months [95% confidence interval (CI): 10.7–12.1 months], and the 3-year survival rate was estimated to be 22% (95% CI: 20–24%) (17). However, the Kaplan-Meier curve showed that a plateau began around year 3 and extended up to year 10. In this case, if we only have 36-month survival data and we use a parametric regression model to predict the long-term survival, we may not be able to accurately project the 5-year and 10-year survival since the survival did not change after the third year. Furthermore, a simulation study demonstrated that when the clinical evidence is weak (e.g., from a study with small sample size), the bias (systematic error between the model results and true value) dramatically increases as the time horizon increases in Markov models (12).
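The melanoma example can be made concrete with a minimal sketch. It assumes the simplest parametric form, a constant-hazard exponential model, calibrated only to the reported median overall survival of 11.4 months. The choice of model and the function name are illustrative assumptions, not the method used in the cited pooled analysis.

```python
import math

MEDIAN_OS_MONTHS = 11.4       # reported median overall survival (17)
OBSERVED_3Y_SURVIVAL = 0.22   # reported 3-year survival; the observed curve then plateaus

def exponential_survival(t_months, median_months=MEDIAN_OS_MONTHS):
    """S(t) under a constant hazard chosen so that S(median) = 0.5 (assumed model)."""
    hazard = math.log(2) / median_months
    return math.exp(-hazard * t_months)

# Extrapolate to years 3, 5 and 10 and compare with the observed plateau.
for years in (3, 5, 10):
    extrapolated = exponential_survival(12 * years)
    print(f"year {years}: extrapolated {extrapolated:.3f} vs. observed plateau {OBSERVED_3Y_SURVIVAL:.2f}")
```

Under this assumed model the extrapolated curve keeps falling toward zero, whereas the observed curve is flat from about year 3 to year 10. A model fitted to early data alone has no information with which to anticipate such a plateau.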

Even when long-term clinical data are available, they are not necessarily a reliable basis to project accurate future clinical and economic outcomes because clinical practice, life expectancy, costs and quality of life do not remain constant over time. For example, life expectancy increases over time in most economically stable countries and the age- and sex-specific mortality rates decrease over time (18). Moreover, for many diseases, such as several types of cancers, the net survival (which adjusts for different levels of the background risk of death) has also improved substantially (19). Clinical practices, consumption of resources and associated costs are likely to change markedly over decades into the future as well. Health state utilities also may change over time, but these time-dependent changes are often omitted in economic modelling due to a lack of observed data. Furthermore, a utility measure reflects health preferences, which are not constant over time. Also, a constant annual discounting rate, suggested in current guidelines, may not always be suitable for long-term models (1).

In summary, health economists need to understand the trade-off between comprehensively capturing health benefits and significantly amplifying uncertainties (i.e., potentially reducing credibility) over a long-term time horizon. A short-term time horizon may be an appropriate option when it is challenging to make a rigorous projection of long-term cost-effectiveness outcomes into the future. We recommend health economists provide justification for selecting a specific duration of the time horizon in the reference case analysis, and conduct several scenario analyses to evaluate different time horizons and their impact on the robustness of the cost-effectiveness results.
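A scenario analysis over the time horizon can be sketched with a two-state (alive/dead) cohort model. All inputs below are hypothetical: the annual death probabilities, the utility, and the 1.5% discount rate are placeholders, and the treatment effect is assumed to persist unchanged for the whole horizon, which is exactly the kind of assumption that observed data rarely support.

```python
def discounted_qalys(p_death, utility, horizon_years, discount_rate=0.015):
    """Discounted QALYs for an alive/dead cohort model with annual cycles."""
    alive, total = 1.0, 0.0
    for year in range(horizon_years):
        # Credit the QALYs of those alive at the start of the cycle, then apply mortality.
        total += alive * utility / (1 + discount_rate) ** year
        alive *= 1 - p_death
    return total

def incremental_qalys(horizon_years):
    # Hypothetical inputs: annual death probability 0.08 (intervention)
    # vs. 0.10 (comparator), utility 0.8 while alive in both arms.
    return (discounted_qalys(0.08, 0.8, horizon_years)
            - discounted_qalys(0.10, 0.8, horizon_years))

# Scenario analysis: the same inputs evaluated over different time horizons.
for horizon in (1, 5, 10, 40):
    print(f"{horizon:>2}-year horizon: incremental QALYs = {incremental_qalys(horizon):.3f}")
```

In this sketch the incremental QALYs grow with the horizon, so most of the estimated benefit accrues in years for which a trial would typically provide no data. Reporting the results at several horizons shows how much of the conclusion rests on the extrapolated period.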

Selecting an effectiveness measure for health economic modelling

The QALY incorporates both health-related quality of life and length of life into a single effectiveness measure. It is a consistent and transparent outcome measure for evaluating health interventions in public health systems (20). Furthermore, the QALY can be used to evaluate trade-offs of non-health public investments, such as education and social services, focusing on the marginal cost of non-health services to society. Also, the QALY has proven to be a pragmatic universal generic metric to facilitate comparisons across patient populations and interventions (21). For example, a systematic review of economic evaluations included 12 CUAs of preventive digital public health interventions (e.g., text messages, web-based interventions, app-based interventions) from seven Organisation for Economic Co-operation and Development (OECD) countries (22). The authors used OECD purchasing power parities (PPP) to adjust incremental cost-effectiveness ratios (ICERs) to a common monetary unit, and ranked the adjusted ICERs for different interventions (e.g., smoking cessation, weight loss, physical activity) across different countries. In general, health economic guidelines recommend using a CUA, which measures effectiveness in QALYs, in the reference case (2). However, there are also some concerns about using QALYs as a measure of health outcomes. First, preferences obtained from the general population can differ from those of individual patients (23). Alolabi et al. conducted a study to investigate the utility of hand composite tissue allotransplantation from the perspectives of hand amputee patients and the general public (24). A list of potential complications post hand transplantation (e.g., acute graft rejection: 85%, amputation: 20%) was presented to participants during the interview. This study showed that the utility gain from hand transplantation, measured using the time trade-off (TTO) technique, was relatively small from the general public perspective, only 0.02 (hand amputation: 0.72; hand transplantation: 0.74), while the utility gain was much larger, 0.14, from the perspective of hand amputee patients (hand amputation: 0.69; hand transplantation: 0.83). Also, there is a lack of empirical evidence to support the underlying assumptions of the QALY, including the independence between utilities and the amount of time spent in a given health state, and the lack of a relationship between utilities and a person’s remaining life expectancy (21). More importantly, an ideal unit of measure should be meaningful, valid, reliable (i.e., repeatable), and relevant; but the QALY may not fully meet all these criteria (25). Therefore, we discuss below the validity and reliability of the health utilities that are used to calculate QALYs.

Given that the true health utilities of treated and untreated health states are often unknown, it is difficult to assess the validity and measurement bias associated with these utilities. There are several common methods for valuation of utilities [e.g., TTO, discrete choice, EuroQol 5-Dimension (EQ-5D), and the Short Form 6 Dimensions (SF-6D)] (23). For the same health state, different utility instruments may result in different utility values. When using the same tool, different versions of a scale (e.g., EQ-5D-5L vs. EQ-5D-3L) may also yield different utility values for the same health state (26). One study also found that the utilities for most health conditions differed significantly across countries (Argentina, Chile, and the United Kingdom) (27). This is problematic because different utility tools result in different magnitudes of the incremental QALYs, and thus in different ICERs and, consequently, different interpretations of the cost-effectiveness results. Utility values typically range from 0 to 1, but some studies allow negative utility values (i.e., for health states that are considered worse than death). Allowing for negative utility values may impact the estimates of average utility values. Even when the same preference-based instrument is used, the reliability of utilities, which is associated with the precision, is often poor (25). Xie et al. (28) found that the magnitude of the utility decrement due to severe hypoglycemic events in type 2 diabetes patients (measured by EQ-5D-3L) varied greatly across studies, from 0.002 (29) to 0.27 (30). Since the true utility decrement is unknown, it is not easy to determine the “correct” or less biased utility decrement for severe hypoglycemic events that could be used to populate an economic model; for instance, if an intervention reduces the risk of severe hypoglycemic events in diabetes, the estimates of QALY gains will differ greatly depending on the selection of utility data.
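The sensitivity of the result to the chosen utility decrement can be sketched as follows. Only the two decrements (0.002 and 0.27) come from the studies cited above; the number of events avoided, the one-year duration of the decrement, and the incremental cost are hypothetical placeholders.

```python
def qaly_gain_from_avoided_events(events_avoided, utility_decrement, duration_years=1.0):
    """QALYs gained by avoiding events that each carry a utility decrement."""
    return events_avoided * utility_decrement * duration_years

def icer(incremental_cost, incremental_qalys):
    """Incremental cost-effectiveness ratio: extra cost per QALY gained."""
    return incremental_cost / incremental_qalys

EVENTS_AVOIDED = 0.05      # hypothetical: severe events avoided per patient
INCREMENTAL_COST = 500.0   # hypothetical: extra cost per patient

for decrement in (0.002, 0.27):   # lowest and highest decrements reported (29,30)
    gain = qaly_gain_from_avoided_events(EVENTS_AVOIDED, decrement)
    print(f"decrement {decrement}: QALY gain {gain:.5f}, "
          f"ICER {icer(INCREMENTAL_COST, gain):,.0f} per QALY")
```

Because the QALY gain is proportional to the decrement in this sketch, the two ICERs differ by the same factor as the decrements (0.27 / 0.002 = 135), whatever placeholder values are used for cost and events avoided.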

Since guidelines recommend the QALY as an effectiveness measure for the reference case, health economists make efforts to obtain QALY estimates, even when suitable utilities are not available. For example, health economists may map utility values from a disease-specific quality of life measure, use a proxy measure, or obtain utilities from low-quality data (including experts’ opinions). Although health economists are always able to produce some QALY estimates, the bias associated with the QALY estimation can be high. Therefore, we suggest judging the credibility of sources for estimating QALYs if the QALY is selected as the effectiveness measure for an economic evaluation. A CUA may be conducted only when medium- or high-quality utility data are available and when there are no major violations of the underlying assumptions of the utility and QALY (21,23).

When there are no suitable data to support the conduct of a CUA, we can perform an economic evaluation using natural units (e.g., a CEA). In many scenarios, health outcomes in natural units can be more meaningful to clinicians and could be more straightforward to estimate and derive, compared with QALYs. Decision-makers may have a stronger understanding of results expressed in natural units (e.g., life-years or cancer recurrences) and this can further facilitate decision-making. However, researchers have concerns that there is less guidance for the willingness-to-pay (WTP) threshold expressed in natural units compared with the WTPs expressed in QALYs. Nord proposed a method to estimate a cost threshold for given units of health interventions (e.g., surgical treatments) (31). This approach incorporated the severity of a health condition and gains in health for estimating the WTP for new health technologies and showed that the WTP increased with greater disease severity or absolute shortfall of life expectancy.

Balancing the model complexity with data availability

Health economic models ought to follow the disease progression and treatment management flow (2). It is recommended that the conceptualization of the model be determined by the decision question(s), as opposed to the availability of data (9). However, following this recommendation closely can result in a paradox: a complex model is developed, but there are no appropriate data to populate it. Model parameters from expert opinions or low-quality clinical evidence may accompany the development of complex models. For example, the cost-effectiveness of diagnostic pathways for people with suspected asthma was examined in an HTA (32). The authors stated that if people do not have asthma, they are likely to have other diseases [chronic obstructive pulmonary disease (COPD), chronic heart failure, physical de-conditioning and acute symptoms] because they have presented with some respiratory symptoms. Although this idea makes sense, there are no data to quantify these concepts. Eventually, all parameters related to this pathway were based on assumptions and expert opinions (32). Conceptually the complex model better reflects reality, but when the quality of the model parameters is low, the model results may lack credibility. We suggest that health economists understand the main clinical evidence and disease history before developing a model. It is essential not to over-broaden the model and to ensure that key parameters are obtained from reliable sources.

Since the true health and economic outcomes of the intervention of interest (e.g., cumulative QALYs and costs over a given time horizon) are unknown, it is challenging to propose a scientifically rigorous approach to determine the proper model structure and select parameters (e.g., clinical events, risk predictors and steps of management). However, we can borrow some general principles from those used in developing clinical models (e.g., model and variable selection). We need to understand why the complex model is not preferred, for example:

A complex model is often associated with additional assumptions and model results based on fewer assumptions usually have better generalizability than those based on more assumptions.
A complex model must properly incorporate the relationship of events considered, but information regarding the relationship between parameters is often not available. In economic modelling practice, correlated clinical events are often assumed to be mutually exclusive or independent and subsequent decision steps are often assumed to be independent though they may be conditional on the previous events [e.g., patients who fail initial treatment may have lower success rates with a second treatment, compared with patients without previous treatment (33)]. Thus, the complex model using such strong assumptions is not necessarily more valid than a simpler model.
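The effect of the independence assumption on sequential treatments can be illustrated with a minimal sketch. Both probabilities are hypothetical and serve only to show the direction of the error when the second-line success rate is wrongly taken to equal the first-line rate.

```python
def cumulative_success(p_first, p_second_after_failure):
    """Probability of success after up to two treatment lines."""
    return p_first + (1 - p_first) * p_second_after_failure

P_FIRST = 0.60               # hypothetical first-line success probability
P_SECOND_CONDITIONAL = 0.30  # hypothetical success probability after a first-line failure

# Independence assumption: second-line patients treated as if treatment-naive.
independent = cumulative_success(P_FIRST, P_FIRST)
# Conditional on the earlier failure: a lower second-line success probability.
conditional = cumulative_success(P_FIRST, P_SECOND_CONDITIONAL)

print(f"independence assumption: {independent:.2f}")
print(f"conditional on failure:  {conditional:.2f}")
```

With these placeholder values the independence assumption overstates cumulative success (0.84 vs. 0.72). Each additional decision step modelled in this way adds another opportunity for the same kind of error.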

There is a trade-off between the model’s external validity (reflecting complex care pathways) and generating more uncertainty from poor-quality data inputs. Robust complex models better reflect actual disease progression and clinical pathways. When these models are constructed with high-quality data, credible model results are expected. A complex model also can be used to explore uncertainties arising from the model assumptions or from interactions between individuals [e.g., for infectious diseases, and competing healthcare resources (e.g., operating room)]; this cannot be addressed in a simple model. Therefore, health economists need to understand the decision analysis questions and the available data so as to choose a model with adequate complexity.

Controlling for additional factors in economic modelling

In economic modelling, health economists also need to control for additional factors (e.g., secondary clinical outcomes). For example, a multicentre, double-blind RCT showed that compared with control dressing, sucrose octasulfate dressing significantly improved wound closure for diabetic foot ulcers (34). This RCT also found a lower mortality rate in the octasulfate dressing group (2%) compared with the control group (4%), but these deaths were not related to the treatment and wound progression. No statistical significance test was conducted for the mortality outcome. Two model-based economic evaluations (35,36) used the RCT results and included the reported survival advantage; they concluded that sucrose octasulfate dressing was cost-effective compared with control dressing even though the survival difference may have represented random variation rather than a true treatment effect. In this case, fair-minded skepticism could have been applied (37), whereby health economists could have assumed the same mortality risk for both strategies. If the intervention remained cost-effective under this more pessimistic assumption, the conclusions of the CEA would have been more robust; otherwise, the question of the credibility of the published cost-effectiveness results remains.
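Whether a 2% vs. 4% mortality difference is distinguishable from chance depends on the number of participants, which is not restated here. The sketch below therefore uses hypothetical arms of 100 participants each, chosen only to reproduce the two percentages, and applies a two-sided Fisher exact test written with the standard library. It is an illustration, not a re-analysis of the trial.

```python
from math import comb

def fisher_exact_two_sided(deaths_a, n_a, deaths_b, n_b):
    """Two-sided Fisher exact p-value for a 2x2 table of deaths by arm."""
    total_deaths = deaths_a + deaths_b
    total_n = n_a + n_b
    denom = comb(total_n, n_a)

    def prob(k):
        # Hypergeometric probability of k deaths in arm A given fixed margins.
        return comb(total_deaths, k) * comb(total_n - total_deaths, n_a - k) / denom

    p_observed = prob(deaths_a)
    lowest = max(0, n_a - (total_n - total_deaths))
    highest = min(total_deaths, n_a)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

# Hypothetical arm sizes: 2 deaths in 100 vs. 4 deaths in 100.
p_value = fisher_exact_two_sided(2, 100, 4, 100)
print(f"two-sided p = {p_value:.2f}")
```

With these assumed arm sizes the p-value is far above conventional significance thresholds, which supports running a scenario with equal mortality in both arms before attributing a survival benefit to the dressing.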

In another example, a CUA compared robotic prostatectomy with laparoscopic prostatectomy, and it reported differences in the baseline characteristics of the target population between these two surgeries [e.g., age (interquartile) of 61.5 years (39−74 years) in robotic prostatectomy vs. 63 years (43−76 years) in laparoscopic prostatectomy, in Tab. 22 of the original publication] (38). Consequently, differences in the health economic outcomes were partially driven by the differences in the baseline characteristics (e.g., age-related mortality).

There are always many types of uncertainty related to decision making. If we consider all sources of uncertainty together, results in most economic evaluations are likely to be inconclusive. We suggest that health economists specify the factors that drive the final health benefits and control for the remaining factors, such as background mortality and baseline characteristics.

Determining key clinical parameters for an economic model

Economic evaluations addressing the same comparison based on published studies may exhibit discrepancies in the measure of effectiveness and overall conclusions. Assigning different values for key clinical parameters is one of the most important factors that explain the inconsistency. Two model-based CUAs that compared robot-assisted radical prostatectomy (RARP) vs. open radical prostatectomy (ORP) for localized prostate cancer in similar settings used different sources for key clinical parameters, and found substantially different cost-effectiveness results (ICER of $5.2 million per QALY gained based on the RCT data vs. ICER of $25,704 per QALY gained based on the observational study data) (39,40). The guidelines in the Netherlands strongly recommend the use of evidence from RCTs to determine the effectiveness of model parameters (4), while many other guidelines focus on selecting the fitted parameters and transparently reporting the data sources (3,13,14). In practice, health economists make a judgement regarding the appropriateness of data sources and values (or distributions) used to populate the models. Since it is not always straightforward to determine the key clinical model parameters, we provide some thoughts below.

Targeted (limited) literature review of clinical evidence

Ideally, the economic evaluation should be conducted as part of an HTA or should be preceded by a systematic review of the clinical evidence, to ensure that the validity and certainty of the clinical evidence are fully understood. In practice, however, economic evaluations are conducted in various contexts and often have tight timelines. When the clinical evidence is clear and consistent, it may be reasonable to determine the key model parameter values based on a targeted (limited) evidence review. However, for debated treatment areas with inconsistent clinical evidence, it is risky to select key clinical model parameter values without conducting a systematic review of the existing clinical evidence.

Consistency with pathophysiology and human biology

Although clinical results may vary across studies, these data should be consistent with expected changes in health outcomes. For instance, some low-risk cancers (e.g., localized thyroid cancer) have overall survival comparable to that of the general population (41). Therefore, new interventions for low-risk cancers will have minimal effect on overall survival, even if they demonstrate improvements in surrogate endpoints (42). The utility value for people with certain health conditions needs to be compared and appropriately adjusted (e.g., using age- and sex-specific utility norms in the general population) (43). Thus, if a study reported a high utility value for people with a health condition, such as a health utility of 0.99 for being disease-free after thyroid lobectomy (44), an adjustment of the originally reported utility may be required in the economic modelling (45).
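One way to picture such an adjustment is the short Python sketch below. It shows two simple options, capping the reported utility at the population norm and multiplying it by the norm; both are illustrative choices on our part rather than the method used in the cited references. The norm of 0.85 is a hypothetical value, and the function names are our own; only the reported utility of 0.99 comes from the example above.

```python
def cap_at_norm(reported_utility, population_norm):
    """Do not let a condition-specific utility exceed the matched norm."""
    return min(reported_utility, population_norm)


def multiply_by_norm(reported_utility, population_norm):
    """Treat the reported utility as a multiplier on the matched norm."""
    return reported_utility * population_norm


reported = 0.99          # disease-free after thyroid lobectomy, as reported
hypothetical_norm = 0.85  # assumed age- and sex-matched general population norm

capped = cap_at_norm(reported, hypothetical_norm)
scaled = multiply_by_norm(reported, hypothetical_norm)
print(f"Reported utility:        {reported:.3f}")
print(f"Capped at the norm:      {capped:.3f}")
print(f"Multiplied by the norm:  {scaled:.4f}")
```

Either option keeps the modelled utility at or below that of the general population of the same age and sex; which adjustment is appropriate depends on how the original utility was elicited and should be justified in the analysis.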

Selection of utility instruments

The choice of utility instrument may affect the utility values, or the changes in utility values attributed to the treatment. For instance, meta-analyses found that hearing aids significantly improved the Health Utilities Index 3 (HUI-3) utility (average increase of 0.108, 95% CI: 0.074–0.141) for adults with mild to moderate hearing loss, but showed no benefit when utility was measured with the EQ-5D (46). This is because the HUI-3 classification system comprises 8 attributes, two of which are hearing and speech (47), while none of the five dimensions in the EQ-5D instrument explicitly evaluates hearing or speech difficulties. Since EQ-5D utility lacks sensitivity to changes in hearing-specific health-related quality of life, HUI-3 is more responsive and can probably be considered a better utility measure for evaluating the effects of wearing hearing aids. Some national guidelines (e.g., England and the Netherlands) recommend EQ-5D utility instruments only, while most national guidelines (e.g., Canada, France and the United States) also recommend other utility instruments, such as SF-6D and HUI-3 (2). Although there is no simple way to choose the appropriate utility measure, health economists should justify the selection of a specific utility instrument and explore the uncertainty arising from the use of different utility measurement instruments.
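To illustrate how the instrument can drive the estimated benefit, the Python sketch below converts a constant annual utility gain into discounted QALYs. The HUI-3 values (0.108, with 0.074 and 0.141 as bounds) are the meta-analytic estimates quoted above; treating "no benefit" on the EQ-5D as a gain of exactly 0.0, the 10-year horizon, the 1.5% discount rate, and the function name `discounted_qaly_gain` are all assumptions made for illustration.

```python
def discounted_qaly_gain(utility_gain, years, discount_rate):
    """Sum a constant annual utility gain, discounted from the start of each year."""
    return sum(utility_gain / (1.0 + discount_rate) ** t for t in range(years))


horizon_years = 10     # hypothetical time horizon
discount_rate = 0.015  # hypothetical annual discount rate

annual_gain_by_instrument = {
    "HUI-3": 0.108,             # average increase reported in the meta-analysis
    "HUI-3 (lower 95% CI)": 0.074,
    "HUI-3 (upper 95% CI)": 0.141,
    "EQ-5D": 0.0,               # assumption: "no benefit" treated as zero gain
}

gains = {
    name: discounted_qaly_gain(gain, horizon_years, discount_rate)
    for name, gain in annual_gain_by_instrument.items()
}
for name, value in gains.items():
    print(f"{name}: {value:.3f} incremental QALYs")
```

With a zero QALY gain in the denominator, an ICER cannot be calculated at all, so the same intervention could appear reasonably cost-effective under one instrument and unjustifiable under another. This is the kind of instrument-driven uncertainty that a scenario analysis can expose.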

Use of real-world data (RWD)

For the key clinical model inputs, published clinical data of the best quality ought to be used in economic evaluations, while RWD may be considered a supplement. In recent years, it has become increasingly popular to take clinical input parameter values for economic evaluations from studies using RWD. RWD are often data routinely collected on patient health care resource use, typically reported in administrative healthcare databases (e.g., inpatient and outpatient visit records, billing activities), hospital records, and disease registries (48). RWD contextualize information about patient management and can be considered an important supplement to high-quality clinical evidence (e.g., RCTs and well-designed prospective observational studies). However, data collection for RWD lacks a specific focus on evaluating clinical effectiveness, and patient allocation to different treatments is based on clinical indications rather than randomization. Given these inherent characteristics, results obtained from RWD studies may be susceptible to a high risk of bias.

Strengths and limitations

Our review provides additional views regarding the key components of economic evaluations to enhance the credibility of economic model results. It may help health economists understand the pros and cons of following, or not following closely, the published guidelines, and may improve the consistency and transparency of health economic modelling practice. However, our review also has some limitations. First, we did not conduct a systematic literature review of all articles focused on economic modelling methods; our aim was to provide guidance based on our own modelling experience and practice. Second, we did not include or discuss other important components of health economic modelling, such as the selection of study perspectives, comparators, and costing methods.

Conclusions

Health economic evaluations should aim to produce the most credible cost-effectiveness estimates by adhering to rigorous methodological standards and by adapting the recommendations to suit the specific context. Health economists need to understand whether the available data can support complex, long-term disease models; this knowledge would help them determine an appropriate level of model complexity including the duration of time horizon. However, regardless of the time horizon or model complexity, health economists should clearly identify and describe potential biases, assumptions, the quality of evidence informing key model parameters, and sources of uncertainty to ensure the credibility of the cost-effectiveness results and overall conclusions. Finally, health economists should enhance transparency in the reporting of all sources of uncertainty to improve the use of their economic findings in decision and policy making.

Acknowledgments

None.

Footnote

Peer Review File: Available at https://jhmhp.amegroups.com/article/view/10.21037/jhmhp-25-81/prf

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://jhmhp.amegroups.com/article/view/10.21037/jhmhp-25-81/coif). X.X. serves as an unpaid editorial board member of Journal of Hospital Management and Health Policy from September 2025 to December 2027. W.J.U. reports receiving royalties from Oxford University Press for her book Economic Evaluation in Child Health, consulting income from BroadStreet HEOR, and honoraria from the EuroQol Research Foundation and the Canadian Fertility & Andrology Society. She also received travel support from the EuroQol Research Foundation. In addition, she holds unpaid government advisory board positions as Chair of the Ontario Genetics Advisory Committee and Member of the Ontario Health Technology Advisory Committee. The opinions expressed in this publication do not necessarily represent the opinions of Ontario Health. No endorsement is intended or should be inferred. The other authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Drummond MF, Sculpher MJ, Claxton K, editors. Methods for the economic evaluation of health care programmes. 5th ed. Oxford: Oxford University Press; 2015.
2. Sharma D, Aggarwal AK, Downey LE, et al. National Healthcare Economic Evaluation Guidelines: A Cross-Country Comparison. Pharmacoecon Open 2021;5:349-64.
3. Canadian Agency for Drugs and Technologies in Health. Guidelines for the economic evaluation of health technologies: Canada. 4th ed. Ottawa: CADTH; 2017:76.
4. National Health Care Institute in Netherlands. Guideline for economic evaluations in healthcare. 2016. Available online: https://english.zorginstituutnederland.nl/about-us/working-methods-and-procedures/guideline-for-economic-evaluations-in-healthcare
5. Clark JF, Mealing SJ, Scott DA, et al. A cost-effectiveness analysis of long-term intermittent catheterisation with hydrophilic and uncoated catheters. Spinal Cord 2016;54:73-7.
6. Soni PD, Hartman HE, Dess RT, et al. Comparison of Population-Based Observational Studies With Randomized Trials in Oncology. J Clin Oncol 2019;37:1209-16.
7. Berger ML, Martin BC, Husereau D, et al. A questionnaire to assess the relevance and credibility of observational studies to inform health care decision making: an ISPOR-AMCP-NPC Good Practice Task Force report. Value Health 2014;17:143-56.
8. Briggs AH, Weinstein MC, Fenwick EA, et al. Model parameter estimation and uncertainty: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force--6. Value Health 2012;15:835-42.
9. Caro JJ, Briggs AH, Siebert U, et al. Modeling good research practices--overview: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force--1. Value Health 2012;15:796-803.
10. Roberts M, Russell LB, Paltiel AD, et al. Conceptualizing a model: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force--2. Value Health 2012;15:804-11.
11. National Institute for Health and Care Excellence. NICE health technology evaluations: the manual (Last updated: 14 July 2025). London: The Institute; 2022:200. Available online: https://www.nhsprocurement.org.uk/news/nice-health-technology-evaluations-manual
12. Xie X, Yeung MW, Wang Z, et al. Comparison of the expected rewards between probabilistic and deterministic analyses in a Markov model. Expert Rev Pharmacoecon Outcomes Res 2020;20:169-75.
13. National Institute for Health and Care Excellence. Methods for the development of NICE public health guidance. London (UK): The Institute; 2012. Available online: https://www.nice.org.uk/process/pmg4/chapter/developing-recommendations
14. Sanders GD, Neumann PJ, Basu A, et al. Recommendations for Conduct, Methodological Practices, and Reporting of Cost-effectiveness Analyses: Second Panel on Cost-Effectiveness in Health and Medicine. JAMA 2016;316:1093-103.
15. Wisløff T, Hagen G, Hamidi V, et al. Estimating QALY gains in applied studies: a review of cost-utility analyses published in 2010. Pharmacoeconomics 2014;32:367-75.
16. Tremblay G, Haines P, Briggs A. A Criterion-based Approach for the Systematic and Transparent Extrapolation of Clinical Trial Survival Data. J Health Econ Outcomes Res 2015;2:147-60.
17. Schadendorf D, Hodi FS, Robert C, et al. Pooled Analysis of Long-Term Survival Data From Phase II and Phase III Trials of Ipilimumab in Unresectable or Metastatic Melanoma. J Clin Oncol 2015;33:1889-94.
18. Dattani S, Rodés-Guirao L, Ritchie H, et al. Life Expectancy. 2013. Available online: https://ourworldindata.org/life-expectancy
19. Ellison LF. Progress in net cancer survival in Canada over 20 years. Health Rep 2018;29:10-8.
20. Neumann PJ, Cohen JT. QALYs in 2018-Advantages and Concerns. JAMA 2018;319:2473-4.
21. Johnson FR, Scott FI, Reed SD, et al. Comparing the Noncomparable: The Need for Equivalence Measures That Make Sense in Health-Economic Evaluations. Value Health 2019;22:684-92.
22. Lange O. Health economic evaluation of preventive digital public health interventions using decision-analytic modelling: a systematized review. BMC Health Serv Res 2023;23:268.
23. Whitehead SJ, Ali S. Health outcomes in economic evaluation: the QALY and utilities. Br Med Bull 2010;96:5-21.
24. Alolabi N, Chuback J, Grad S, et al. The utility of hand transplantation in hand amputee patients. J Hand Surg Am 2015;40:8-14.
25. McGregor M. Cost-utility analysis: use QALYs only with great caution. CMAJ 2003;168:433-4.
26. Yang F, Devlin N, Luo N. Cost-Utility Analysis Using EQ-5D-5L Data: Does How the Utilities Are Derived Matter? Value Health 2019;22:45-9.
27. Galante J, Augustovski F, Colantonio L, et al. Estimation and comparison of EQ-5D health states' utility weights for pneumococcal and human papillomavirus diseases in Argentina, Chile, and the United Kingdom. Value Health 2011;14:S60-4.
28. Xie X, Guo J, Bremner KE, et al. Review and estimation of disutility for joint health states of severe and nonsevere hypoglycemic events in diabetes. J Comp Eff Res 2021;10:961-74.
29. Peasgood T, Brennan A, Mansell P, et al. The Impact of Diabetes-Related Complications on Preference-Based Measures of Health-Related Quality of Life in Adults with Type I Diabetes. Med Decis Making 2016;36:1020-33.
30. Vexiau P, Mavros P, Krishnarajah G, et al. Hypoglycaemia in patients with type 2 diabetes treated with a combination of metformin and sulphonylurea therapy in France. Diabetes Obes Metab 2008;10:16-24.
31. Nord E. Beyond QALYs: Multi-criteria based estimation of maximum willingness to pay for health technologies. Eur J Health Econ 2018;19:267-75.
32. National Institute for Health and Care Excellence (NICE) in UK. Appendix M: Cost-effectiveness analysis in diagnosis of asthma in adults and young people aged over 16. In: Asthma: diagnosis and monitoring of asthma in adults, children and young people (NICE guideline NG80). The Institute (London (UK)). 2017. Available online: https://www.nice.org.uk/guidance/ng245/evidence/appendices-a-to-r-pdf-7079863937
33. Xie X, Wang M, Gajic-Veljanoski O, et al. Examining the correlation between treatment effects in clinical trials and economic modeling. Expert Rev Pharmacoecon Outcomes Res 2022;22:1071-8.
34. Edmonds M, Lázaro-Martínez JL, Alfayate-García JM, et al. Sucrose octasulfate dressing versus control dressing in patients with neuroischaemic diabetic foot ulcers (Explorer): an international, multicentre, double-blind, randomised, controlled trial. Lancet Diabetes Endocrinol 2018;6:186-96.
35. Lobmann R, Augustin M, Lawall H, et al. Cost-effectiveness of TLC-sucrose octasulfate versus control dressings in the treatment of diabetic foot ulcers. J Wound Care 2019;28:808-16.
36. Wen J, Jin X, Al Sayah F, et al. Economic Evaluation of Sucrose Octasulfate Dressing for Treatment of Diabetic Foot Ulcers in Patients with Type 2 Diabetes. Can J Diabetes 2022;46:126-33.
37. Matthews RAJ. Beyond 'significance': principles and practice of the Analysis of Credibility. R Soc Open Sci 2018;5:171047.
38. Ramsay C, Pickard R, Robertson C, et al. Systematic review and economic modelling of the relative clinical benefit and cost-effectiveness of laparoscopic surgery and robotic surgery for removal of the prostate in men with localised prostate cancer. Health Technol Assess 2012;16:1-313.
39. Robotic Surgical System for Radical Prostatectomy. A Health Technology Assessment. Ont Health Technol Assess Ser 2017;17:1-172.
40. Parackal A, Tarride JE, Xie F, et al. Economic evaluation of robot-assisted radical prostatectomy compared to open radical prostatectomy for prostate cancer treatment in Ontario, Canada. Can Urol Assoc J 2020;14:E350-7.
41. American Cancer Society. Thyroid Cancer Survival Rates, by Type and Stage. Available online: https://www.cancer.org/cancer/thyroid-cancer/detection-diagnosis-staging/survival-rates.html
42. Xie X, Guo J, Wang M, et al. Understanding characteristics of contemporary cancer survival for modelling the cost-effectiveness of new interventions. BMJ Oncol 2025;4:e000958.
43. Wang Z, Luo N, Wang P. A comparative analysis of EQ-5D-5L general population norms across 23 countries: Gender and age disparities. Pharmacoeconomics and Policy 2025;1:5-14.
44. Esnaola NF, Cantor SB, Sherman SI, et al. Optimal treatment strategy in patients with papillary thyroid cancer: a decision analysis. Surgery 2001;130:921-30.
45. Pilz MJ, Nolte S, Liegl G, et al. The European Organisation for Research and Treatment of Cancer Quality of Life Utility-Core 10 Dimensions: Development and Investigation of General Population Utility Norms for Canada, France, Germany, Italy, Poland, and the United Kingdom. Value Health 2023;26:760-7.
46. Borre ED, Kaalund K, Frisco N, et al. The Impact of Hearing Loss and Its Treatment on Health-Related Quality of Life Utility: a Systematic Review with Meta-analysis. J Gen Intern Med 2023;38:456-79.
47. Horsman J, Furlong W, Feeny D, et al. The Health Utilities Index (HUI): concepts, measurement properties and applications. Health Qual Life Outcomes 2003;1:54.
48. US Food and Drug Administration. Real-World Evidence. Available online: https://www.fda.gov/science-research/science-and-research-special-topics/real-world-evidence

doi: 10.21037/jhmhp-25-81
Cite this article as: Xie X, Schaink AK, Gajic‑Veljanoski O, Guliyeva K, Ungar WJ. Considerations for enhancing the credibility of health economic models: a review. J Hosp Manag Health Policy 2026;10:10.
