Dataset Information

The C1C2: a framework for simultaneous model selection and assessment.

ABSTRACT:

Background

There has been recent concern regarding the inability of predictive modeling approaches to generalize to new data. Some of the problems can be attributed to improper methods for model selection and assessment. Here, we have addressed this issue by introducing a novel and general framework, the C1C2, for simultaneous model selection and assessment. The framework relies on partitioning the data so that model choice and model assessment are performed on separate data. Since the number of conceivable models is in general vast, we also investigated the use of two automatic search methods for model choice: a genetic algorithm and a brute-force method. As a demonstration, the C1C2 was applied to simulated and real-world datasets. A penalized linear model was assumed to reasonably approximate the true relation between the dependent and independent variables, thus reducing the model choice problem to a matter of variable selection and choice of penalizing parameter. We also studied the impact of assuming prior knowledge about the number of relevant variables on model choice and generalization error estimates. The results obtained with the C1C2 were compared to those obtained by using repeated K-fold cross-validation both for choosing and for assessing a model.
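The abstract does not spell out how the C1C2 partitions the data, so the sketch below is only a simplified hold-out analogue of the idea, under stated assumptions: a single random split into a model-choice half and an assessment half; ridge (L2) regression standing in for the penalized linear model; a brute-force search over every variable subset up to `max_size` and a small grid of penalties, with each candidate scored by 5-fold cross-validation inside the model-choice half only. The assessment half is touched exactly once, after the model has been fixed. All function names (`solve`, `ridge_fit`, `cv_error`, `choose_model`, `select_and_assess`) are hypothetical and not taken from the paper.

```python
import itertools
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(rows, y, cols, lam):
    """Ridge coefficients on columns `cols` (no intercept; data assumed centred)."""
    a = [[sum(r[i] * r[j] for r in rows) + (lam if p == q else 0.0)
          for q, j in enumerate(cols)] for p, i in enumerate(cols)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in cols]
    return solve(a, b)


def mse(rows, y, cols, beta):
    """Mean squared prediction error of coefficients `beta` on columns `cols`."""
    sq = [(t - sum(bj * r[j] for bj, j in zip(beta, cols))) ** 2 for r, t in zip(rows, y)]
    return sum(sq) / len(sq)


def cv_error(rows, y, cols, lam, k=5):
    """K-fold CV error, computed only inside the model-choice partition."""
    n = len(rows)
    total = 0.0
    for f in range(k):
        test = list(range(f, n, k))
        held = set(test)
        train = [i for i in range(n) if i not in held]
        beta = ridge_fit([rows[i] for i in train], [y[i] for i in train], cols, lam)
        total += mse([rows[i] for i in test], [y[i] for i in test], cols, beta) * len(test)
    return total / n


def choose_model(rows, y, lams, max_size):
    """Brute-force search over every subset up to `max_size` and every penalty."""
    p = len(rows[0])
    candidates = (
        (cv_error(rows, y, cols, lam), cols, lam)
        for size in range(1, max_size + 1)
        for cols in itertools.combinations(range(p), size)
        for lam in lams
    )
    _, cols, lam = min(candidates, key=lambda c: c[0])
    return cols, lam


def select_and_assess(rows, y, lams, max_size, frac=0.5, seed=0):
    """Choose a model on one partition, estimate its generalization error on the other."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(frac * len(idx))
    choose_idx, assess_idx = idx[:cut], idx[cut:]
    xs, ys = [rows[i] for i in choose_idx], [y[i] for i in choose_idx]
    cols, lam = choose_model(xs, ys, lams, max_size)
    beta = ridge_fit(xs, ys, cols, lam)  # refit on the whole model-choice partition
    err = mse([rows[i] for i in assess_idx], [y[i] for i in assess_idx], cols, beta)
    return cols, lam, err


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
    y = [2 * r[0] - 3 * r[2] + rng.gauss(0, 0.5) for r in X]  # only x0 and x2 matter
    print(select_and_assess(X, y, lams=[0.01, 1.0, 10.0], max_size=3))
```

Fixing `max_size` (or searching a single subset size) loosely corresponds to the prior knowledge about the number of relevant variables studied in the paper; the exhaustive loop in `choose_model` is what a genetic algorithm would replace when the number of candidate subsets becomes too large to enumerate.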

Results

The C1C2 framework performed well at finding the true model, both in choosing the correct variable subset and in producing reasonable choices for the penalizing parameter, even when the independent variables were highly correlated and when the number of observations was smaller than the number of variables. The C1C2 framework also gave accurate estimates of the generalization error. Prior information about the number of important independent variables improved the variable subset choice but reduced the accuracy of the generalization error estimates. Compared with the brute-force method, the genetic algorithm worsened the model choice but not the generalization error estimates. Repeated K-fold cross-validation produced model choices similar to those of the C1C2; however, its generalization error estimates were less accurate.
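For reference, the repeated K-fold cross-validation used as the comparison method can be sketched as a resampling loop: the data are reshuffled `repeats` times, each shuffle is cut into K folds, and the squared prediction errors from every held-out fold are pooled. This is a minimal sketch that covers only the resampling, not the paper's full protocol; `fit` and `predict` are hypothetical callables standing in for whatever model-fitting procedure is being evaluated. When the same folds are used both to pick a model and to report its error, that error estimate is no longer made on data untouched by the choice, which is the separation the C1C2 is designed to enforce.

```python
import random


def repeated_kfold_splits(n, k, repeats, seed=0):
    """Yield (train, test) index lists for `repeats` independently shuffled K-fold partitions."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]  # every k-th shuffled index forms one fold
            held = set(test)
            train = [i for i in idx if i not in held]
            yield train, test


def repeated_kfold_error(fit, predict, x, y, k=5, repeats=10, seed=0):
    """Squared prediction error pooled over every fold of every repeat."""
    errors = []
    for train, test in repeated_kfold_splits(len(x), k, repeats, seed):
        model = fit([x[i] for i in train], [y[i] for i in train])
        errors.extend((y[i] - predict(model, x[i])) ** 2 for i in test)
    return sum(errors) / len(errors)


def fit_mean(xs, ys):
    """Toy model: ignore the inputs and remember the training mean."""
    return sum(ys) / len(ys)


def predict_mean(model, xi):
    """Toy prediction: always return the stored training mean."""
    return model


if __name__ == "__main__":
    xs = list(range(20))
    ys = [float(v % 3) for v in xs]
    print(repeated_kfold_error(fit_mean, predict_mean, xs, ys, k=5, repeats=3))
```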

Conclusion

The C1C2 framework was demonstrated to work well for finding the true model within a penalized linear model class and for accurately assessing its generalization error, even for datasets with many highly correlated independent variables, a low observation-to-variable ratio, and deviations from the model assumptions. Completely separating the data used for model choice from the data used for model assessment improves the estimates of the generalization error.
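Why separating the data matters can be illustrated with a toy simulation; this is an assumed setup for illustration, not the paper's experiment. The outcome is pure noise, so every candidate model has a true squared error of at least 1. Choosing the best of many candidate predictors and then reporting its error on the same data yields an optimistically low estimate, whereas an independent sample reveals the true error. The sample sizes, the single-predictor least-squares fit through the origin, and the names `fit_slope` and `simulate` are all hypothetical choices.

```python
import random


def fit_slope(x, y):
    """Least-squares slope of a line through the origin."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def mse(x, y, slope):
    """Mean squared error of the prediction slope * x."""
    return sum((b - slope * a) ** 2 for a, b in zip(x, y)) / len(y)


def simulate(n=30, p=50, n_fresh=1000, reps=50, seed=0):
    """Return (mean error estimated on the selection data, mean error on fresh data)."""
    rng = random.Random(seed)
    reused, fresh = [], []
    for _ in range(reps):
        # The outcome is pure noise: none of the p candidate predictors is related to it.
        feats = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        # Model choice: keep the single predictor with the lowest in-sample error.
        candidates = []
        for f in feats:
            slope = fit_slope(f, y)
            candidates.append((mse(f, y, slope), slope))
        best_err, best_slope = min(candidates)
        reused.append(best_err)  # "assessment" on the same data used for the choice
        # Assessment on independent data from the same distribution; all predictors
        # are i.i.d. N(0, 1), so fresh values of the chosen one are drawn directly.
        xf = [rng.gauss(0, 1) for _ in range(n_fresh)]
        yf = [rng.gauss(0, 1) for _ in range(n_fresh)]
        fresh.append(mse(xf, yf, best_slope))
    return sum(reused) / reps, sum(fresh) / reps


if __name__ == "__main__":
    reused_err, fresh_err = simulate()
    print(f"error on reused data: {reused_err:.3f}, on fresh data: {fresh_err:.3f}")
```

Because y and x are independent here, the true error of the chosen model is 1 + slope², so any reused-data estimate below 1 reflects selection optimism rather than genuine predictive skill.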

SUBMITTER: Eklund M

PROVIDER: S-EPMC2556350 | biostudies-literature | 2008 Sep

REPOSITORIES: biostudies-literature

Publications

The C1C2: a framework for simultaneous model selection and assessment.

Eklund M, Spjuth O, Wikberg JE

BMC Bioinformatics, 2 September 2008

PMID: 18761753

Similar Datasets

Project description: Kinetic parameters describing hepatic uptake in hepatocytes are frequently estimated without appropriate incorporation of bidirectional passive diffusion, intracellular binding, and metabolism. A mechanistic two-compartment model was developed to describe all of the processes occurring during the in vitro uptake experiments performed in freshly isolated rat hepatocytes plated for 2 h. Uptake of rosuvastatin, pravastatin, pitavastatin, valsartan, bosentan, telmisartan, and repaglinide was investigated over a 0.1 to 300 µM concentration range at 37°C for 2 or 45-90 min; nonspecific binding was taken into account. All concentration-time points were analyzed simultaneously by using a mechanistic two-compartment model describing uptake kinetics [unbound affinity constant (K(m,u)), maximum uptake rate (V(max)), unbound active uptake clearance (CL(active,u))], passive diffusion [unbound passive diffusion clearance (P(diff,u))], and intracellular binding [intracellular unbound fraction (fu(cell))]. When required (telmisartan and repaglinide), the model was extended to account for metabolism [unbound metabolic clearance (CL(met,u))]. The CL(active,u) ranged 8-fold, reflecting an 11-fold range in uptake K(m,u), with telmisartan and valsartan showing the highest affinity for uptake transporters (K(m,u) <10 µM). Both P(diff,u) and fu(cell) spanned over two orders of magnitude and reflected the lipophilicity of the drugs in the dataset. An extended incubation time allowed steady state to be reached between media and intracellular compartment concentrations and reduced the error in certain parameter estimates observed with shorter incubation times. Active transport accounted for >70% of total uptake for all drugs investigated and was 4- and 112-fold greater than CL(met,u) for telmisartan and repaglinide, respectively.
Modeling of uptake kinetics in conjunction with metabolism improved the precision of the uptake parameter estimates for repaglinide and telmisartan. Recommendations are made for uptake experimental design and modeling strategies.

Project description: The aim of this paper is to explore a new framework for personality assessment that may function as a sanity nosology of personality traits: the Positive Personality Model (PPM). The recent publication of DSM-5 created the opportunity to assess personality traits as dimensional constructs (American Psychiatric Association, 2013). In Section III, five maladaptive personality traits are proposed as the maladaptive versions of the Five Factor Model (FFM) traits (Costa and McCrae, 1985). This approach draws on the existing idea of conceptualizing pathological and typical personality traits as parts of a continuum. It places DSM-5's maladaptive traits at a sickness pole and the FFM's traits at a "typical" pole. This spectrum, however, does not include a positive perspective that represents healthy behavior: a sanity nosology. The Positive Traits Inventory-5 (PTI-5; de la Iglesia and Castro Solano, 2018) is a measure designed to assess the positive reverse of the Personality Inventory for DSM-5-Adult (PID-5; Krueger et al., 2013). The 220 positive personality criteria were studied psychometrically using a sample of 1902 Argentinean adults from the general population (M age = 39.10, SD = 13.81, Min = 18, Max = 83; 50.1% female, 49.9% male). Exploratory and confirmatory factor analyses resulted in a five-factor solution. The dimensions were labeled Sprightliness, Integrity, Serenity, Moderation, and Humanity and subsumed under the denomination of PPM. Analyses of convergent validity provided some grounds for interpreting the five positive traits as positive versions of the pathological traits and the typical traits. When tested for its predictive capability on mental health, the PPM explained more variance than the FFM. It is concluded that the PPM may constitute a positive pole in the continuum of personality traits (possibly functioning as a sanity nosology) and that it is somewhat more related to optimal functioning than typical trait models are.
The PPM should be confirmed in other populations, its predictive capability ought to be tested with other relevant variables, and longitudinal studies should be done to analyze the stability of the traits over time.
