Clarofy Features: Regression

    Create a model in Clarofy


    The Modelling and Prediction tool in Clarofy allows us to create a model to fit our data. If the fit is good, it might help us understand what decisions we can make to improve our KPI.

    There are three model types to choose from: Regression, XGBoost and Support Vector Machine.

    • Regression: Linear regression is a data scientist's vanilla ice cream. It creates a model with easily interpreted outputs (model coefficients) and allows you to validate whether the model assumptions are reasonable. If the model residuals (the differences between observed values and values predicted by the model) are approximately normally distributed, then linear regression might be suitable for your data.
    • XGBoost: If the process you are modelling has interactions between variables that you aren't easily able to include in your own data set, then XGBoost (a gradient-boosted ensemble of decision trees) may be able to detect this behaviour 'by itself'. It also accepts missing values, so you may retain more of the information in your data set without having to recode it manually and make a poor assumption about what to put in place of the gaps.
    • Support Vector Machine: Support Vector Regression (SVR) can model non-linear relationships between variables by using a non-linear kernel function, but it can't handle missing data. It might be better suited to extrapolating predictions than XGBoost, but it will likely perform best on smaller data sets with fewer features.
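    The residual check described for linear regression can be sketched outside Clarofy. The example below is a minimal illustration, not Clarofy's own implementation: it assumes NumPy is available, uses made-up data, and the helper names fit_linear and residual_shape are hypothetical. It fits an ordinary least squares model and summarises the residuals with skewness and excess kurtosis, both of which sit near zero when the residuals are roughly normal.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, residuals)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # first column is the intercept
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef  # observed minus predicted
    return coef, residuals

def residual_shape(residuals):
    """Sample skewness and excess kurtosis; both are near 0 for normal residuals."""
    r = np.asarray(residuals, dtype=float)
    z = (r - r.mean()) / r.std()
    return float(np.mean(z**3)), float(np.mean(z**4) - 3.0)

# Hypothetical data: the KPI depends linearly on two predictors plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=500)

coef, residuals = fit_linear(X, y)
skew, excess_kurt = residual_shape(residuals)
print("coefficients (intercept first):", np.round(coef, 2))
print("skewness:", round(skew, 2), "excess kurtosis:", round(excess_kurt, 2))
```

    Values far from zero for either statistic suggest the residuals are not normal, in which case one of the other model types may be worth trying.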


    Select the variables that you want to adjust for in the model. These are the variables that you expect to confound or modify the effect of your variable of interest on the KPI. If you investigated correlation earlier, you might already know of some predictors (independent variables) that are highly correlated with one another; you may want to select only one from each such pair or group to avoid multicollinearity issues.
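    One way to shortlist predictors that overlap is to scan the correlation matrix for pairs above a cut-off. The sketch below assumes NumPy, invented column names and a threshold of 0.9; the threshold is an illustrative choice, not a Clarofy setting, and correlated_pairs is a hypothetical helper.

```python
import numpy as np

def correlated_pairs(data, names, threshold=0.9):
    """Return (name_a, name_b, r) for predictor pairs with |Pearson r| >= threshold."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs

# Hypothetical predictors: "feed_rate" and "throughput" carry nearly the same signal.
rng = np.random.default_rng(1)
feed_rate = rng.normal(size=200)
throughput = 2.0 * feed_rate + rng.normal(scale=0.05, size=200)
ph = rng.normal(size=200)

data = np.column_stack([feed_rate, throughput, ph])
pairs = correlated_pairs(data, ["feed_rate", "throughput", "ph"])
for a, b, r in pairs:
    print(f"{a} and {b} are highly correlated (r = {r:.2f}); consider keeping only one")
```

    Which member of a flagged pair to keep is a judgment call; usually it is the one that is easier to measure or to act on.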

    In the output options, check the boxes for the outputs that you want to see for each predictor.

    • Coefficients will only be shown for linear regression models, as the other models do not produce them.
    • The Residuals plot shows how well the model fits: a plot of modelled vs actual values, together with R² (goodness of fit) and the error metrics.
    • The Response curves show the modelled variables plotted against your KPI.
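    To make the fit metrics concrete, here is a small worked example in plain Python. It assumes RMSE and MAE as the error metrics, which may differ from the ones Clarofy reports, and the actual and predicted values are invented.

```python
import math

def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the KPI."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error, in the same units as the KPI."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical KPI values and model predictions.
actual = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 12.0, 13.0, 16.0, 18.0]

print(f"R^2  = {r_squared(actual, predicted):.3f}")  # 1 - 2/40 -> 0.950
print(f"RMSE = {rmse(actual, predicted):.3f}")       # sqrt(2/5) -> 0.632
print(f"MAE  = {mae(actual, predicted):.3f}")        # 2/5 -> 0.400
```

    R² close to 1 means the model explains most of the variation in the KPI, while RMSE and MAE show how large a typical miss is in the KPI's own units.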

    If you check the 'Fit Quadratic Terms' box, you will be able to see a quadratic equation for your curve. This regression model can then be exported by copying the model formula in the 'Model report' box.
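    As an illustration of what a quadratic fit yields, the sketch below fits a second-degree polynomial to a single predictor with NumPy and prints the resulting formula. The data is invented, and the formula layout is an assumption; the text in Clarofy's 'Model report' box may be formatted differently.

```python
import numpy as np

# Hypothetical data: the KPI peaks at a mid-range value of one predictor x.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 50)
y = 3.0 + 4.0 * x - 0.4 * x**2 + rng.normal(scale=0.2, size=x.size)

# polyfit returns coefficients from the highest power down: a*x^2 + b*x + c.
a, b, c = np.polyfit(x, y, deg=2)
formula = f"KPI = {c:.3f} {b:+.3f}*x {a:+.3f}*x^2"
print(formula)

# With a negative x^2 coefficient the curve has a maximum at x = -b / (2a).
turning_point = -b / (2.0 * a)
print(f"fitted KPI peaks near x = {turning_point:.2f}")
```

    A formula in this form can be pasted into a spreadsheet or script to reproduce the model's predictions outside the tool.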

    Currently XGBoost and Support Vector Machine models cannot be exported outside Clarofy, so keep an eye out for that capability in an upcoming release!

    Let us know if this article was helpful and what else you'd like to see in ClaroMente.

     

    We are always adding more features to Clarofy, so if there's a specific evaluation (or other!) tool you would like to see included, let us know.
