Example workflow

CSV regression model example with MLdeck

This illustrative workflow shows how a CSV regression model can be explored in MLdeck. Regression predicts numeric values rather than categories. A user might predict price, sales amount, delivery time, monthly revenue, or energy consumption. The workflow emphasizes numeric target selection, feature review, mean baseline comparison, leakage checks, and exportable artifacts for validation and deployment testing.

What CSV regression means

CSV regression is a tabular machine learning task where the target is a number. Instead of predicting a class like churned or not churned, the model estimates a continuous or count-like value. Example targets include price, sales_amount, delivery_time_days, monthly_revenue, and energy_consumption.

Regression is useful for forecasting, estimation, planning, and prioritization, but it can be easy to misread. A model may perform well on historical rows while failing when the distribution changes. MLdeck's browser-local workflow is a place to explore whether a CSV has learnable numeric signal before deeper validation.

Example regression dataset structure

A price prediction CSV might contain product category, brand, age, condition, region, historical demand, number of views, and the target price. A delivery-time dataset might contain distance, carrier, warehouse, item count, region, order weekday, and the target delivery_time_days. A revenue dataset might include plan type, account age, active seats, usage counts, support volume, and target monthly_revenue.

The row definition matters. A row may represent an order, a customer-month, a property, a device-day, or a product listing. If rows mix different meanings, the model may learn unstable patterns.

Choosing a numeric target column

In MLdeck, choose a target column that is numeric and known as the outcome. Avoid target columns that are IDs, categories, status labels, or formatted strings. If the target contains currency symbols, commas, or units, clean it before training. If the numeric target has only a few repeated values, consider whether the task is actually classification or ordinal prediction rather than regression.

Temporal meaning matters. If you predict monthly revenue, do not include future revenue columns as features. If you predict delivery time, do not include fields recorded after delivery. The target must represent what you want to estimate, and features must represent what is available at prediction time.

Feature selection and preprocessing

Regression workflows often mix numeric and categorical features. Numeric columns may need missing-value handling, scaling, or clipping. Categorical columns such as region, plan type, carrier, product category, or payment method may need encoding. High-cardinality IDs usually should be excluded unless there is a carefully validated reason to keep them.

MLdeck helps users review columns and train candidate models in the browser. During normal browser training flows, raw CSV training data is not uploaded to a cloud training service. That makes the workflow useful for local evaluation and privacy-sensitive prototyping.

Mean baseline and regression metrics

The mean baseline predicts the average target value for every row. A useful regression model should improve on that baseline. MLdeck can help compare model candidates to simple references. Important regression metrics include MAE, RMSE, R squared, and delta versus the mean baseline.

MAE is often easier to explain because it measures average absolute error in target units. RMSE penalizes large errors more strongly. R squared describes variance explained, but it can be misleading when the target distribution is narrow or the dataset is biased. No single metric is enough for every use case.

Reading leaderboard results

A leaderboard can identify promising candidates, but it should not be treated as final evidence. Check whether the best model improves meaningfully over the mean baseline. Inspect whether errors are acceptable in business terms. A delivery model with an average error of one day may be acceptable for some planning workflows and unacceptable for same-day logistics.

Look for suspiciously strong results. If a model appears extremely accurate, investigate leakage, duplicate rows, target-derived totals, or a split that lets future information into training. Regression models can be fooled by post-outcome fields just like classification models.

Common regression leakage risks

Leakage examples include future revenue columns, post-sale fields, refund totals, final invoice amounts, target-derived ratios, or IDs that encode ordering. A column named total_after_discount may be invalid if the target is original price and the total was calculated later. A delivery dataset may include delivered timestamp fields that reveal the target.

If the data is temporal, use time-aware validation before making decisions. Random splits can overstate performance when rows from the future resemble rows in the past too closely or when the same entity appears repeatedly.

Exporting regression artifacts

After training, MLdeck can export artifacts such as ONNX, Docker packages, Python files, and PDF reports. These exports are intended for validation and deployment testing. For regression, test representative rows, low and high target ranges, missing values, and category edge cases. The exported ONNX artifact is designed for portable ONNX Runtime inference, subject to parity validation.

Limits before using predictions in decisions

MLdeck is an MVP and early beta. Regression outputs may influence pricing, operations, staffing, forecasting, or customer treatment. Strict validation should be used before relying on predictions for important decisions. Review data quality, temporal drift, fairness, business constraints, and the cost of prediction errors.

CSV regression AutoML FAQ

Can MLdeck train regression models from CSV?

Yes, when the target is numeric and the data is tabular.

What target column should I choose for regression?

Choose a numeric outcome such as price, sales amount, delivery time, monthly revenue, or energy consumption.

What does the mean baseline mean?

It predicts the average target for every row. A useful model should beat it in a meaningful way.

Which regression metric should I trust?

Use several metrics and interpret them in business units. MAE is often easiest to explain.

Can I export a regression model from MLdeck?

Yes, for validation and deployment testing after parity checks.

Related examples and guides

Continue with classification, data quality, and ONNX export workflows.