Siddhi enables users to perform linear regression on real-time data streams. The **regress function** takes a dependent event stream (Y) and any number of independent event streams (X1, X2, ..., Xn), and returns all coefficients of the regression equation.

The two implementations of regression can be distinguished as follows:

- **regress**: Lets you optionally specify a batch size, which defines the number of events considered for the regression calculation.
- **lengthTimeRegress**: Requires you to specify both a time window and a batch size. The number of events considered for the regression calculation can be restricted based on the time window and/or the batch size.
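The difference between the two retention rules can be sketched in a few lines of Python. This is only an illustration of the idea, not Siddhi's implementation: the function names, the `(timestamp_ms, y, x)` event layout, and the choice to measure age against the newest event's timestamp are all assumptions made for the example.

```python
from collections import deque

def keep_by_length(window, event, batch_size):
    """regress-style retention (assumed): keep at most `batch_size` newest events."""
    window.append(event)
    while len(window) > batch_size:
        window.popleft()

def keep_by_length_and_time(window, event, batch_size, time_window_ms):
    """lengthTimeRegress-style retention (assumed): bound by count and by age."""
    window.append(event)
    newest_ts = event[0]
    # Drop the oldest event while the window is too long or the event is too old.
    while window and (len(window) > batch_size
                      or newest_ts - window[0][0] > time_window_ms):
        window.popleft()

# Hypothetical events as (timestamp_ms, y, x) tuples; the values are made up.
events = [(0, 1.0, 1.0), (100, 2.0, 2.0), (150, 3.0, 3.0), (320, 4.0, 4.0)]

by_length = deque()
by_length_and_time = deque()
for e in events:
    keep_by_length(by_length, e, batch_size=3)
    keep_by_length_and_time(by_length_and_time, e, batch_size=3, time_window_ms=200)

length_timestamps = [e[0] for e in by_length]
length_time_timestamps = [e[0] for e in by_length_and_time]
print(length_timestamps)       # [100, 150, 320]
print(length_time_timestamps)  # [150, 320]
```

With the same batch size, the length-and-time rule keeps fewer events here because the event at 100 ms is more than 200 ms older than the newest one.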

### Input parameters for regress function

The following table describes the input parameters available for the `regress` function.

Parameter | Description | Required/Optional | Default Value |
---|---|---|---|
Calculation Interval | The frequency with which the regression calculation should be carried out. | Optional | `1` (i.e., for every event) |
Batch Size | The maximum number of events to be used for a regression calculation. | Optional | `1,000,000,000` |
Confidence Interval | The confidence interval to be used for a regression calculation. | Optional | `0.95` |
Y Stream | The data stream of the dependent variable. | Required | |
X Stream(s) | The data stream(s) of the independent variable(s). | Required | |

**Format**: `regress(Y, X1, X2, ..., Xn)` or `regress(calculation interval, batch size, confidence interval, Y, X1, X2, ..., Xn)`
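To make the roles of the calculation interval and batch size concrete, here is a minimal Python sketch for a single X stream. It is an illustration under stated assumptions, not Siddhi's code: `fit_line` and `regress_sketch` are hypothetical names, the fit is plain ordinary least squares, and the sketch only records a result when it recalculates (what Siddhi emits for the events in between is not covered here).

```python
from collections import deque

def fit_line(points):
    """Ordinary least squares for y = beta0 + beta1 * x over (y, x) pairs."""
    n = len(points)
    mean_y = sum(y for y, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    sxx = sum((x - mean_x) ** 2 for _, x in points)
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in points)
    beta1 = sxy / sxx  # fails if every x in the window is identical
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1

def regress_sketch(events, calculation_interval=1, batch_size=1_000_000_000):
    """Refit every `calculation_interval` events over the last `batch_size` events."""
    window = deque(maxlen=batch_size)  # oldest events fall out automatically
    results = []
    for count, (y, x) in enumerate(events, start=1):
        window.append((y, x))
        # Recalculate only on every calculation_interval-th event,
        # and only once there are at least two points to fit.
        if count % calculation_interval == 0 and len(window) >= 2:
            results.append((count, fit_line(window)))
    return results

# Made-up (y, x) events that lie exactly on y = 1 + 2x.
events = [(3.0, 1.0), (5.0, 2.0), (7.0, 3.0), (9.0, 4.0), (11.0, 5.0), (13.0, 6.0)]
results = regress_sketch(events, calculation_interval=2, batch_size=4)
for count, (beta0, beta1) in results:
    print(count, round(beta0, 6), round(beta1, 6))
```

With a calculation interval of 2, the fit is recomputed after the 2nd, 4th, and 6th events; with a batch size of 4, the last fit uses only the four most recent events.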

### Input parameters for lengthTimeRegress function

The following table describes the input parameters available for the `lengthTimeRegress` function.

Parameter | Description | Required/Optional | Default Value |
---|---|---|---|
Time Window | The maximum time duration to be considered for the regression calculation. | Required | |
Batch Size | The maximum number of events to be used for a regression calculation. | Required | |
Calculation Interval | The frequency with which the regression calculation should be carried out. | Optional | `1` (i.e., for every event) |
Confidence Interval | The confidence interval to be used for a regression calculation. | Optional | `0.95` |
Y Stream | The data stream of the dependent variable. | Required | |
X Stream(s) | The data stream(s) of the independent variable(s). | Required | |

**Format**: `lengthTimeRegress(time window, batch size, Y, X1, X2, ..., Xn)` or `lengthTimeRegress(time window, batch size, calculation interval, confidence interval, Y, X1, X2, ..., Xn)`

### Output parameters

The following table describes the output parameters.

The same output parameters are available for each implementation.

Parameter | Name | Description |
---|---|---|
Standard Error | `stdError` | The standard error of the regression equation. |
β coefficients | `beta0`, `beta1`, `beta2`, etc. | n+1 β coefficients, where `n` is the number of `x` parameters. |
Input Stream Data | The name given in the input stream | All the attributes sent in the input stream. |

The `regress` and `lengthTimeRegress` functions nullify any β coefficients that fail the t-test based on the confidence interval. You can access any output parameter by its name (as given in the table above).
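The nullification step can be illustrated with a small Python sketch for a single X stream. Several things here are assumptions rather than documented Siddhi behaviour: the function name `fit_with_t_test`, the use of `None` to represent a nullified coefficient, reading "standard error of the regression equation" as the residual standard error, and the use of a normal-distribution critical value in place of the Student's t quantile (the Python standard library does not provide one). At least three events are needed.

```python
import math
from statistics import NormalDist

def fit_with_t_test(points, confidence=0.95):
    """Fit y = beta0 + beta1 * x and null out coefficients that fail the test."""
    n = len(points)
    mean_y = sum(y for y, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    sxx = sum((x - mean_x) ** 2 for _, x in points)
    sxy = sum((x - mean_x) * (y - mean_y) for y, x in points)
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x

    # Residual variance with n - 2 degrees of freedom (two fitted coefficients).
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for y, x in points)
    variance = sse / (n - 2)
    se_beta1 = math.sqrt(variance / sxx)
    se_beta0 = math.sqrt(variance * (1.0 / n + mean_x ** 2 / sxx))

    # ASSUMPTION: two-sided critical value from the normal distribution,
    # standing in for the Student's t quantile.
    critical = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

    def keep_if_significant(beta, se):
        if se == 0.0:
            return beta  # perfect fit: nothing to reject
        return beta if abs(beta / se) > critical else None

    return {
        "beta0": keep_if_significant(beta0, se_beta0),
        "beta1": keep_if_significant(beta1, se_beta1),
        "stdError": math.sqrt(variance),
    }

# Made-up data: one series with a clear trend, one with no dependence on x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
noise = [0.1, -0.1] * 4
trend = [(2.0 * x + 1.0 + e, x) for x, e in zip(xs, noise)]
flat = [(5.0 + e, x) for x, e in zip(xs, noise)]

trend_fit = fit_with_t_test(trend)
flat_fit = fit_with_t_test(flat)
print(trend_fit)
print(flat_fit)
```

In the flat series the slope is statistically indistinguishable from zero, so `beta1` comes back as `None`, while the intercept is retained.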

### Examples

#### Example 1

The following query submits a calculation interval (every 10 events), a batch size (100,000 events), a confidence interval (0.95), a dependent input stream (Y) and 3 independent input streams (X1, X2, X3) that are used to perform linear regression between Y and all the X streams.

`from StockExchangeStream#timeseries:regress(10, 100000, 0.95, Y, X1, X2, X3) select * insert into StockForecaster`

When this query is executed, it returns the standard error of the regression equation (ε), 4 β coefficients (β₀, β₁, β₂, β₃) and all the items available in the input stream. These results can be used to build a relationship between Y and all the Xs (the regression equation) as follows:

Y = β₀ + β₁·X1 + β₂·X2 + β₃·X3 + ε
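As a rough illustration of using the returned coefficients, the following Python sketch evaluates the regression equation for a new set of X values. The `predict` helper and the sample numbers are hypothetical, and treating a nullified coefficient as contributing zero is an assumption of this sketch, not documented behaviour.

```python
def predict(betas, xs):
    """Evaluate beta0 + beta1*x1 + ... + betan*xn for one set of X values.

    ASSUMPTION: a nullified coefficient (None) contributes nothing to the sum.
    """
    if len(betas) != len(xs) + 1:
        raise ValueError("expected one more coefficient than X values")
    total = betas[0] if betas[0] is not None else 0.0
    for beta, x in zip(betas[1:], xs):
        if beta is not None:
            total += beta * x
    return total

# Made-up coefficients for three X streams; beta2 was nullified.
betas = [1.5, 2.0, None, -0.5]
estimate = predict(betas, [10.0, 3.0, 4.0])
print(estimate)  # 19.5
```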

#### Example 2

The following query submits a time window (200 milliseconds), a batch size (10,000 events), a calculation interval (every 2 events), a confidence interval (0.95), a dependent input stream (Y) and an independent input stream (X) that are used to perform linear regression between Y and X.

`from StockExchangeStream#timeseries:lengthTimeRegress(200, 10000, 2, 0.95, Y, X) select * insert into StockForecaster`

When this query is executed, it returns the standard error of the regression equation (ε), 2 β coefficients (β₀, β₁) and all the items available in the input stream.