WebCab Probability and Statistics Web Services for .NET v3.3 Demo

CorrelationStateful Class

This is the stateful implementation of the Correlation and Regression XML Web service allowing the investigation of linear relationships between two variables using the techniques of correlation and linear regression.

For a list of all members of this type, see CorrelationStateful Members.

System.Object
   CorrelationStateful

public class CorrelationStateful

Remarks

This version of the Correlation and Regression functionality allows the data set of pairs being studied to be set, and then various qualitative properties of this set of pairs to be evaluated. This approach is particularly appropriate where repeated evaluation of the various Correlation properties will be performed on the same data set, because the data set is set once and then reused for each of these evaluations.

Use of State

This stateful version implements the functionality of the Correlation and Regression XML Web service using the OOP notion and technique of state. Holding the data set as state allows for more efficient execution in instances where the data is "sent over the wire" (for example, when the data set is retrieved from a remote DBMS).
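The idea of state described above can be illustrated with a minimal sketch. It is written in Python purely for illustration; the class PairStore and its method names are hypothetical stand-ins and are not the WebCab API. The point it shows is that the pairs are supplied once and then reused by every later evaluation.

```python
class PairStore:
    """Hypothetical stand-in for a stateful service object (not the WebCab API)."""

    def __init__(self):
        self._xs = []
        self._ys = []

    def add_value(self, x, y):
        # Store a single (x, y) pair.
        self._xs.append(float(x))
        self._ys.append(float(y))

    def add_values(self, xs, ys):
        # Store several pairs at once; both sequences must have equal length.
        if len(xs) != len(ys):
            raise ValueError("xs and ys must have the same length")
        for x, y in zip(xs, ys):
            self.add_value(x, y)

    def mean_x(self):
        # Mean of the first elements of the stored pairs.
        return sum(self._xs) / len(self._xs)

    def mean_y(self):
        # Mean of the second elements of the stored pairs.
        return sum(self._ys) / len(self._ys)


store = PairStore()
store.add_values([1, 2, 3, 4], [2, 4, 6, 8])  # the data set is supplied once
print(store.mean_x(), store.mean_y())         # and reused by each evaluation
```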

Overview of this Correlation XML Web service

We study the relationship between two variables by considering a data set of pairs of values which correspond to particular instances of values taken simultaneously by the two underlying variables. We then study the correlation and linear regression properties of this data set in order to deduce information concerning the relationship between the two variables.

In particular, we allow the linear regression line to be constructed, which allows us to predict one variable from given values of the other variable to a certain degree of confidence dependent on the `linearity' of the data set. We also cover linear (Pearson's, t-test, z-transform) and rank (Spearman's, Kendall's) correlation.

That is, by using this XML Web service for a given data set you are able to decide to what degree two variables are correlated, and to determine the confidence interval and the level of significance of the correlation tests performed. You are also able to construct the regression line for the data set. Similarly, for two data samples with corresponding regression lines, you can determine the confidence interval for the conditional mean between these two regression lines.
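For reference, the usual textbook forms of the quantities named above are given below. This page does not state the exact formulas used by the service, so these are the standard definitions and should be read as an assumption: Pearson's coefficient r for n pairs, the t statistic used to test r against zero, and the Fisher z-transform commonly used to build a confidence interval for r.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}

t = r \sqrt{\frac{n - 2}{1 - r^2}} \qquad (n - 2 \text{ degrees of freedom})

z = \tfrac{1}{2} \ln\frac{1 + r}{1 - r}, \qquad \operatorname{SE}(z) \approx \frac{1}{\sqrt{n - 3}}
```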

Possible Data Sets, questions addressed and effectiveness

Possible Data Sets

Such data sets appear in a number of contexts. Examples of pairs for which such data sets could be constructed include:

  1. The grade and the number of students within a class who obtained that grade.
  2. The number of commercials shown and the sales achieved in a given week.

Possible Questions addressed

By tabulating a given set of students or sales data, respectively, against the above criteria, the application of this XML Web service would address the following types of questions:

  1. (Grades, Students): What is the average grade obtained, and what is the degree of dispersion of the grades? In general, does the number of students obtaining a grade increase as the grade increases? To what degree (using linear methods) can we predict the number of students who will obtain a given grade?
  2. (No. Commercials, Sales): What are the average sales or number of commercials in a given week, and what is the dispersion (or variance) of these values from week to week? To what degree does an increase in the number of commercials increase the sales figures?

Effectiveness

The effectiveness of the functionality in terms of being able to predict values will depend on the nature of the data set considered. Because we have implemented a linear regression model, we will only be able to make confident predictions when there exists a strong linear relationship between the two variables considered (see note below for more details).

The correlation functionality implemented consists of a number of coefficients which are designed to measure the correlation (i.e. the degree to which one variable moves with the other) for differing types of data sets (see notes below).

Setting the Data Set

Before the regression and correlation based methods are applied to study a given data set, the data set has to be set by using one of the following approaches:

  1. AddValue - Construct the data set by adding one pair of values at a time.
  2. AddValues - Construct the data set by adding a number of pairs of values at once.

Remark: If your client uses threads, note that the procedure using AddValue or AddValues is not thread safe.
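One common way for a multi-threaded client to cope with a non-thread-safe procedure is to serialize the calls behind a lock. The sketch below illustrates that pattern in Python. The class LockedPairs is hypothetical and only stands in for whatever object the client uses to add pairs; this page does not prescribe any particular remedy, so treat the approach as an assumption.

```python
import threading


class LockedPairs:
    """Hypothetical wrapper that serializes additions with a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pairs = []

    def add_value(self, x, y):
        # Only one thread at a time may modify the stored pairs.
        with self._lock:
            self._pairs.append((x, y))

    def count(self):
        with self._lock:
            return len(self._pairs)


def worker(target, start):
    # Each worker adds 100 pairs of the form (x, 2x).
    for i in range(100):
        target.add_value(start + i, 2 * (start + i))


locked = LockedPairs()
threads = [threading.Thread(target=worker, args=(locked, k * 100)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(locked.count())  # all 4 * 100 additions are recorded
```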

Detailed Overview of the Functionality Available

Set the Data set and number of significant digits returned

  1. AddValue - Add pairs of values to the data set one at a time.
  2. AddValues - Add pairs of values to the data set many at a time.
  3. SetSignificantDigits - Set the number of significant digits which will be returned by this XML Web service's methods.
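This page does not describe how the service applies the number of significant digits. As an illustration of the conventional meaning only, the following Python sketch rounds a value to a given number of significant digits; the function name round_significant is hypothetical.

```python
import math


def round_significant(value, digits):
    """Round value to the given number of significant digits."""
    if value == 0:
        return 0.0
    # Position of the leading digit, e.g. 2 for 123.456 and -3 for 0.0012345.
    exponent = math.floor(math.log10(abs(value)))
    return round(value, digits - 1 - exponent)


print(round_significant(123.456, 4))
print(round_significant(0.0012345, 2))
print(round_significant(12345.0, 2))
```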

Correlation Coefficients and Statistics

  1. PearsonCorrelationCoefficient() - Evaluates Pearson's Correlation Coefficient.
  2. SpearmanRankTest() - Evaluates Spearman's Rank Correlation Coefficient.
  3. KendallCorrelationCoefficient() - Evaluates Kendall's Correlation Coefficient.
  4. Significance - Calculates the significance test for a given correlation coefficient.
  5. MeanX - Mean of the values of the first elements of the pairs from which the current data set is constructed.
  6. MeanY - Mean of the values of the second elements of the pairs from which the current data set is constructed.
  7. SampleVarianceX - The variance of the first elements from the pairs from which the current data set is constructed.
  8. SampleVarianceY - The variance of the second elements from the pairs from which the current data set is constructed.
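To make the three coefficients listed above concrete, here is a minimal Python sketch of their textbook definitions. It is not the service's implementation and all function names are hypothetical. Two assumptions are made: Spearman's coefficient is computed as Pearson's coefficient of the ranks (tied values share their average rank), and Kendall's coefficient is the tau-a variant, which makes no correction for ties.

```python
import math


def pearson(xs, ys):
    # Covariance of the pairs divided by the product of the spreads.
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    # Ranks start at 1; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    # Pearson's coefficient applied to the ranks instead of the raw values.
    return pearson(average_ranks(xs), average_ranks(ys))


def kendall_tau_a(xs, ys):
    # (concordant pairs - discordant pairs) / total number of pairs.
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 30]  # increasing but not linear in xs
print(pearson(xs, ys), spearman(xs, ys), kendall_tau_a(xs, ys))
```

On this sample the two rank coefficients equal 1 because ys always increases with xs, while Pearson's coefficient is below 1 because the relationship is not exactly linear.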

Linear Regression methods

  1. LeastSquaresRegressionLineY - Constructs the regression line of Y on X using the method of least squares.
  2. LeastSquaresRegressionLineX - Constructs the regression line of X on Y using the method of least squares.
  3. CoefficientOfDetermination - Calculates the coefficient of determination for the current set of data.
  4. Residuals - Determines the residual for a given pair of points.
  5. ResidualsAverage - Determines the arithmetic average of all the residuals.
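The regression quantities listed above can be sketched as follows. This is a minimal Python illustration of the standard least squares formulas for the regression line of Y on X, not the service's implementation; the function names are hypothetical. The regression line of X on Y follows from the same function by swapping the two arguments.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    # Observed value minus the value predicted by the line.
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


def coefficient_of_determination(xs, ys, slope, intercept):
    # 1 - (residual sum of squares / total sum of squares).
    my = sum(ys) / len(ys)
    ss_res = sum(e ** 2 for e in residuals(xs, ys, slope, intercept))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
res = residuals(xs, ys, slope, intercept)
r2 = coefficient_of_determination(xs, ys, slope, intercept)
print(slope, intercept, r2, sum(res) / len(res))
```

For a least squares line fitted with an intercept, the residuals sum to zero up to rounding error, so their arithmetic average is effectively zero.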

Requirements

Namespace: Correlation

Assembly: WebCab.StatisticsDemo (in WebCab.StatisticsDemo.dll)

See Also

CorrelationStateful Members | Correlation Namespace