WebCab Probability and Statistics for COM v3.3 Demo

Correlation Class

The stateless version of the Correlation and Regression class allows the investigation of linear relationships between two variables using techniques from correlation and linear regression, as detailed in the PDF documentation.

For a list of all members of this type, see Correlation Members.

System.Object
   Correlation

public class Correlation

Remarks

The stateless version of Correlation requires that the data be sent to each business method, rather than being set once as is the case with the stateful version of Correlation.
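The difference between the two designs can be illustrated with a minimal Python sketch. The class and method names below (`StatefulCorrelation`, `StatelessCorrelation`, `set_data`, `mean_x`, `mean`) are hypothetical and are not the WebCab API; they only show where the data lives in each design.

```python
from statistics import fmean


class StatefulCorrelation:
    """Hypothetical stateful design: the data set is stored once on the object."""

    def __init__(self):
        self._x = []
        self._y = []

    def set_data(self, x_values, y_values):
        # The data is copied into the instance and reused by later calls.
        self._x = list(x_values)
        self._y = list(y_values)

    def mean_x(self):
        return fmean(self._x)


class StatelessCorrelation:
    """Hypothetical stateless design: every call receives the data it needs."""

    @staticmethod
    def mean(values):
        return fmean(values)


stateful = StatefulCorrelation()
stateful.set_data([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
stateful_mean = stateful.mean_x()

# No object state: the same data is passed again with each call.
stateless_mean = StatelessCorrelation.mean([1.0, 2.0, 3.0])
```

The stateless form keeps no data between calls, which is convenient when one component instance is shared by several callers, at the cost of passing the arrays on every call.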

Overview of the Correlation class

We study the relationship between two variables by considering a data set of pairs of values which correspond to particular instances of values taken simultaneously by the two underlying variables. We then study the correlation and linear regression properties of this data set in order to deduce information concerning the relationship between the two variables.

In particular, we allow the linear regression line to be constructed, which allows us to predict one variable from given values of the other variable to a certain degree of confidence dependent on the 'linearity' of the data set. We also cover linear (Pearson's, t-test, z-transform) and rank (Spearman's, Kendall's) correlation.

That is, by using this class on a given data set you can decide to what degree two variables are correlated, and determine the confidence interval and the level of significance of the correlation tests performed. You can also construct the regression line for the data set. Similarly, for two data samples with corresponding regression lines, you can determine the confidence interval for the conditional mean between these two regression lines.

Possible Data Sets, Questions Addressed and Effectiveness

Possible Data Sets

Such data sets appear in a number of contexts. Examples of pairs for which such data sets could be constructed include:

  1. The grade and the number of students within a class who obtained that grade.
  2. The number of commercials shown and the sales achieved in a given week.

Possible Questions Addressed

By tabulating a given set of student or sales data, respectively, against the above criteria, this class can address the following types of questions:

  1. (Grades, Students): What is the average grade obtained, and what is the degree of dispersion of the grades? Does the number of students obtaining a grade generally increase as the grade increases? To what degree (using linear methods) can we predict the number of students who will obtain a given grade?
  2. (No. Commercials, Sales): What are the average sales or number of commercials in a given week, and what is the dispersion (or variance) of these values from week to week? To what degree does an increase in the number of commercials increase the sales figures?

Effectiveness

The effectiveness of the functionality in terms of being able to predict values will depend on the nature of the data set considered. Because we have implemented a linear regression model (see note below for more details), we will only be able to make predictions confidently when there exists a strong linear relationship between the two variables considered.

The correlation functionality implemented consists of a number of coefficients which are designed to measure the correlation (i.e. the degree to which one variable moves with the other) for differing types of data sets (see notes below).
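To see why a strong linear relationship matters, the following Python sketch compares Pearson's coefficient on an exactly linear data set and on an exactly quadratic one. The `pearson` function is a textbook implementation written for illustration, not the WebCab method, and the data values are made up.

```python
import math


def pearson(x_values, y_values):
    """Textbook Pearson correlation coefficient for two equal-length lists."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    syy = sum((y - mean_y) ** 2 for y in y_values)
    return sxy / math.sqrt(sxx * syy)


x = [-2.0, -1.0, 0.0, 1.0, 2.0]
linear = [2.0 * v + 1.0 for v in x]   # y depends linearly on x
quadratic = [v * v for v in x]        # y is fully determined by x, but not linearly

r_linear = pearson(x, linear)         # perfect linear correlation
r_quadratic = pearson(x, quadratic)   # no linear correlation at all
```

In the quadratic case y is completely determined by x, yet the linear correlation is zero, so a fitted regression line would have no predictive value for that data set.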

Passing the Data Set

With each of the methods of this class you will need to provide, via parameters, either the data set you wish to analyse or data derived from the data set.

Generally speaking, the data set is passed as a set of pairs of points: (xValues[0], yValues[0]), (xValues[1], yValues[1]),... , (xValues[n], yValues[n]).
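Because the pairs are formed by index, the two arrays must have the same length. A small Python sketch of that pairing follows; the helper name `as_pairs` and the requirement of at least two pairs are assumptions made for illustration, not documented behaviour of the component.

```python
def as_pairs(x_values, y_values):
    """Pair two parallel sequences by index, rejecting mismatched lengths."""
    if len(x_values) != len(y_values):
        raise ValueError("xValues and yValues must have the same length")
    if len(x_values) < 2:
        # Assumption: correlation and regression need at least two points.
        raise ValueError("at least two pairs are needed")
    return list(zip(x_values, y_values))


pairs = as_pairs([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```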

Detailed Overview of the Functionality Available

Correlation Coefficients and Statistics

  1. PearsonCorrelationCoefficient - Evaluates Pearson's Correlation Coefficient.
  2. SpearmanRankTest - Evaluates Spearman's Rank Correlation Coefficient.
  3. KendallCorrelationCoefficient - Evaluates Kendall's Correlation Coefficient.
  4. Significance - Calculates the significance test for a given correlation coefficient.
  5. Mean - Evaluates the arithmetic mean of the doubles passed as a parameter.
  6. SampleVariance - Evaluates the sample variance of the doubles passed as a parameter.
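The rank coefficients and the associated statistics listed above can be sketched in Python as follows. These are textbook formulas written for illustration; the function names are hypothetical, and whether the component uses the same tie handling (average ranks) or the same Kendall variant (tau-a is shown here) is an assumption.

```python
import math


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson's coefficient applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if product > 0:
                score += 1
            elif product < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def fisher_z(r):
    """Fisher's z-transform of a correlation coefficient."""
    return math.atanh(r)


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up illustrative data
y = [2.0, 1.0, 4.0, 3.0, 5.0]
rho = spearman(x, y)
tau = kendall_tau_a(x, y)
t = t_statistic(rho, len(x))
```

The t statistic would then be compared against a Student's t distribution with n - 2 degrees of freedom to obtain a significance level.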

Linear Regression methods

  1. LeastSquaresRegressionLineY - Constructs the regression line of Y on X using the method of least squares.
  2. LeastSquaresRegressionLineX - Constructs the regression line of X on Y using the method of least squares.
  3. CoefficientOfDetermination - Calculates the coefficient of determination for the given set of data.
  4. Residuals - Determines the residual for a given pair of points.
  5. ResidualsAverage - Determines the arithmetic average of all the residuals.
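As a rough illustration of what these methods compute, here is a minimal Python sketch of least-squares regression, residuals and the coefficient of determination. The function names and the (intercept, slope) return convention are assumptions made for this example and do not reflect the component's actual signatures; the data values are made up.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys, intercept, slope):
    """Observed y minus the value predicted by the line, for each pair."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def coefficient_of_determination(xs, ys, intercept, slope):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum(e * e for e in residuals(xs, ys, intercept, slope))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Regression line of Y on X.
a_y, b_y = least_squares_line(x, y)
# Regression line of X on Y: the same routine with the roles swapped.
a_x, b_x = least_squares_line(y, x)

errors = residuals(x, y, a_y, b_y)
residual_average = sum(errors) / len(errors)
r_squared = coefficient_of_determination(x, y, a_y, b_y)
```

For a least-squares line with an intercept the residuals average to zero (up to rounding), and the product of the two slopes equals the coefficient of determination, which gives a quick consistency check on the two regression lines.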

Requirements

Namespace: WebCab.COM.Statistics.Correlation

Assembly: WebCab.COM.StatisticsDemo (in WebCab.COM.StatisticsDemo.dll)

See Also

Correlation Members | WebCab.COM.Statistics.Correlation Namespace