# Multivariate Analysis

Multivariate analysis is a set of statistical techniques developed for data sets with more than one variable. The techniques fall into two broad types. The first analyses dependence: one or more variables are treated as dependent variables to be predicted or explained by the other variables. The second analyses interdependence: no variable is regarded as dependent, and the techniques instead examine the relationships among variables, objects or cases.

Multivariate techniques are useful in many real-world settings. Census data, for instance, is collected in countries all over the world, and statisticians commonly study it to find patterns and draw conclusions. This analysis often involves the interplay of many factors; the education level of the earning members of a family, for example, has an impact on the income level of the household. Another example is the analysis of a production process, where factors such as the strength of the product, the quantity produced, and the inputs and outputs of various sets of raw materials are measured. The need to build computer models of real-world situations has increased the importance of this type of statistical analysis.

Some of the important multivariate analytical methods are Principal Component Analysis, Cluster Analysis and Discriminant Analysis. Principal Component Analysis (PCA) compresses the information contained in a large number of variables into a smaller set of composite variables, while retaining as much of the original variation as possible. Some information is usually lost in the reduction, but the leading components often capture most of the structure in the data. Relationships that could not be spotted by a casual examination of the sample are often revealed by PCA. The method is commonly used as an intermediate step when large quantities of data are being evaluated, since it helps to make the volumes manageable.
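The compression that PCA performs can be sketched in a few lines. The example below is a minimal illustration, not a production implementation: it uses only the Python standard library, finds just the first principal component by power iteration on the sample covariance matrix, and runs on a small made-up data set. The function name `first_principal_component` and the sample values are assumptions introduced for this sketch.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, explained_ratio) for the first principal component.

    Illustrative sketch: power iteration on the sample covariance matrix.
    """
    n = len(rows)
    d = len(rows[0])
    # Center each variable on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize; the vector converges to the leading eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance along v (the leading eigenvalue) as a share of total variance.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)
    )
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance

# Made-up data: two strongly correlated variables measured on five cases.
data = [(2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1), (6.0, 6.0)]
direction, ratio = first_principal_component(data)
print(direction, ratio)
```

For this sample, a single composite variable accounts for nearly all of the variance in the two original variables, but not quite all of it, which is the sense in which PCA trades a small loss of information for a smaller set of variables.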

Cluster Analysis plays a similar role in reducing large volumes of heterogeneous data to more manageable sets. K-means clustering does this by assigning each observation to a cluster so as to minimize the distance between the observation and the centroid of its cluster. Discriminant Analysis is a multivariate technique used to classify animal species in biology, different types of ailments in medicine and different kinds of risk in the insurance industry. It assigns the observations in a large heterogeneous data set to groups and labels them, making further study of the data considerably easier and faster.
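The K-means procedure described above can be sketched as an alternation of two steps: assign each observation to its nearest centroid, then move each centroid to the mean of its assigned observations. The code below is a minimal sketch under stated assumptions: the starting centroids are supplied by the caller (real implementations usually pick them at random or with a seeding heuristic), the number of iterations is fixed, and the points are made-up two-dimensional data. The names `k_means` and `nearest_index` are introduced here for illustration.

```python
import math

def nearest_index(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def k_means(points, initial_centroids, iterations=10):
    """Illustrative K-means: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest_index(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    # Final labels under the final centroids.
    labels = [nearest_index(p, centroids) for p in points]
    return centroids, labels

# Made-up observations forming two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids, labels)
```

Each centroid ends up at the mean of one group, and the label list records which cluster every observation belongs to. The result of K-means in general depends on the starting centroids, so practical use typically repeats the run from several starting points.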