关键词不能为空

当前您在: 主页 > 英语 >

XLSTAT使用说明

作者:高考题库网
来源:https://www.bjmy2z.cn/gaokao
2021-02-02 10:52
tags:

-

2021年2月2日发(作者:味同嚼蜡)



XL


STAT





Generalities




Installation


Running


XL


STAT -


XL


STAT Manager


XL


STAT direct access


General remarks


Dialog boxes


Numerical efficiency


Missing data / Numerical and categorical variables


Error messages


Shortcut keys


References


Help files




Statistical tools




6D Plots



ANOVA / ANCOVA


AxesZoomer



Categorical sorting



Clustering 1 & 2



Correlations / Principal Component Analysis (PCA)



Correspondence analysis



Crossed sorting / Flat sorting


DataFlagger


Descriptive statistics


Discretize data



Discriminant analysis



Easy labels (two clicks to add labels on a plot)


Extract a sample of rows from a dataset


Factor analysis



Fit data to density functions and test fit


Histograms


Kruskal-Wallis test



MicroMover


MinMax Search


Models for binary response data (Logit, Probit, ...)


MotriMax


Multidimensional Scaling (MDS)



Multiple correspondence analysis



Non- linear fitting (GenFit)


Odds ratio


Similar rows detection


Plot Transformer



Regression



Test for comparing two proportions


Tests for comparing two samples (Student, Wilcoxon, Fisher, ...)


Tests on contingency tables (Chi- square and Exact tests)


Transposition





Microsoft Excel? and Statistics




License conditions





Generalities




Installation




Required software : Excel ? 5.0 or above



Required


hardware


:


XL


STAT


is


compatible


with


all


PCs


that


can


run


Microsoft


Excel?


5.0.


However,


for


optimal performance, a 486DX33 PC with 8Mb RAM or a PowerPC-based Macintosh is recommended.



To install


XL


STAT you need to copy all the files from the


XL


STAT disk to the hard disk. You load


XL


STAT by


opening the file with the open command in the File menu of Excel, or better, you can use the add-


in manager in the tools menu. A double click on the file name or icon in the Windows Explorer of file manager


is also possible.



The professional version of


XL


STAT is exactly the same as the shareware one, the only difference being that it


is not time limited and that a free update service is provided if the author is contacted at fahmy@.



XL


STAT website is :






Running XLSTAT - XLSTAT Manager




To run XLSTAT, please load the file by opening it with Excel File|Open command or by using the


Tools|Add-in Manager menu.



Once


XL


STAT


is


opened,


the


available


commands


can


be


directly


accessed


from


the


menubar


or


from


the



XL


STAT Manager


XL


STAT toolbar.



Using the XLSTAT Manager to open the tools makes it possible to access them in the


XL


STAT toolbar, as they


are added to it after selection. To know what tool corresponds to what icon of the toolbar, leave the mouse cursor


on the icon, and read the



By


using


the



XL


STAT


Manager


to


select


tools,


you


can


also


save


a


configuration


as



Saving


a


configuration as


XL


STAT will load, it will


automatically open the tools you saved in the configuration. When you click on save the options corresponding to


the “O” button of the toolbar are also saved for the next sessions.




You


can


also


activate


the



XL


STAT


Manager


at


any


time,


using


the



XL


STAT


Manager


command


of


the


XL


STAT menu.



To edit the


XL


STAT standard ordering sheet, you can use the


XL


STAT


or open from your


XL


STAT directory. After you registered you can use the


your


personal


license


number,


given


by


the


author


or


the


distributor.


You


will


then


immediately


become


an


XL


STAT-Pro !




XL


STAT direct access




If you want to


open


and


close



XL


STAT directly at any time by just pressing a button, you only need to open


once in Excel the macro. It will add a blue color button in the standard toolbar of Excel, close to the



button.


You


can


move


or


delete


this


button


as


any


other


Excel


button


using


the





General remarks




In the


plots will be stored in standard Chartsheets. If you don't select this option, the plots will be stored on the same


sheet as the output tables, in the upper left corner of the sheet.


XL


STAT sometimes creates intermediate sheets


that are stored in the results workbooks. If you want you can view them by selecting the corresponding option.



Each time you load


XL


STAT, the reference option is set to A1 if it was on R1C1. This is necessary for


XL


STAT


to work properly. If you change that option while working, don’t forget to change it back before using


XL


STAT


again.



Each



XL


STAT


copy


is


protected


by


copyright


laws,


and


can


only


be


used


on


one


computer


at


a


time.


Every


single shareware release of


XL


STAT is time limited.



To obtain the latest version (updates are released frequently) please contact Thierry FAHMY at the following e-


mail addresses : fahmy@



Don't


hack


this


software,


and


more


importantly


don't


spread


illegal


copies.


Respect


other


people's


work,


or


shareware will disappear. Don’t forget that if you buy


XL


STAT you will get free updates for a year!



The Author disclaims any responsibility for damages or subsequent loss caused by using


XL


STAT.




Dialog boxes




Dialog boxes have been created to be as straight-forward and uniform as possible.



The


Excel sheet the cell which will be the upper left corner of the results tables.



When you select data, variables always need to be stored in columns.




or


column


labels


correspond


to


the


labels


or


names


of


the


explanatory


variables.


If


you


select


the


variable labels within the data range you must tick the corresponding option.




within the data range, then you must specify tick the corresponding option.



NB: The Outputs range does not need to be on the same sheet or on the same Workbook than the input table.




Numerical efficiency




XL


STAT is efficient for most methods, but it can be slow when there are a lot of variables and rows.


depend on your computer processor and on the RAM. This occurs because


XL


STAT is fully written in VBA and


Excel macros are not compiled. But the fact that


XL


STAT is only VBA makes it possible for you to use it on any


computer and any system.




Missing data - Numerical and categorical data




XL


STAT is able to detect missing data for the following tools : PCA / Disc. Analysis / Classification / CA / MCA


/ Histograms / Descriptive Statistics / Factor analysis.



Missing data must be coded with empty cells.



Categorical


variables


can


be


coded


by


any


symbol


except


free


cells


(the


only


way


to


code


missing


data).


Numerical data needs to be numbers.




Error messages




There are very few error messages and they are explicit enough so that you can easily understand what is wrong.



The most frequent message is :


specify


if


variable


or


observation


labels are


included,


or


because


you


forgot


to


select


the


upper


left


corner


of


output tables.



Sometimes it can also occur because there is some non-numerical data in a column that should correspond to a


numerical variable.




References




To create XLSTAT, many references have been consulted within the which :


?



a (1991), Probabilité


s, Analyse de donné


es et statistiques, Technip, Paris


?



one (1993), Biomé


trie , Masson, Paris



?



A. Agresti (1990), Categorical Data Analysis , Wiley Interscience






Shortcut Keys




Ctrl +M : activates the XLSTAT Manager




Help files




XL


STAT help file



On


PC


and


Macintosh


computers,


help


topics


can


be


accessed


from


either


the


XL


STAT


menu,


or


from


each


dialog box. On PCs the help file can also be accessed from the file manager.



For users who would like to have a printed version of the help file, they can print the Word6 version of the help


file




Ordering XLSTAT




To


know


the


how


to


order


XLSTAT,


please


visit


/


and


go


to


the


order


page,


or


open


which is distributed with XLSTAT modules. For any further information, please contact Thierry Fahmy


at fahmy@




6D Plots





This tool is one of the simplest ones of XLSTAT for the mathematical background it requires from the user, but it


helps creating very surprising charts that will impress of lot of the people who ignore XLSTAT. With this tool


you can represent up to 6 dimensions at a time (or even seven if you use the labels to display information), 4 of


which can be numerical and 2 categorical. This tool can easily be used to plot 2 dimensions and differentiating


the belonging to various groups (a third dimension), or four dimensions to distinguish the financial results over


two


years


for


various


geographies


and


products.


The


only


compulsory



entries


for


this


tool


are


the


first


two


numerical dimensions and the output location (it can be on an Excel spreadsheet or on a separate chart sheet. Or


you can simply use this tool to do two dimensional plots using other object formats than the


traditional


Excel


ones (instead of using the small circles to represent data you can use any image or object you want !).




Dialog box entries



Select Xs


: select in this entry box the data on the Excel sheet that correspond to the first dimension (numerical


data only). This entry is compulsory and is represented on the abscissa axis.



Select


Ys



:


select


in


this


entry


box


the


data


on


the


Excel


sheet


that


correspond


to


the


second


dimension


(numerical data only). This entry is compulsory and is represented on the ordinates axis.



Select 3


rd


D


: select in this box the data on the Excel sheet that correspond to the third dimension (optional). The


third dimension is represented by the


size


of the object.



Select 4


th


D


: select in this box the data on the Excel sheet that correspond to the fourth dimension (optional).


The fourth dimension is represented by the


size


of the object.



Select Groups 1


: select in this box the data on the Excel sheet that correspond to the fifth dimension. Whatever


the data type in this selection, XLSTAT will take them as categorical data. The fifth dimension is represented by


the


color


of the object.



Select Groups 2


: select in this box the data on the Excel sheet that correspond to the sixth dimension. Whatever


the data type in this selection, XLSTAT will take them as categorical data. You can select data in this zone even


there is no


type


of the object.



Data


labels


included



:


if


this


option


is


selected,


you


must


have


included


and


selected


the


data


labels


when


selecting the Xs data. The labels must be on the left side of the Xs.



Variable


labels


included



:


if


this


option


is


selected,


you


must


have


included


and


selected


a


name


for


each


dimension.



Outputs :


the output is the chart itself. However you can decide to put it on a special area of a spreadsheet (then


select a cell of your choice on this spreadsheet in the


(then select



Autocolor :


deselect this option if want to control the color and the shape of the objects that will be used instead


of the usual Excel formats (dots, circles, squares). A


default colors suggested by XLSTAT. A


XLSTAT suggests only four shapes : circles, squares, triangles and diamonds, any kind of shape or image can be


added.


With


Excel97


(and


later


versions)


you


can


take


a


full


advantage


of


the


shapes


and


textures


that


are


available.




Example



This example shows a simple 3 dimension plot where the first two dimensions are represented on the X and Y


axes,


and


the


third


dimension


is


represented


using


the


object


size.


More


examples


can


be


found


on


/


.






Table 1 : data on Excel sheet




Figure 1 : dialog box as it should filled in



To


produce


the


plot


below


the


autocolor



option


has


been


deselected.


Then


the


default


object


suggested


by


XLSTAT in the


axes and of the background have been manually changed by the user. The Micromover tool of XLSTAT has been


used to move all labels a little bit up, all at the same time.






Figure 2 : 3D representation of the data





AxesZoomer





A simple but time saving utility tool. Select a plot, then run this tool and you will be able to modify the min and


max of the axes until you are satisfied.





Categorical sorting




This


tool


makes


both


increasing


and


decreasing


sorting


(with


the


usual


alphabetical


conventions)


of


several


Categorical variables easy to achieve, with decreasing priorities from the first to the last variable. The sorting is


done first on the first variable then on the second variable, but within the first variable categories, and so on,


until the last variable.



Note : if you want to sort numerical data it is better to use the classical Excel? sorting tool.





Cluster analysis 1 & 2




Cluster analysis is a data analysis tool that allows you to classify into groups, a set of observations described by


numerical


variables.


You


can


specify


the


number


of


groups


you


want


to


create.


XL


STAT


offers


you


two


different clustering techniques. It is often a good idea to try both and compare the results.



NB: If you want to classify observations described by Categorical variables it is advised to use first the Multiple


Correspondence Analysis and then use the observations coordinates on factorial axes (in the last results table of


MCA) as numerical variables to finally classify the observations.



XLSTAT offers now the possibility to normalize and weight the variables, some very important options for the


two methods suggested here which are based on euclidean distances : the scale of a variable might have a major


effect


on


the


final


result.


One


can


cancel


the


scale


effect


by


normalizing


the


variables.


Weighting


variables


enables


you


to


give


more


or


less


importance


to


the


variables


depending


on


how


much


you


want


them


to


influence the final result of the clustering.



Cluster analysis 1




Classification using centro?


ds method (also know as k-means). It is a quick and powerful method , but it does


not give any help in finding the ideal number of clusters.



In the dialog box,


that it might give some different results each time (because the starting point is selected by random). Only the


best classification (in sense of inertia) is presented in the results tables.



Note : sum of Between-groups and Within-groups inertia is constant and is equal to Total inertia. To choose


between the different iterations this tool selects the one with the highest between/within rate.




Cluster analysis 2




Ascendant hierarchical cluster analysis using Ward's clustering technique. If there are over 85 entries, it is not


possible to see the dendrogram graphic.



If you want to use a dissimiliraty measure which is not the euclidean distance (Ward's method), you may give as


input to


XL


STAT the dissimilarity matrix instead of the raw data. You can also select a correlation matrix as the


input data in that case, the dissimilarities are automatically computed using the following formula : dissimilarity


(i,j) = SquareRoot [ 2 x (1-correlation(i,j))] which produces a sphere with radius 1.



Results :



If you choose option


groups and only prints the abscissa (on dendrogram) table and the knots description table, and then creates the


level-histogram


and


the


dendrogram.


If


there


are


more


than


85


entries


XLSTAT


stops


to


ask


you


how


many


groups you want to build. If you don’t know, put 1 and follow the advice that will be written below one of the


tables.



If you choose option


described above, but with two more tables containing the data classification results.



If you choose the


inertia criterion.





Correlations / Principal Component Analysis




Introduction



Correlation coefficients enable analyzing relations between numerical or categorical ordinal variables. It's a way


to


check


if


two


variables


evolve


in


the


same


way


or


opposite,


or


if


they


are


independent.


If


you


have


N


variables,


the


result


of


this


analysis


is


an


NxN


symmetric


table


with


correlation


coefficients


for


each


pair


of


variables.



The


correlation


coefficients


matrix


is


the


most


common


basis


for


starting


a


Principal


Component


Analysis,


which most important results are the building of N independent factors (they are linear combines of the initial


variables, which have the property of have null correlation coefficients), and an optimized representation of all


the data on a P-dimensional plot, where P is often 2 in practice, although it can be from 1 to 3. The quality of


the plot is given by the % of the N-dimensional variability is has been able to show.




Dialog box entries



Reference on sheet : select in this box the table that contains the data, with or without the columns and rows


labels (if necessary select the labels options). The data must be numerical variables. If you want to do


a


full


PCA, then this table must include rough data (several individuals - in rows - described by several variables - in


columns). But if all you have is a correlations or a covariance matrix you can analyze its structure and see the


correlations circle.



Data


must


be


numerical


data.


If


there


are


missing


data


in


your


dataset


(coded


with


free


cells),


you


can


ask


XL


STAT to remove the corresponding rows for all the calculations or only when the missing data are involved


in the calculations. If the


deleted. For the correlations you can choose if you want to keep the rows with missing values or not.



Output range : select the cell that will correspond to the upper left corner of the results printed by


XL


STAT.



Number of additional rows : if they are some data that you don't want to use for the computations but only to


plot them afterwards, you only need to include them at the bottom of the selected table and specify the number


of data that need to be used only at the end.



Number of additional variables : if they are some variables that you don't want to use for the computations but


only to plot them afterwards, you only need to include them at the right of the selected table and specify the


number of variables that need to be used only post- computations.



NB : the classical correlation coefficient is usually calculated for numerical continuous integer or real data. Be


very careful before using it with ordinal numerical data Kendall's and Spearman's rank correlation coefficients


are much better for ordinal data.




Results : Correlation coefficients / Covariances




If


you


want


to


use


the


covariance


matrix


for


the


PCA


and


to


print


the


covariance


matrix


within


the


results,


please select the


matrix, because it removes the scales effects (the scale of the values of the various variables is not taken into


account to make them easy to compare in trends).



This


allows


you


to


calculate


the


correlation


coefficients


matrices.


You


can


either


calculate


the


classical


parametric coefficients, or the Spearman's or Kendall's non-parametric rank correlation coefficients. Spearman's


and Kendall's coefficients can be useful if you want to do PCA with ordinal data.



If the PCA option of the dialog


box


isn't


activated,


you


will only


get


the


correlation


matrix


as


a


result.


Two


calculation techniques are possible. If you choose


containing missing values are deleted. If not they will be deleted only when the variable corresponding to the


missing value will be involved in calculations. If you think it is worth it, this will allow you to keep as much


information as possible.



If you don't want to keep the rows with missing values, then if


XL


STAT comes across missing data in a row, this


row will be ignored for all the calculations. If there are missing value that you haven't seen and if you require a


PCA, XLSTAT will automatically detect and remove the corresponding rows.



To make the reading of the correlation matrix easier, you can choose to highlight the


significant


correlations


which are determined using the test that compares the absolute values to 1/sqr(n-1), where n is the number of


variables.



NB


:


for


Kendall's


and


Spearman's


coefficients,


to


find


out


if


it


is


significantly


different


from


zero


for


a


significance level you choose, you need to look in the corresponding tables which can be found at the end of any


book on statistics.



To help you in reading the correlation matrix, you can select in the dialog box a range within which all values


will appear in a light green color. If you want to see all values then put nothing or two equal values.



Note


:


for


Kendall's


and


Spearman's


coefficients,


to


know


if


there


significantly


different


from


zero


for


a


significance level you choose, you need to look in the corresponding tables which can be found at the end of any


book talking about statistics.



Suggestion : you should use the


DataFlagger


tool of XLSTAT to make more visible some particular values on


large correlation coefficients tables.




Results : Principal component analysis (PCA)




After


the


calculation


of


the


correlation


matrix,


if


the


PCA


option


of


the


dialog


box


is


activated,


some


more


calculations will start, beginning the Principal Component Analysis, at the end of the which you will be able to


see the


you are projecting some n-dimensional data on a 2-dimensional


plot,


as


much


variability


as


possible


has


been


saved.



The



%


calculated


from


the


eigenvalues


gives


you


an


idea


of


the


global


variability


which


is


represented when using the axes of interest.



If you want to view some other two dimensions data representation, select other axis numbers in the dialog box or


change the series references on the chart. It's of course advised to always start with axis 1 and 2.



The


correlations


circle


is


a


2-dimensional


plot


helpful


for


interpreting


the


correlations


between


the


initial


variables and the new variables which are linear combines of the initial ones : the closer a variable is to an axis


and


to


the


circle,


the


higher


the


correlation


with


the


corresponding


factor


will


be.


It


is


also


a


way


to


plot


the


correlations


between


the


initial


variables


:


if


two


variables


are


in


opposite


quarters,


they


are


negatively


correlated


;


if


two


variables


are


perpendicular


the


correlation


is


close


to


0


;


if


two


variables


are


close,


the


correlation is close to 1.



A biplot is also plotted. It is not exactly a mix of the to previous plots, as the variables coordinates are modified


to take into account the representation power of each axis and the scale of the plot of the data.



To avoid any mistaken interpretation on the positions of the data in the new representation space, one can use the


squared


cosines


printed


in


the


last


table


:


the


closer


a


squared


cosine


is


to


one,


the


closer


the


data


is


to


the


corresponding axis.





Example



Here is a simple example with 8 observations being described by 6 variables. PCA is used here to visualize the


observations on a map to quickly see which years have been comparable and which not.






Table 1 : data on Excel sheet





Figure 1 : the dialog box as it must be filled in




The


first


result


displayed


by


XLSTAT


is


the


correlation


coefficients


matrix.


Here


the


DataFlagger


(see


in


the


XLSTAT - Excel Utilities folder) has been used to highlight the strong positive and negative correlations.




Table 1 : correlations matrix



Among the 3 plots displayed by XLSTAT, the biplot is the nicest one when there aren't too many observations


and variables (if not, it can be messy) as it shows the mapping of the data simultaneously with the mapping of the


initial of the variables on the factorial axes.






Figure 2 : biplot of variables and observations



For example, using this chart one can say that years 1965, 1966 and 1968 are close because production of citrus


fruits have been high and production of rice has been low.





Correspondence Analysis





Introduction



Correspondence


analysis


is


generally


useful


for


analyzing


survey


results


when


two


questions


have


been


asked


with several possible answers. It can of course be used for the analysis of any two-way contingency table. For


example it is possible with this tool to study the relation between parents jobs and studies of children. A plot is


generated to make the interpretation of the numerical results easier.




Dialog box



Contingency



table


:


select


in


this


box


the


table


that


contains


the


data,


with


or


without


the


columns


and


rows


labels (if necessary select the labels options). The data must be numbers. A contingency table is a table for which


a cell (i;j) corresponds to the frequency observed for simultaneous characteristics: category


category



Output range : select the cell that will correspond to the upper left corner of the results printed by XLSTAT.



Number of supplementary rows : default is zero. If you want that some rows are not used for the computations


but only projected on the representation space using the results obtained using the other data, then put a value N


bigger than 0. The N last rows you selected in the contingency table will be used as additional rows.



Number


of


supplementary


columns


:


default


is


zero.


If


you


want


that


some


columns


are


not


used


for


the


computations but only projected on the representation space using the results obtained using the other data, then


put a value N bigger than 0. The N last columns you selected in the contingency table will be used as additional


columns.



Maximum


number


of


axes


:


if


you


to


accelerate


the


outputs


generation


and


to


improve


the


readability


of


the


outputs you can ask to see only the results for the first N axes.



Confidence


range


for


the


Chi2


:


this


value


is


necessary


for


testing


if


one


can


say


rows


and


columns


are


independent.


If


they


are,


then


there


is


no


use


to


do


a


correspondence


analysis,


and


there


are


no


significant


relations to study.



Show


contributions


and


cosines


:


these


results


are


important


but


you


can


ask


not


to


see


them,


which


will


accelerate the generation of outputs.




Results



To make possible the graphical representation of the proximities between categories of the two variables many


calculations


are


necessary.


As


well


as


with


Principal


Component


Analysis,


it


leads


to


eigenvalues


and


eigenvectors


calculation.


Eigenvalues


can


be


interpreted


as


an


image


of


the


total


inertia


represented


on


the


corresponding axes.



A representation space is designed so that the


big as possible. A projection on the first two axes is plotted. This allows an interpretation of relations between


categories. For example, if two categories (MBA in famous business schools and lawyer) are close on the graphic


one can say that they are strongly linked.



Several


results


tables


allow


a


better


interpretation.


The


contributions


of


each


category


to


each


axis


make


it


possible


to


understand


their


importance


and


their


relations.


The


two


last


results


tables


evaluate


the


cosine


between


each


category


and


the


representation


plane.


So


one


can


avoid


wrong


interpretations


saying


two


categories are close because of the plot though they were far on the whole space before projection. The squared


cosines are called by some authors




Example



The source of the Table 1 is the American Social Attitudes Sourcebook (Converse et al., 1980), and it shows the


results of a survey where people of different ages were asked if civil rights people were pushing too fast.




Table 1 : data from American Social Attitudes Sourcebook



On Figure 1, the dialog box is represented as it must be filled in to perform a Correspondence Analysis on the


data of Table 1.






Figure 1 : the dialog box as it must be filled in



The first result which is directly produced from the table itself is the 3D plot (see Figure 2). It enables to see


some


strong


trends.


With


simple


and


small


tables


it


already


provides


some


interesting


information,


but


not


as


much as the factorial map (Figure 3).





Figure 2 : 3D plot of the table itself



On the factorial map, we are able to represent in 2 dimensions already 97% of the total inertia which is excellent.


This makes the map a good representation of the links between the categories of the two variables (here age and


opinion) hidden in the contingency table.




Figure 3 : Factorial map on axes 1 and 2




References




For further information on this method, please visit


/







Crossed sorting / Flat sorting




These techniques are mostly effective for survey analysis. The variables can be questions with several possible


answers, or any other kind of categorical variables.



Crossed sorting




If


q


questions


are


asked


to


n


people,


and


if


you


have


the


results


in


a


n


rows


and


q


columns


array,


then


this


technique


will


allow


you


to


transform


this


array


into


q(q-1)/2


arrays


where


the


rows


will


represent


all


the


possible answers to the Qi question, and the columns will represent all the possible answers to the Qj question.


This kind of array is often called



The


interest


of


this


tool


is


that


it


allows


you


to


know


how


many


people


answered


X


to


Qi


and


Y


to


Qj.


Furthermore the percentages are also written in the results array; this can help you for an interpretation.




Flat sorting




Instead of presenting the results in a contingency table, with this tool the data array is transformed into a single


tree


:


the


first


column


corresponds


to


the


sorted


different


categories


of


the


first


variable;


the


second


column


corresponds to the various categories of variable 2, sorted within each variable 1 categories and so on. Before


the last column is written the number of combinations of categories corresponding to the row. In the last the


column is written the corresponding percentage.



Note


:


the


result


table


will


look


different


for


each


different


permutation


of


the


initial


table.


The


hierarchy


is


respected during the transformation so if the variable which interests you the most is the i


variable, then swap


columns 1 and i in the initial table.





DataFlagger





This tool allows you to identify some Excel cells - within a selected range - that correspond to some criterion you


define.



The criterion can be a text, a value or an interval.



For highlighting the cells you can change the color, or select the bold or italic format,





Descriptive Statistics





This is probably the most complete tool you can find to analyze in a simple way data with Excel. And probably


the most simple to use on the market.



Data can be categorical or numerical and various types of plots are possible.



Analyzing numerical data




XL


STAT


gives


as


a


standard


output


a


series


of


indexes


that


help


understanding


the


structure


of


the


data.


Skewness and Kurtosis values are computed using the same formulae as Excel (see Excel help for details).



For the standard deviation, three formulae are possible when the data are weighted :


1) Sum of weighted squares of differences between data and weighted mean, divided by sum of weights minus 1.


2) Sum of weighted squares of differences between data and weighted mean, divided by sum of weights.


3) Sum of weighted squares of differences between data and weighted mean, divided by sum of weights minus


average value of weights.



For the median,


XL


STAT uses the same technique as most packages. For the 1st and 3rd quartile Excel formula is


used is the data are unweighted. For weighted data, another approach base on a linear interpolation is used.



X/Y and X/X plots are simple two dimensional plots.



X/Y and


Q/Q


plots


are


simple


two


dimensional


plots


when


X<>Y,


but


if


X=Y


a


Q-Q


plot


is


plotted


with


the


empirical cumulative distribution function of X on the abscissa, and the normal cumulative distribution function


on ordinates (with the same mean and std deviation as X). This is useful to compare the distribution of X to a


normal distribution.



Box plots can be plotted vertically or horizontally. The means are plotted in red color. Box plots give you an idea


of


the


distribution


of


your


data


especially


in


terms


of


symmetry


and


scale.


It


is


useful


when


comparing


the


structure of several variables (or groups if data are grouped) at the same time. The limits of the box correspond to


the 1


st


and the 3


rd


quartiles (Q1 et Q3), and the fences to respectively Q1-1.5(Q3-Q1) and Q3+1.5(Q3-Q1). Data


which are outside of these fences are considered as potential outliers and are printed on the box plot as empty


circles. The min and max are always printed with a black circle.



Scattergrams make possible to visualize all the data and their mean for all the selected variables. This tool can be


very useful to visually compare several populations in terms of distribution and density, without any parametric


consideration (type of distribution, parameters such as mean and variance).



Stem-Leaf


plots


are


mostly


used


with


integer


values.


XL


STAT


uses


an


expert


system


to


determine


the


ideal


structure


of


the


stem.


For


this


technique,


details


can


be


found


at


the


following


Web


page:


/~droyster/maed3103/Lesson_7_Section_




Analyzing categorical data




XLSTAT analyzes the structure of categorical data by counting (using weights if necessary) the frequencies for


each category of each variable. Two plots are available : histograms and pie charts.



If the option


easier the comparison of the groups and whole population structure.



References




Terry Sincich (1996), Business Statistics by Example, Prentice Hall.





Discretize data





Data discretization should be used when you want to transform numerical data to categorical data. To do so, the


numerical variable must be divided into several ranges. The result is an ordinal categorical variable because of


the natural order between the intervals.



Discretization is very useful in marketing studies where for


some


variables


the


numbers


are


not


interesting


by


themselves but only because they put people into different groups, where the groups have a marketing meaning.



Furthermore, if you want to map some data described by both numerical and categorical variables, it might be


interesting


to


discretize


the


numerical


data


and


then


to


use


multiple


correspondence


analysis


which


enables


a


mapping of categorical data and variables. As the results of discretization is categorical ordinal data, if the other


categorical variables of your dataset are also ordinal, it is better to use Principal Component Analysis based on


the rank correlation coefficients (Spearman's or Kendall's coefficients) in that case, categories must be coded


using numbers (1, 2, 3 ... for example) to indicate to XLSTAT the ranking among categories.



There are several ways to define the intervals limits before starting discretizing a variable. The two most common


techniques one are based on equal size ranges and on equal density classes (based on quantiles). XLSTAT does


not automatically create the limits for you, but it is easy to transmit your results to XLSTAT.



XLSTAT


offers


two


ways


to


define


the


intervals


once


the


limits


are


known


from


you.


You


can


use


the



might look :




Figure 1 : definition table




A definition table must contain 3 columns. The first column must contain the name of the category, the second


must contain the type of operator, and the third one the limit value. Up to the last category (not included) the


intervals must


be


defined


using


the



or



operators.


The


last


operator


must


be



if


the


latter


was



way that enables you to surely and quickly define any series of intervals you want, without excluding any data.



What is interesting in using a definition table is that you can save it, change it easily, use it later or regularly. But


if you want, XLSTAT allows you to enter manually the definitions in the discretization tool dialog box.





Figure 2 : data example



If the data you want to discretize using the definition table shown on Figure 1, are as on Figure 2, the dialog box


should be filled as presented on Figure 3.




Figure 3 : dialog box



After pressing OK, you will have the following results on






Figure 4 : results table



This example is based only on a few data, but it works very well and very quickly for thousands of data.






Discriminant analysis





The


categorical variable are stored. The variables do not necessarily need to be contiguous. You can use the


key


to


do


multiple


selection


with


Excel.


You


can


also


do


it


manually


by


adding


a



(or


a



if


this


is


your


standard list separator) between the ranges. Example : $$B$$2:$$B$$10,$$F$$2:$$F$$10 .



If


you


want


to


include


columns


labels,


these


must


be


selected


with


the


Ys


and


the


Xs


(first


row)


and


the


corresponding option must be ticked. If included, the data labels must be stored in the first column of the initial


dataset.



The


an Excel sheet the cell which will be the upper left corner of the results tables.



The categorical variable position in the dataset (column number) must be specified. Note that if the first column


of your dataset includes the observations labels you mustn’t take it into account when specifying the categorical


variable position.



Discriminant analysis allows you to test if the assumed membership of the observations to some a priori groups


(described


by


the


categorical


variable


containing


the


various


groups


names)


can


be


justified


using


some


explanatory


variables


(the


numerical


variables


of


the


initial


dataset).


The


explanatory


variables


need


to


be


numerical,


but


if


you


want


to


introduce


some


qualitative


variables


to


explain


the


membership


to


this


or


that


group, it is possible it you first do a Multiple Correspondence Analysis and then use the observations coordinates


(stored in one of the results tables of MCA - coordinates on one or two axes are usually enough) as quantitative


data.



Discriminant analysis can lead (if the option is selected) to an optimal (in sense of between-groups over within-


groups variability) graphical representation of the observations in a two dimensions space. XLSTAT discriminant


analysis makes possible to reclassify the observations that have been used for the analysis (


also (optional) to do a crossvalidation using some more observations stored at the end of the initial dataset. The


number of crossvalidation data needs to be specified in the dialog box. Note that for the resubstitution and the


crossvalidation,


no


prior


probabilities


are


set


by


XLSTAT.


It


is


not


necessary


to


define


a


priori


belonging


to


groups for the crossvalidation data.



The linear discriminant functions are often used to score people (e.g. by financial organizations): knowing some


numerical data about an individual, you can determine to which category he most probably belongs calculating


the scores given by each discriminant function.


-


-


-


-


-


-


-


-



本文更新与2021-02-02 10:52,由作者提供,不代表本网站立场,转载请注明出处:https://www.bjmy2z.cn/gaokao/599153.html

XLSTAT使用说明的相关文章