
Statistics: Foreign-Language Translation

Author: 高考题库网
Source: https://www.bjmy2z.cn/gaokao
2021-02-06 11:31

Posted February 6, 2021 (author: weeks)



Original text for translation

Title: Fundamentals_of_Statistics






















Measures of Central Tendency and Location: mean, median, mode, percentiles, quartiles and deciles.

x     sorted x
53    53
55    53
70    53
58    55
64    57
57    57
53    58
69    64
57    68
68    69
53    70


The Measures of Central Tendency are Mean, Median and Mode

Mean: x-bar or $\bar{x}$. For a given variable, it is the sum of the values divided by the number of values ($\sum x_i / n$). In this case, we have $n = 11$, so we need to add all of the values together and divide by 11.

$\sum x_i = 657$, $\bar{x} = 657 / 11 = 59.73$
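The arithmetic for the mean can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are chosen here and are not part of the original handout.

```python
# The 11 observations from the unsorted column x.
x = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]

n = len(x)        # number of values
total = sum(x)    # sum of the values
mean = total / n  # x-bar = (sum of x_i) / n

print(n, total, round(mean, 2))  # 11 657 59.73
```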


Median: the number in a distribution of a variable's responses where one half of the values are above and one half of the values are below.

To find the median, we first need to put our data in ascending order (smallest to largest). Then we can determine the median: if the value of $n$ is odd, it is simply the middle observation, but if the value of $n$ is even, it is the average of the two middle observations.

In this case, $n$ is odd, so the median will be the middle observation of our sorted values (the 6th value): 57
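The odd/even rule for the median can be sketched as a small function. The function name `median` is an illustrative choice; Python's standard library offers `statistics.median`, which follows the same rule and is used here only as a cross-check.

```python
import statistics

x = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]

def median(values):
    """Middle value if n is odd, mean of the two middle values if n is even."""
    s = sorted(values)          # ascending order first
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]           # e.g. n = 11 -> 6th value (index 5)
    return (s[mid - 1] + s[mid]) / 2

print(median(x))                          # 57
print(median(x) == statistics.median(x))  # True
```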




Mode: the value that occurs most frequently. If two different values tie for most frequent, the data are said to be bi-modal. If there are more than two modes, the distribution is said to be multi-modal.

In this case, the value that occurs most often is 53. So, the mode is 53.
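One possible way to find the mode, including the bi-modal and multi-modal cases, is to count frequencies and keep every value that reaches the highest count. The helper name `modes` and the second data list are made up for illustration.

```python
from collections import Counter

x = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes(x))                # [53]  (53 occurs three times)
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  (a bi-modal example)
```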



The measures of location are Percentile, Quartile and Decile

Percentile: the $p$th percentile is a value such that at least $p$ percent of the observations are less than or equal to this value and at least $(100 - p)$ percent of the observations are greater than or equal to this value.

To calculate percentiles, we use indices ($i$).

$i = (p/100)\,n$ for $p_1, p_2, p_3, \ldots, p_{99}$

If the index is a whole number (an integer), then the percentile is the average of the sorted values in positions $(p/100)\,n$ and $1 + (p/100)\,n$.




If the index number is not a whole number, we ALWAYS round up. The position of the percentile is the next whole number (integer) greater than the computed index.

For example: $i_{(p50)} = (50/100)(11) = 5.5$, which rounds up to 6.

So, we would count from the lowest value of the sorted data to the index number (6). Since the calculated $i$ was not a whole number, we had to round up to find the value where at least 50% of the values are equal to or lower than this value and at least 50% are equal to or higher than this value.




In this case, the value of the 50th percentile is the 6th value: 57. Does this look familiar? The 50th percentile is the same thing as the median.

What does it tell us? In this distribution, AT LEAST 50% of the observations are LESS THAN OR EQUAL TO 57 AND AT LEAST 50% of the observations are GREATER THAN OR EQUAL TO 57.




$i_{(p80)} = (80/100)(11) = 8.8$, which rounds up to 9. The 9th value is 68.

Again, since the index number is not a whole number, we round up. So, we would count from the lowest value of the sorted data to the index number (9). In this case, the value of the 80th percentile is 68.


Since this dataset has 11 observations, we won't have any instances where our calculated index number is a whole number. However, if we just remove our value of 70 and create a new distribution, we will be able to see an example:

53 53 53 55 57 57 58 64 68 69

$i_{(p30)} = (30/100)(10) = 3$. This is a whole number, so we must take the 3rd and 4th values and average them to find the 30th percentile.

(53 + 55)/2 = 54

So, the value of the 30th percentile is 54.
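The index rule described in this section (round up a fractional index, average two neighbours when the index is whole) can be sketched as follows. The function name `percentile` is an illustrative choice, and this is not how every package computes percentiles: statistical software generally offers several interpolation conventions, so results may differ slightly from this handout's rule.

```python
import math

def percentile(values, p):
    """p-th percentile using the handout's index rule, for 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    s = sorted(values)
    n = len(s)
    i = p * n / 100                      # index i = (p/100) * n
    if i.is_integer():
        k = int(i)
        return (s[k - 1] + s[k]) / 2     # average of positions i and i + 1
    return s[math.ceil(i) - 1]           # round up, then take that position

x = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]
print(percentile(x, 50))   # 57    (i = 5.5 -> 6th value)
print(percentile(x, 80))   # 68    (i = 8.8 -> 9th value)

without_70 = [53, 53, 53, 55, 57, 57, 58, 64, 68, 69]
print(percentile(without_70, 30))  # 54.0  (i = 3 -> average of 3rd and 4th)
```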



Return to our original data distribution ...



Quartiles are special cases of percentiles: $Q_1 = P_{25}$, $Q_2 = P_{50}$, $Q_3 = P_{75}$. These three values divide the distribution into 4 equal quarters.

$i_{(Q1)} = (25/100)(11) = 2.75$, which rounds up to 3, so Q1 is the 3rd value: 53

$i_{(Q2)} = (50/100)(11) = 5.5$, which rounds up to 6, so Q2 is the 6th value: 57

$i_{(Q3)} = (75/100)(11) = 8.25$, which rounds up to 9, so Q3 is the 9th value: 68
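As a check on the quartile positions, the same round-up index rule can be applied at p = 25, 50 and 75. The helper below is a sketch with a name chosen for illustration; it expects data that is already sorted. Position 9 of the sorted values holds 68, which is what the code reports for Q3.

```python
import math

def percentile(sorted_values, p):
    """Handout index rule on already-sorted data, for 0 < p < 100."""
    n = len(sorted_values)
    i = p * n / 100
    if i.is_integer():
        k = int(i)
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[math.ceil(i) - 1]

s = sorted([53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53])
q1, q2, q3 = (percentile(s, p) for p in (25, 50, 75))
print(q1, q2, q3)  # 53 57 68
```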





Measures of Dispersion or Variability: Range, interquartile range (IQR), variance, standard deviation and coefficient of variation.




Range: this tells us how wide the span is from the minimum value to the maximum value. (Max - Min) = Range. In this instance, the maximum is 70 and the minimum is 53, so the range is 70 - 53 = 17.




Interquartile Range (IQR): this tells us how wide the span is in the middle 50% of the data. (Q3 - Q1) = IQR. In this case, Q3 is the 9th sorted value (68) and Q1 is the 3rd sorted value (53), so IQR = 68 - 53 = 15.



We will use IQR in later processes, so we will want to keep this
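Both spreads can be read directly off the sorted data. The sketch below takes the quartile positions (3rd and 9th sorted values) from the index calculations in the quartile section; the variable names are illustrative.

```python
s = sorted([53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53])

data_range = max(s) - min(s)   # Max - Min

q1 = s[3 - 1]                  # 3rd sorted value (index i = 2.75 rounds up to 3)
q3 = s[9 - 1]                  # 9th sorted value (index i = 8.25 rounds up to 9)
iqr = q3 - q1                  # Q3 - Q1

print(data_range, iqr)         # 17 15
```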



x      (x - xbar)   (x - xbar)^2
53     -6.73        45.29
53     -6.73        45.29
53     -6.73        45.29
55     -4.73        22.37
57     -2.73        7.45
57     -2.73        7.45
58     -1.73        2.99
64     4.27         18.23
68     8.27         68.39
69     9.27         85.93
70     10.27        105.47
Sum:
657    -0.03        454.18

657/11 = 59.73      454.18/10 = 45.42


We use the formula:

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

The sum of squared deviations for these data is 454.18, so the sample variance is $s^2 = 454.18 / 10 = 45.42$. For our purposes here, the computation of variance is just a step towards the computation of the standard deviation.




Sample standard deviation ($s$) is the positive square root of the variance.

$s = \sqrt{45.42} = 6.74$

So the formula for sample standard deviation is

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
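The variance and standard deviation steps can be reproduced as below. This is a sketch of the sample formulas with an $n - 1$ denominator; the variable names are illustrative, and `statistics.variance` and `statistics.stdev` from the standard library are used only as a cross-check.

```python
import math
import statistics

x = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]

n = len(x)
mean = sum(x) / n
ss = sum((v - mean) ** 2 for v in x)   # sum of squared deviations
variance = ss / (n - 1)                # sample variance s^2
sd = math.sqrt(variance)               # sample standard deviation s

print(round(ss, 2), round(variance, 2), round(sd, 2))  # 454.18 45.42 6.74

# The standard library uses the same n - 1 convention for these two functions.
print(math.isclose(variance, statistics.variance(x)))  # True
print(math.isclose(sd, statistics.stdev(x)))           # True
```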




Population Variance ($\sigma^2$): uses the same formula in the numerator, but $N$ instead of $n - 1$ in the denominator. Since we rarely have information about the entire population, we almost always use the formula for sample variance, $s^2$.




Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$. Since we rarely have information from the entire population, we use the formula for sample standard deviation, $s$.
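To see what the change of denominator does, the sketch below treats the same 11 values as if they were an entire population. That is purely a hypothetical for illustration; the handout treats them as a sample. The standard library's `pvariance` and `pstdev` divide by $N$, while `variance` and `stdev` divide by $n - 1$.

```python
import statistics

x = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]

sample_var = statistics.variance(x)   # divides by n - 1 = 10
pop_var = statistics.pvariance(x)     # divides by N = 11 (hypothetical population)

print(round(sample_var, 2), round(pop_var, 2))                     # 45.42 41.29
print(round(statistics.stdev(x), 2), round(statistics.pstdev(x), 2))  # 6.74 6.43
```

The population figures are always a little smaller than the sample figures for the same data, because the same numerator is divided by a larger number.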




Coefficient of Variation: $CV = \frac{s}{\bar{x}} \times 100$ tells us what percent the sample standard deviation is of the sample mean.

This number is "relative" and is only of use in comparing the distributions of two or more variables.

Suppose I have two samples, and I want to know which sample has more variability. If both samples have the same mean, the one with the higher standard deviation will have the greater variability. However, if they have different means, I need to calculate the coefficient of variation to determine which one has the most variability.

xbar = 458, s = 112 versus xbar = 687, s = 192
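The comparison of the two samples can be worked through as follows. The function name is an illustrative choice; the means and standard deviations are the ones given in the text.

```python
def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100

cv_a = coefficient_of_variation(458, 112)
cv_b = coefficient_of_variation(687, 192)

print(round(cv_a, 2), round(cv_b, 2))  # 24.45 27.95
```

Under this measure the second sample (CV of about 27.95%) shows more relative variability than the first (about 24.45%).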



Standardized Data and Detecting Outliers

Z-score: $z = \frac{x - \bar{x}}{s}$

The z-score tells us how many standard deviations a value is from the mean. We can look at a picture of what a z-score tells us. In the Normal Curve, the mean is at the highest point and the curve tails off symmetrically in both directions.

The sign of the z-score tells us which direction the value is from the mean on the Normal Curve. Negative values will be to the left, and positive values will be to the right.



Standardizing Scores:

Standard Normal Curve: the mean is zero, and the standard deviation is 1. The distribution is bell-shaped and symmetrical. The area under the curve is 1, and the tails of the curve extend out infinitely. They never actually touch the horizontal axis. The highest point on the curve is at the mean.

Return to our data: let's calculate the z-scores for each of the values…
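A minimal sketch of that calculation, using the sample mean and sample standard deviation of the 11 values. The names `z_scores`, `mean` and `sd` are chosen here for illustration.

```python
import statistics

x = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]

mean = statistics.mean(x)   # x-bar
sd = statistics.stdev(x)    # sample standard deviation s (n - 1 denominator)

# z = (x - xbar) / s for every sorted value
z_scores = [(v - mean) / sd for v in sorted(x)]

for v, z in zip(sorted(x), z_scores):
    print(v, round(z, 2))
```

Values below the mean get negative z-scores and values above it get positive ones; the largest value, 70, sits about 1.52 standard deviations above the mean.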




Empirical Rule: used when the distribution is assumed or known to be approximately normal.

- Approximately 68% of the values will fall within 1 sd of the mean
- Approximately 95% of the values will fall within 2 sd of the mean
- Approximately 99.7% of the values will fall within 3 sd of the mean
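These percentages follow from the normal distribution itself: for a standard normal variable $Z$, $P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})$. The sketch below evaluates that expression with the standard library; the function name is an illustrative choice. The three-sd figure comes out near 99.7%.

```python
import math

def normal_within(k):
    """P(|Z| < k) for a standard normal variable Z."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * normal_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```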


Chebyshev's Theorem: doesn't require that the data have a normal distribution.

Says that at least $(1 - 1/z^2)$ of the values will fall within $z$ standard deviations of the mean.

$1 - 1/1^2 = 0$,  $1 - 1/2^2 = .75$,  $1 - 1/3^2 = .88889$,  $1 - 1/4^2 = .9375$,  $1 - 1/5^2 = .96$


- We can't make any assumptions about the percent of values that are within 1 sd of the mean

But…

- At least 75% of the values will fall within 2 sd of the mean
- At least 88.9% of the values will fall within 3 sd of the mean
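The bound is simple enough to tabulate directly. The sketch below evaluates $1 - 1/z^2$ for the values of $z$ listed above; the function name is illustrative, and the guard against $z \le 0$ is an added assumption, since the formula is undefined at zero.

```python
def chebyshev_bound(z):
    """Minimum proportion of values within z standard deviations of the mean."""
    if z <= 0:
        raise ValueError("z must be positive")
    return max(0.0, 1 - 1 / z ** 2)   # the bound is uninformative for z <= 1

for z in (1, 2, 3, 4, 5):
    print(z, round(chebyshev_bound(z), 5))
# 1 0.0
# 2 0.75
# 3 0.88889
# 4 0.9375
# 5 0.96
```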


We use Chebyshev's Theorem to estimate the variation in a distribution when

- $n < 30$, or
- the shape of the distribution is unknown, or
- the distribution is assumed to be non-normal.


Outliers: suspect or extreme values of data that must be identified and scrutinized. If they are instances of incorrectly entered data, they should be corrected. If the value was entered correctly and it is a valid number, it should remain in the dataset as part of the initial analysis.


When we use the z-score method for identifying outliers, we assume that any value that has a z-score with an absolute value greater than 3.0 (that is, less than -3.0 or greater than +3.0) is an outlier.

Before we proceed with data analysis, we need to examine all outliers for accuracy. If we determine that the value is valid, we often run two sets of analysis: one with the outlier, and one without.
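The z-score screen can be sketched as a short function. The name `z_outliers` and the second data list are hypothetical, made up only to show a case that trips the 3.0 threshold; the handout's own 11 values contain no such point.

```python
import statistics

def z_outliers(values, threshold=3.0):
    """Values whose z-score has absolute value greater than the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample standard deviation
    return [v for v in values if abs((v - mean) / sd) > threshold]

x = [53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53]
print(z_outliers(x))                   # []

made_up = [10] * 19 + [100]            # hypothetical data with one extreme value
print(z_outliers(made_up))             # [100]
```

A flagged value is only a candidate for review: as the text says, it should be checked for entry errors and, if valid, kept for the initial analysis.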




Another way to identify outliers…

Related to IQR is the Five Number Summary: minimum, Q1, Q2, Q3, & maximum. These values feed into upper and lower limits, and we graph them in a box plot.


Five Number Summary

Minimum    53
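The five numbers can be computed from the sorted data with the round-up index rule used for percentiles earlier. The text does not say how the upper and lower limits are formed; the sketch below assumes the common box-plot convention of 1.5 × IQR beyond the quartiles, which is an assumption and not something stated in this handout. All names are illustrative.

```python
import math

def percentile(sorted_values, p):
    """Handout index rule on already-sorted data, for 0 < p < 100."""
    n = len(sorted_values)
    i = p * n / 100
    if i.is_integer():
        k = int(i)
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    return sorted_values[math.ceil(i) - 1]

s = sorted([53, 55, 70, 58, 64, 57, 53, 69, 57, 68, 53])

summary = (s[0], percentile(s, 25), percentile(s, 50), percentile(s, 75), s[-1])
print(summary)                       # (53, 53, 57, 68, 70)

# ASSUMPTION: limits at 1.5 * IQR beyond Q1 and Q3 (not given in the text).
iqr = summary[3] - summary[1]
lower_limit = summary[1] - 1.5 * iqr
upper_limit = summary[3] + 1.5 * iqr
print(lower_limit, upper_limit)      # 30.5 90.5

outside = [v for v in s if v < lower_limit or v > upper_limit]
print(outside)                       # []
```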



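This article was updated on 2021-02-06 11:31. It was provided by the author and does not represent the position of this website. When reposting, please cite the source: https://www.bjmy2z.cn/gaokao/607758.html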
