
Stat3B


2008.02.29 12:18:33
Author: Antal Brigi

Slides Prepared by

JOHN S. LOUCKS
St. Edward's University

JIM FREEMAN
Manchester Business School

EDDIE SHOESMITH
University of Buckingham

© 2007 Thomson Learning EMEA


Chapter 3
Descriptive Statistics: Numerical Measures, Part B
Measures of Variability
Measures of Distribution Shape, Relative Location, and Detecting Outliers
Exploratory Data Analysis


Measures of Variability
It is often desirable to consider measures of variability (dispersion), as well as measures of location. For example, in choosing supplier A or supplier B we might consider not only the average delivery time for each, but also the variability in delivery time for each.


Measures of Variability
Range
Interquartile Range
Variance
Standard Deviation
Coefficient of Variation


Range
The range of a data set is the difference between the largest and smallest data values. It is the simplest measure of variability. It is very sensitive to the smallest and largest data values.


Range
Range = largest value - smallest value
Range = 615 - 425 = 190

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615


Interquartile Range
The interquartile range of a data set is the difference between the third quartile and the first quartile. It is the range for the middle 50% of the data. It overcomes the sensitivity to extreme data values.


Interquartile Range
1st Quartile (Q1) = 445
3rd Quartile (Q3) = 525
Interquartile Range = Q3 - Q1 = 525 - 445 = 80

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615
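
The range and interquartile range above can be reproduced with a short Python sketch. The slides do not say which quartile rule they use; the helper `percentile` below is a hypothetical function that assumes a round-up position rule (position i = (p/100)n, rounded up when it is not an integer, otherwise averaging positions i and i + 1). That rule happens to reproduce Q1 = 445 and Q3 = 525 for this data; other software may use a different interpolation rule and report slightly different quartiles.

```python
import math

# The 70 apartment rents from the slides, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def percentile(sorted_values, p):
    """p-th percentile (0 < p < 100) using an assumed round-up position rule."""
    n = len(sorted_values)
    i = p / 100 * n
    if i != int(i):
        # Non-integer position: round up and take the value in that position.
        return sorted_values[math.ceil(i) - 1]
    # Integer position: average the values in positions i and i + 1.
    i = int(i)
    return (sorted_values[i - 1] + sorted_values[i]) / 2


data = sorted(rents)
data_range = data[-1] - data[0]   # largest value - smallest value
q1 = percentile(data, 25)
q3 = percentile(data, 75)
iqr = q3 - q1

print("Range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```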


Variance
The variance is a measure of variability that utilizes all the individual data values. It is based on the difference between the value of each observation ($x_i$) and the mean ($\bar{x}$ for a sample, $\mu$ for a population).


Variance
The variance is the average of the squared differences between each data value and the mean. The variance is computed as follows:

$$ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \quad \text{for a sample} $$

$$ \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \quad \text{for a population} $$


Standard Deviation
The standard deviation of a data set is the positive square root of the variance. It is measured in the same units as the data, making it more easily interpreted than the variance.


Standard Deviation
The standard deviation is computed as follows:

$$ s = \sqrt{s^2} \quad \text{for a sample} $$

$$ \sigma = \sqrt{\sigma^2} \quad \text{for a population} $$


Coefficient of Variation
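The coefficient of variation indicates how large the standard deviation is in relation to the mean. The coefficient of variation is computed as follows:

$$ \left( \frac{s}{\bar{x}} \times 100 \right)\% \quad \text{for a sample} $$

$$ \left( \frac{\sigma}{\mu} \times 100 \right)\% \quad \text{for a population} $$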


Variance, Standard Deviation, and Coefficient of Variation
Variance

$$ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 2996.16 $$

Standard Deviation

$$ s = \sqrt{s^2} = \sqrt{2996.16} = 54.74 $$

Coefficient of Variation

$$ \left( \frac{s}{\bar{x}} \times 100 \right)\% = \left( \frac{54.74}{490.80} \times 100 \right)\% = 11.15\% $$

The standard deviation is about 11% of the mean.
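
As a check on the figures above, here is a minimal Python sketch that recomputes the sample mean, variance, standard deviation, and coefficient of variation for the 70 apartment rents. It uses the standard-library `statistics` module, whose `variance` and `stdev` functions divide by n - 1 (the sample formulas); the variable names are illustrative only.

```python
import statistics

# The 70 apartment rents from the slides, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(rents)           # sample mean
variance = statistics.variance(rents)   # sample variance, divisor n - 1
std_dev = statistics.stdev(rents)       # positive square root of the variance
cv = std_dev / mean * 100               # coefficient of variation, in percent

print(f"mean               = {mean:.2f}")
print(f"sample variance    = {variance:.2f}")
print(f"standard deviation = {std_dev:.2f}")
print(f"coeff. of variation = {cv:.2f}%")
```

For population data, `statistics.pvariance` and `statistics.pstdev` apply the divisor N instead.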


Working with Grouped Data
Variance for Grouped Data
Standard Deviation for Grouped Data


Variance for Grouped Data
For sample data

$$ s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1} $$

For population data

$$ \sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N} $$


Sample Variance for Grouped Data

Rent       f_i    M_i     M_i - x̄    (M_i - x̄)^2    f_i (M_i - x̄)^2
420-439      8    429.5    -63.7       4058.96         32471.71
440-459     17    449.5    -43.7       1910.56         32479.59
460-479     12    469.5    -23.7        562.16          6745.97
480-499      8    489.5     -3.7         13.76           110.11
500-519      7    509.5     16.3        265.36          1857.55
520-539      4    529.5     36.3       1316.96          5267.86
540-559      2    549.5     56.3       3168.56          6337.13
560-579      4    569.5     76.3       5820.16         23280.66
580-599      2    589.5     96.3       9271.76         18543.53
600-619      6    609.5    116.3      13523.36         81140.18
Total       70                                        208234.29


Sample Variance for Grouped Data
Sample Variance

$$ s^2 = \frac{208234.29}{70 - 1} = 3017.89 $$

Sample Standard Deviation

$$ s = \sqrt{3017.89} = 54.94 $$

This approximation differs by only 0.20 from the actual standard deviation of 54.74.
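
The grouped-data calculation can be sketched in Python as follows. The (midpoint, frequency) pairs are taken from the table; the variable names are illustrative. One point worth noting: the deviations in the table are measured from the grouped-data mean, sum(f_i M_i)/n, which is about 493.21, rather than from the ungrouped mean of 490.80.

```python
import math

# (class midpoint M_i, frequency f_i) pairs from the grouped rent table.
classes = [
    (429.5, 8), (449.5, 17), (469.5, 12), (489.5, 8), (509.5, 7),
    (529.5, 4), (549.5, 2), (569.5, 4), (589.5, 2), (609.5, 6),
]

n = sum(f for _, f in classes)

# Grouped-data mean: each midpoint stands in for every value in its class.
grouped_mean = sum(f * m for m, f in classes) / n

# Sum of frequency-weighted squared deviations from the grouped mean.
sum_sq = sum(f * (m - grouped_mean) ** 2 for m, f in classes)

grouped_variance = sum_sq / (n - 1)       # sample formula, divisor n - 1
grouped_std = math.sqrt(grouped_variance)

print(f"n = {n}, grouped mean = {grouped_mean:.2f}")
print(f"grouped sample variance = {grouped_variance:.2f}")
print(f"grouped sample standard deviation = {grouped_std:.2f}")
```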


Measures of Distribution Shape, Relative Location, and Detecting Outliers
Distribution Shape
z-Scores
Chebyshev's Theorem
Empirical Rule
Detecting Outliers


Distribution Shape: Skewness
An important measure of the shape of a distribution is called skewness. The formula for computing skewness for a data set is somewhat complex. Skewness can be easily computed using statistical software.


Distribution Shape: Skewness
Symmetrical (not skewed)
· Skewness is zero.
· Mean and median are equal.
[Figure: relative frequency histogram, Skewness = 0]


Distribution Shape: Skewness
Moderately Skewed Left
· Skewness is negative.
· Mean will usually be lower than the median.
[Figure: relative frequency histogram, Skewness = -0.31]


Distribution Shape: Skewness
Moderately Skewed Right
· Skewness is positive.
· Mean will usually be higher than the median.
[Figure: relative frequency histogram, Skewness = 0.31]


Distribution Shape: Skewness
Highly Skewed Right
· Skewness is positive (often above 1.0).
· Mean will usually be higher than the median.
[Figure: relative frequency histogram, Skewness = 1.25]


Distribution Shape: Skewness
Example: Apartment Rents Seventy apartments were randomly sampled in a small university town. The monthly rents for these apartments are listed in ascending order on the next slide.


Distribution Shape: Skewness
425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615


Distribution Shape: Skewness

[Figure: relative frequency histogram of the apartment rents, Skewness = 0.92]
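
The slides leave the skewness formula to software and do not state which definition they use. As a hedged illustration, the sketch below assumes the adjusted sample skewness that many spreadsheet and statistics packages report, n / ((n - 1)(n - 2)) times the sum of cubed z-scores. The function name `sample_skewness` is hypothetical. Under that assumption, the apartment rent data gives a value that rounds to 0.92.

```python
import statistics

# The 70 apartment rents from the slides, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def sample_skewness(values):
    """Adjusted sample skewness (an assumed definition, not taken from the slides)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation, divisor n - 1
    cubed_z_total = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed_z_total


skewness = sample_skewness(rents)
print(f"skewness = {skewness:.2f}")
```

A positive result indicates a longer right tail, which matches the mean (490.80) lying above the median (475).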


z-Scores
The z-score is also called the standardized value. It denotes the number of standard deviations a data value xi is from the mean.

$$ z_i = \frac{x_i - \bar{x}}{s} $$


z-Scores
An observation's z-score is a measure of the relative location of the observation in a data set. A data value less than the sample mean will have a z-score less than zero. A data value greater than the sample mean will have a z-score greater than zero. A data value equal to the sample mean will have a z-score of zero.


z-Scores
z-Score of Smallest Value (425)

$$ z = \frac{x_i - \bar{x}}{s} = \frac{425 - 490.80}{54.74} = -1.20 $$

Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
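
A minimal Python sketch of the standardization, assuming the sample mean and sample standard deviation are computed from the same 70 rents (variable names are illustrative):

```python
import statistics

# The 70 apartment rents from the slides, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean = statistics.mean(rents)
s = statistics.stdev(rents)  # sample standard deviation, divisor n - 1

# z_i = (x_i - mean) / s: the number of standard deviations x_i lies from the mean.
z_scores = [(x - mean) / s for x in rents]

print(f"z-score of smallest value ({rents[0]}): {z_scores[0]:.2f}")
print(f"z-score of largest value ({rents[-1]}): {z_scores[-1]:.2f}")
```

By construction the z-scores have a mean of zero and a sample standard deviation of one, which is what makes values from different data sets comparable.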


Chebyshev's Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be within $z$ standard deviations of the mean, where $z$ is any value greater than 1.


Chebyshev's Theorem
At least 75% of the data values must be within z = 2 standard deviations of the mean.
At least 89% of the data values must be within z = 3 standard deviations of the mean.
At least 94% of the data values must be within z = 4 standard deviations of the mean.


Chebyshev's Theorem
For example: Let z = 1.5 with $\bar{x}$ = 490.80 and s = 54.74.
At least $(1 - 1/(1.5)^2) = 1 - 0.44 = 0.56$, or 56%, of the rent values must be between
$\bar{x} - z(s) = 490.80 - 1.5(54.74) = 409$
and
$\bar{x} + z(s) = 490.80 + 1.5(54.74) = 573$.
(Actually, 86% of the rent values are between 409 and 573.)
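
The example can be checked with a short Python sketch that compares the Chebyshev lower bound with the proportion actually observed in the rent data. It takes the rounded mean and standard deviation quoted above as given; the variable names are illustrative.

```python
# The 70 apartment rents from the slides, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

mean, s = 490.80, 54.74   # rounded sample statistics quoted in the example
z = 1.5                   # Chebyshev's theorem requires z > 1

bound = 1 - 1 / z**2      # guaranteed minimum proportion within z std devs
lower = mean - z * s
upper = mean + z * s

inside = sum(lower <= x <= upper for x in rents)
actual = inside / len(rents)

print(f"Chebyshev bound: at least {bound:.0%}")
print(f"interval: {lower:.0f} to {upper:.0f}")
print(f"observed: {inside} of {len(rents)} values ({actual:.0%})")
```

The theorem gives only a guaranteed minimum that holds for any distribution shape, so the observed proportion is typically well above it.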

Empirical Rule
For data with a bell-shaped distribution:
· 68.26% of the values of a normal random variable are within +/- 1 standard deviation of its mean.
· 95.44% of the values of a normal random variable are within +/- 2 standard deviations of its mean.
· 99.72% of the values of a normal random variable are within +/- 3 standard deviations of its mean.
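
These percentages can be checked against the standard normal distribution. The sketch below uses the identity P(|Z| <= k) = erf(k / sqrt(2)) with Python's `math.erf`; the helper name `within` is hypothetical. The computed values round to 68.27%, 95.45%, and 99.73%, which differ in the last digit from the figures quoted above, presumably because those were read from a rounded normal table.

```python
import math


def within(k):
    """Probability that a normal random variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within +/- {k} standard deviation(s): {within(k):.2%}")
```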


Empirical Rule
[Figure: bell-shaped curve centered on the mean, with 68.26% of values within +/- 1, 95.44% within +/- 2, and 99.72% within +/- 3 standard deviations]

Detecting Outliers
An outlier is an unusually small or unusually large value in a data set. A data value with a z-score less than -3 or greater than +3 might be considered an outlier. It might be:
· an incorrectly recorded data value
· a data value that was incorrectly included in the data set
· a correctly recorded data value that belongs in the data set


Detecting Outliers
The most extreme z-scores are -1.20 and 2.27. Using |z| > 3 as the criterion for an outlier, there are no outliers in this data set.

Standardized Values for Apartment Rents

-1.20 -1.11 -1.11 -1.02 -1.02 -1.02 -1.02 -1.02 -0.93 -0.93
-0.93 -0.93 -0.93 -0.84 -0.84 -0.84 -0.84 -0.84 -0.75 -0.75
-0.75 -0.75 -0.75 -0.75 -0.75 -0.56 -0.56 -0.56 -0.47 -0.47
-0.47 -0.38 -0.38 -0.34 -0.29 -0.29 -0.29 -0.20 -0.20 -0.20
-0.20 -0.11 -0.01 -0.01 -0.01  0.17  0.17  0.17  0.17  0.35
 0.35  0.44  0.62  0.62  0.62  0.81  1.06  1.08  1.45  1.45
 1.54  1.54  1.63  1.81  1.99  1.99  1.99  1.99  2.27  2.27
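
The |z| > 3 screen can be written as a small Python helper. The function name `z_outliers` and the extra rent of 900 used in the second call are hypothetical, added only to show what a flagged value looks like; they are not part of the slide data.

```python
import statistics

# The 70 apartment rents from the slides, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]


def z_outliers(values, threshold=3.0):
    """Return the values whose z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [x for x in values if abs((x - mean) / s) > threshold]


print(z_outliers(rents))           # the slide data
print(z_outliers(rents + [900]))   # with one hypothetical extreme rent added
```

Note that an extreme value also inflates the mean and standard deviation used to score it, so a flagged value still needs to be reviewed against the three possibilities listed earlier rather than deleted automatically.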

Exploratory Data Analysis
Five-Number Summary
Box Plot


Five-Number Summary
1. Smallest Value
2. First Quartile
3. Median
4. Third Quartile
5. Largest Value


Five-Number Summary
Lowest Value = 425
First Quartile = 445
Median = 475
Third Quartile = 525
Largest Value = 615

425 430 430 435 435 435 435 435 440 440
440 440 440 445 445 445 445 445 450 450
450 450 450 450 450 460 460 460 465 465
465 470 470 472 475 475 475 480 480 480
480 485 490 490 490 500 500 500 500 510
510 515 525 525 525 535 549 550 570 570
575 575 580 590 600 600 600 600 615 615


Box Plot
A box is drawn with its ends located at the first and third quartiles. A vertical line is drawn in the box at the location of the median (second quartile).

[Figure: box plot on an axis from 375 to 625, with Q1 = 445, Q2 = 475, and Q3 = 525 marked]

Box Plot
Limits are located (not drawn) using the interquartile range (IQR). Data outside these limits are considered outliers. The location of each outlier is shown by a suitable symbol, e.g. *.


Box Plot
The lower limit is located 1.5(IQR) below Q1.
Lower Limit: Q1 - 1.5(IQR) = 445 - 1.5(80) = 325
The upper limit is located 1.5(IQR) above Q3.
Upper Limit: Q3 + 1.5(IQR) = 525 + 1.5(80) = 645
There are no outliers (values less than 325 or greater than 645) in the apartment rent data.
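
The limit calculation can be sketched in Python as follows, taking the quartiles Q1 = 445 and Q3 = 525 as given, so that IQR = 525 - 445 = 80 and the limits come out at 325 and 645. The variable names are illustrative. The sketch also finds the smallest and largest values inside the limits, which is where the whiskers of the box plot end.

```python
# The 70 apartment rents from the slides, in ascending order.
rents = [
    425, 430, 430, 435, 435, 435, 435, 435, 440, 440,
    440, 440, 440, 445, 445, 445, 445, 445, 450, 450,
    450, 450, 450, 450, 450, 460, 460, 460, 465, 465,
    465, 470, 470, 472, 475, 475, 475, 480, 480, 480,
    480, 485, 490, 490, 490, 500, 500, 500, 500, 510,
    510, 515, 525, 525, 525, 535, 549, 550, 570, 570,
    575, 575, 580, 590, 600, 600, 600, 600, 615, 615,
]

q1, q3 = 445, 525          # quartiles as given in the slides
iqr = q3 - q1              # interquartile range

lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr

# Values outside the limits are treated as outliers.
outliers = [x for x in rents if x < lower_limit or x > upper_limit]

# Whiskers extend to the most extreme values that are still inside the limits.
inside = [x for x in rents if lower_limit <= x <= upper_limit]
whiskers = (min(inside), max(inside))

print(f"IQR = {iqr}, limits = ({lower_limit}, {upper_limit})")
print(f"outliers: {outliers}")
print(f"whiskers end at {whiskers[0]} and {whiskers[1]}")
```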


Box Plot
Whiskers (dashed lines) are drawn from the ends of the box to the smallest and largest data values inside the limits.

[Figure: box plot on an axis from 375 to 625, with whiskers running from the smallest value inside the limits (425) to the largest value inside the limits (615)]

End of Chapter 3, Part B
