https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.kruskal.html

Some analysts hold that unless you have a large sample size and can clearly demonstrate that your data are normal, you should routinely use Kruskal–Wallis; they consider it dangerous to use one-way anova, which assumes normality, when you don't know for sure that your data are normal.

Example 1

Four cost groups are given. Test whether the groups differ significantly from one another.

H0: there are no differences between cost groups
H1: there are differences between cost groups

import pandas as pd

GrupA = [57, 65, 50, 45, 70, 62, 48]
GrupB = [72, 81, 64, 55, 90, 38, 75]
GrupC = [35, 42, 58, 59, 46, 60, 61]
GrupD = [73, 85, 92, 68, 82, 94, 66]

df = pd.DataFrame({ 'GrupA': GrupA, 'GrupB':GrupB, 'GrupC':GrupC, 'GrupD':GrupD })
df

import scipy.stats as ss

# Kruskal-Wallis H-test across the four cost groups
H, p = ss.kruskal(df['GrupA'], df['GrupB'], df['GrupC'], df['GrupD'], nan_policy='omit')
print('p-value:     ', p)
print('H statistic: ', H)

p-value:      0.003317738567191764
H statistic:  13.716396903589015

We reject H0 because p ≈ 0.003 is below the conventional 0.05 significance level (it is also below the stricter 0.005 threshold).
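As a cross-check, the H statistic above can be recomputed by hand from the rank sums. The sketch below assumes the textbook formula H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1), which applies when there are no tied values (true for this data), and a chi-square approximation with k - 1 degrees of freedom for the p-value. The names groups, rank_sums, h_manual and p_manual are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy import stats

groups = {
    "GrupA": [57, 65, 50, 45, 70, 62, 48],
    "GrupB": [72, 81, 64, 55, 90, 38, 75],
    "GrupC": [35, 42, 58, 59, 46, 60, 61],
    "GrupD": [73, 85, 92, 68, 82, 94, 66],
}

# Rank all 28 observations together (1 = smallest value)
values = np.concatenate(list(groups.values()))
ranks = stats.rankdata(values)
n_total = len(values)

# Sum of ranks within each group
rank_sums = {}
start = 0
for name, data in groups.items():
    rank_sums[name] = ranks[start:start + len(data)].sum()
    start += len(data)

# H statistic without tie correction (assumption: no ties in the data)
h_manual = (
    12 / (n_total * (n_total + 1))
    * sum(rank_sums[name] ** 2 / len(groups[name]) for name in groups)
    - 3 * (n_total + 1)
)

# p-value from the chi-square approximation with k - 1 degrees of freedom
p_manual = stats.chi2.sf(h_manual, df=len(groups) - 1)

# Compare with the library result
h_scipy, p_scipy = stats.kruskal(*groups.values())

print("rank sums:", rank_sums)
print("manual:   ", h_manual, p_manual)
print("scipy:    ", h_scipy, p_scipy)
```

GrupD has by far the largest rank sum and GrupC the smallest, which is what drives the large H value.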

Example 2

H0: the groups are not statistically different

H1: the groups are statistically different

import numpy as np

A = [900, 1200, 850, 1320, 1400, 1150, 975, np.nan]
B = [625, 640, 775, 1000, 690, 550, 840, 750]
C = [415, 400, 420, 560, 780, 620, 800, 390]
D = [410, 310, 320, 280, 500, 385, 440, np.nan]
E = [340, 425, 275, 210, 575, 360, np.nan, np.nan]

df = pd.DataFrame({ 'A': A, 'B':B, 'C':C, 'D':D, 'E':E })
df

The nan_policy argument of kruskal defines how to handle input that contains NaN. The following options are available (default is ‘propagate’):

‘propagate’: returns nan

‘raise’:     throws an error

‘omit’:      performs the calculations ignoring nan values
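The three options can be compared side by side on groups B and E. This is a minimal sketch; the variable names (b, e, res_propagate, res_omit, res_clean, raised) are illustrative, and the exact wording of the error message may vary between SciPy versions.

```python
import numpy as np
from scipy import stats

b = [625, 640, 775, 1000, 690, 550, 840, 750]
e = [340, 425, 275, 210, 575, 360, np.nan, np.nan]

# 'propagate' (the default): a NaN in the input yields NaN results
res_propagate = stats.kruskal(b, e, nan_policy='propagate')
print('propagate:', res_propagate.statistic, res_propagate.pvalue)

# 'raise': a NaN in the input raises a ValueError
raised = False
try:
    stats.kruskal(b, e, nan_policy='raise')
except ValueError as exc:
    raised = True
    print('raise:    ', exc)

# 'omit': NaN values are dropped before the test is computed
res_omit = stats.kruskal(b, e, nan_policy='omit')
print('omit:     ', res_omit.statistic, res_omit.pvalue)

# Dropping the NaN values by hand gives the same result as 'omit'
e_clean = [x for x in e if not np.isnan(x)]
res_clean = stats.kruskal(b, e_clean)
print('manual:   ', res_clean.statistic, res_clean.pvalue)
```

Because the groups have unequal numbers of valid observations, ‘omit’ is the practical choice when the data are stored as DataFrame columns padded with NaN.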

import scipy.stats as ss

# Compare groups B and E only; nan_policy='omit' drops the two NaN values in E
H, p = ss.kruskal(df['B'], df['E'], nan_policy='omit')
print('p-value:     ', p)
print('H statistic: ', H)

p-value:      0.002984914427615507
H statistic:  8.816666666666663

Groups B and E differ significantly: p ≈ 0.003 is below the conventional 0.05 significance level (it is also below the stricter 0.005 threshold).

http://www.biostathandbook.com/kruskalwallis.html

Example 3

Cafazzo et al. (2010) observed a group of free-ranging domestic dogs on the outskirts of Rome. Based on 1,815 observations of submissive behavior, they were able to place the dogs in a dominance hierarchy, from the most dominant (Merlino) to the most submissive (Pisola). Because dominance rank is a true ranked variable, it is necessary to use the Kruskal–Wallis test. The mean rank for males (11.1) is lower than the mean rank for females (17.7), and the difference is significant (H = 4.61, 1 d.f., P = 0.032).

dog= ['Merlino','Gastone','Pippo','Leon','Golia','Lancillotto','Mamy','Nanà','Isotta','Diana','Simba','Pongo','Semola','Kimba','Morgana','Stella','Jaś','Cucciola','Mammolo','Dotto','Gongolo','Małgosia','Brontolo','Eolo','Mag','Emy','Pisola']
Rang= [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]
# 'M' = male, 'K' = female
Seks = ['M','M','M','M','M','M','K','K','K','K','M','M','M','M','K','K','M','M','M','M','M','K','K','K','K','K','K']

H0: both groups are the same

H1: groups are different from each other

df2 = pd.DataFrame({ 'Name': dog, 'Sex':Seks, 'Rang':Rang })
df2

# Split the dominance ranks by sex: K = females, M = males
K = df2[df2['Sex']=='K']['Rang'].to_list()
M = df2[df2['Sex']=='M']['Rang'].to_list()

H, p = ss.kruskal(K, M, nan_policy='omit')
print('p-value:     ', p)
print('H statistic: ', H)

p-value:      0.03179486110380625
H statistic:  4.609523809523793

We reject H0 in favor of H1: the two groups differ, because p ≈ 0.032 is less than 0.05.
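The mean ranks quoted in the description (11.1 for males, 17.7 for females) can be reproduced with a pandas groupby. This is a small self-contained sketch; the names ranks, mean_rank and samples are illustrative, and the column names follow the df2 frame used above.

```python
import pandas as pd
import scipy.stats as ss

# 'M' = male, 'K' = female; position in the list is the dominance rank (1 to 27)
sex = ['M','M','M','M','M','M','K','K','K','K','M','M','M','M',
       'K','K','M','M','M','M','M','K','K','K','K','K','K']

ranks = pd.DataFrame({'Sex': sex, 'Rang': range(1, 28)})

# Mean dominance rank per sex
mean_rank = ranks.groupby('Sex')['Rang'].mean()
print(mean_rank.round(1))

# Same test as above, with the samples built from the groupby
samples = [g.to_list() for _, g in ranks.groupby('Sex')['Rang']]
H, p = ss.kruskal(*samples)
print('H statistic:', round(H, 2))
print('p-value:    ', round(p, 3))
```

A lower mean rank means a more dominant position, so in this sample the males rank as more dominant on average.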

Output of df (Example 2):
	A	B	C	D	E
0	900.0	625	415	410.0	340.0
1	1200.0	640	400	310.0	425.0
2	850.0	775	420	320.0	275.0
3	1320.0	1000	560	280.0	210.0
4	1400.0	690	780	500.0	575.0
5	1150.0	550	620	385.0	360.0
6	975.0	840	800	440.0	NaN
7	NaN	750	390	NaN	NaN

THE DATA SCIENCE LIBRARY

Wojciech Moszczyński

Kruskal-Wallis tests

Output of df (Example 1):
	GrupA	GrupB	GrupC	GrupD
0	57	72	35	73
1	65	81	42	85
2	50	64	58	92
3	45	55	59	68
4	70	90	46	82
5	62	38	60	94
6	48	75	61	66

Output of df2 (Example 3):
	Name	Sex	Rang
0	Merlino	M	1
1	Gastone	M	2
2	Pippo	M	3
3	Leon	M	4
4	Golia	M	5
5	Lancillotto	M	6
6	Mamy	K	7
7	Nanà	K	8
8	Isotta	K	9
9	Diana	K	10
10	Simba	M	11
11	Pongo	M	12
12	Semola	M	13
13	Kimba	M	14
14	Morgana	K	15
15	Stella	K	16
16	Jaś	M	17
17	Cucciola	M	18
18	Mammolo	M	19
19	Dotto	M	20
20	Gongolo	M	21
21	Małgosia	K	22
22	Brontolo	K	23
23	Eolo	K	24
24	Mag	K	25
25	Emy	K	26
26	Pisola	K	27
