python - Is pandas.DataFrame.groupby Guaranteed To Be Stable?


I've noticed that there are several uses of pd.DataFrame.groupby followed by an apply that implicitly assume groupby is stable. That is, if a and b are instances of the same group and, pre-grouping, a appeared before b, then a will appear before b following the grouping as well.

I think there are several answers implicitly using this but, to be concrete, here is one using groupby+cumsum.
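To make the question concrete, here is a minimal sketch of the groupby+cumsum pattern. The frame and the column names (key, val, running) are made up for illustration and are not taken from the linked answer. The running total is only meaningful if rows keep their original relative order inside each group.

```python
import pandas as pd

# Hypothetical data: two interleaved groups, "a" and "b".
df = pd.DataFrame(
    {"key": ["b", "a", "b", "a", "b"], "val": [1, 10, 2, 20, 3]}
)

# Running total per group, accumulated in the order rows appear in df.
df["running"] = df.groupby("key")["val"].cumsum()
print(df["running"].tolist())  # [1, 10, 3, 30, 6]

# Row labels of each group, in the order groupby yields them.
order_in_groups = {name: g.index.tolist() for name, g in df.groupby("key")}
print(order_in_groups)  # {'a': [1, 3], 'b': [0, 2, 4]}
```

Within each group the row labels come out ascending, i.e. in the same relative order as in the original frame.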

Is there anything promising this behavior? The documentation only states:

Group series using mapper (dict or key function, apply given function to group, return result as series) or by a series of columns.

Also, since pandas has indices, the same functionality can theoretically be achieved without this guarantee (albeit in a more cumbersome way).
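One possible reading of the "more cumbersome way" is to restore the order explicitly from the index inside each group instead of relying on groupby. The sketch below assumes the index is unique and reflects the original row order. The helper ordered_cumsum and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical data with a default RangeIndex that encodes the row order.
df = pd.DataFrame(
    {"key": ["b", "a", "b", "a", "b"], "val": [1, 10, 2, 20, 3]}
)

def ordered_cumsum(group: pd.Series) -> pd.Series:
    # Sort by index first, so the result does not depend on the order
    # in which groupby hands over the rows of the group.
    return group.sort_index().cumsum()

explicit = (
    df.groupby("key", group_keys=False)["val"]
    .apply(ordered_cumsum)
    .sort_index()  # put the rows back in the original frame order
)

# The version that relies on groupby preserving order within groups.
implicit = df.groupby("key")["val"].cumsum()

print(explicit.tolist())  # [1, 10, 3, 30, 6]
```

Both versions give the same values here. The explicit one just pays for an extra sort per group.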

Although the docs don't state this, internally it uses a stable sort when generating the groups.

see:

As I mentioned in the comments, this is important if you consider that transform returns a Series with its index aligned to the original df. If the sorting didn't preserve the order, alignment would have to perform additional work, since the Series would need to be sorted prior to assigning. In fact, this is mentioned in the comments:
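The alignment point can be checked with a small sketch. The frame, its non-default index and the column names are invented for illustration. The result of transform carries the same index as the original frame, so it can be assigned back directly.

```python
import pandas as pd

# Hypothetical frame with a non-default index to make alignment visible.
df = pd.DataFrame(
    {"key": ["b", "a", "b", "a"], "val": [1, 2, 3, 4]},
    index=[10, 11, 12, 13],
)

# One value per original row: the sum of that row's group.
totals = df.groupby("key")["val"].transform("sum")

print(totals.index.equals(df.index))  # True
print(totals.tolist())                # [4, 6, 4, 6]

# Assignment lines up row by row because the indices match.
df["total"] = totals
```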

_algos.groupsort_indexer implements counting sort and it is at least O(ngroups), where

ngroups = prod(shape)

shape = map(len, keys)

That is, linear in the number of combinations (cartesian product) of unique values of the groupby keys. This can be huge when doing a multi-key groupby. np.argsort(kind='mergesort') is O(count x log(count)), where count is the length of the data frame. Both algorithms are stable sorts, and that is necessary for the correctness of groupby operations.
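To illustrate the two strategies named in that comment, here is a rough pure-Python sketch of a stable counting sort over integer group labels, compared against np.argsort(kind='mergesort'). The function counting_sort_indexer is a hypothetical stand-in written for this example. It is not the pandas internal routine and makes no claim to match its signature.

```python
import numpy as np

def counting_sort_indexer(labels, ngroups):
    """Return row positions ordered by group label, stable within a group."""
    labels = np.asarray(labels)
    # One pass to count the rows of each group: O(count + ngroups).
    counts = np.bincount(labels, minlength=ngroups)
    # starts[g] is the first output slot reserved for group g.
    starts = np.zeros(ngroups, dtype=np.intp)
    starts[1:] = np.cumsum(counts)[:-1]
    indexer = np.empty(len(labels), dtype=np.intp)
    # Visiting rows in original order is what makes the sort stable.
    for pos, g in enumerate(labels):
        indexer[starts[g]] = pos
        starts[g] += 1
    return indexer

labels = np.array([1, 0, 1, 2, 0, 1])
by_counting = counting_sort_indexer(labels, ngroups=3)
by_merge = np.argsort(labels, kind="mergesort")

print(by_counting.tolist())  # [1, 4, 0, 2, 5, 3]
print(by_merge.tolist())     # [1, 4, 0, 2, 5, 3]
```

The counting sort allocates one slot per possible group, which is why its cost grows with ngroups, while the merge sort depends only on the number of rows.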

E.g. consider: df.groupby(key)[col].transform('first')
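A small sketch of that case, with made-up data and column names: 'first' only has a well-defined meaning if "first" refers to the earliest row of the group in the original frame.

```python
import pandas as pd

# Hypothetical data: group "x" has values 5, 1, 9 in that order.
df = pd.DataFrame(
    {"key": ["x", "y", "x", "y", "x"], "col": [5, 7, 1, 3, 9]}
)

# Broadcast the first value of each group back to every row of the group.
firsts = df.groupby("key")["col"].transform("first")

print(firsts.tolist())  # [5, 7, 5, 7, 5]
```

If the grouping sort were not stable, the value reported for group "x" could just as well have been 1 or 9.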

