python - Word count: 'Column' object is not callable
    from pyspark.sql.functions import col, split, explode

    shakespearedf = sqlContext.read.text(filename).select(removepunctuation(col('value')))
    shakespearedf.show(15, truncate=False)
The DataFrame looks like this:
    ss = split(shakespearedf.sentence, " ")
    shakewordsdfa = explode(ss)
    shakewordsdf_s = sqlContext.createDataFrame(shakewordsdfa, 'word')
Any idea what I am doing wrong? The tip says the Column is not iterable.

What should I do? I just want to turn shakewordsdfa into a DataFrame and rename the column.
Just use select:
    from pyspark.sql.functions import split, explode

    shakespearedf = sc.parallelize([
        ("from fairest creatures we desire increase", ),
        ("that thereby beautys rose might never die", ),
    ]).toDF(["sentence"])

    (shakespearedf
        .select(explode(split("sentence", " ")).alias("word"))
        .show(4))

    ## +---------+
    ## |     word|
    ## +---------+
    ## |     from|
    ## |  fairest|
    ## |creatures|
    ## |       we|
    ## +---------+
    ## only showing top 4 rows
Spark SQL columns are not data structures. They are not bound to any data and are meaningful only when evaluated in the context of a specific DataFrame. In that way, columns behave more like functions.