python - Word count: 'Column' object is not callable


from pyspark.sql.functions import col, split, explode

# removepunctuation is the asker's own helper, defined elsewhere
shakespearedf = sqlContext.read.text(filename).select(removepunctuation(col('value')))
shakespearedf.show(15, truncate=False)

The DataFrame has a single column, sentence, with one line of text per row. Then I try to split it into words:

ss = split(shakespearedf.sentence, " ")
shakewordsdfa = explode(ss)
shakewordsdf_s = sqlContext.createDataFrame(shakewordsdfa, 'word')

Any idea what I am doing wrong? The error says the Column is not iterable.

What should I do? I want to turn shakewordsdfa into a DataFrame and rename the column.

Just use select:

shakespearedf = sc.parallelize([
    ("from fairest creatures we desire increase", ),
    ("that thereby beautys rose might never die", ),
]).toDF(["sentence"])

(shakespearedf
    .select(explode(split("sentence", " ")).alias("word"))
    .show(4))

## +---------+
## |     word|
## +---------+
## |     from|
## |  fairest|
## |creatures|
## |       we|
## +---------+
## only showing top 4 rows

Spark SQL columns are not data structures. They are not bound to any data and are meaningful only when evaluated in the context of a specific DataFrame. In this way, columns behave more like functions.
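To make the "columns behave like functions" point concrete, here is a toy model in plain Python. It is only a sketch of the idea and not how Spark implements columns: ToyColumn and toy_split_explode are hypothetical names invented for this illustration, and rows are modelled as plain dicts.

```python
# Toy model in plain Python (NOT Spark's implementation): a "column" is an
# unevaluated expression that only yields values once applied to rows.


class ToyColumn:
    """Hypothetical stand-in for a column expression."""

    def __init__(self, fn):
        self.fn = fn  # maps one row (a dict) to a list of output values

    def evaluate(self, rows):
        # Only here, in the context of concrete rows, is any data produced.
        return [value for row in rows for value in self.fn(row)]


def toy_split_explode(name, sep=" "):
    # Builds an expression; no data is touched at this point.
    return ToyColumn(lambda row: row[name].split(sep))


words = toy_split_explode("sentence")

rows_a = [{"sentence": "from fairest creatures we desire increase"}]
rows_b = [{"sentence": "that thereby beautys rose might never die"}]

# The same expression gives different results for different inputs.
result_a = words.evaluate(rows_a)
result_b = words.evaluate(rows_b)
print(result_a[:4])
print(result_b[:4])
```

Seen this way, the result of explode in the question is an expression with no rows of its own, which is why handing it to createDataFrame (which expects actual data) fails, while passing it to select on an existing DataFrame works.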

