python - Word count: 'Column' object is not callable -

    from pyspark.sql.functions import col, explode, split

    shakespearedf = sqlContext.read.text(filename).select(removepunctuation(col('value')))
    shakespearedf.show(15, truncate=False)

The resulting DataFrame has a single string column, sentence. I then try to split it into words:

    ss = split(shakespearedf.sentence, " ")
    shakewordsdfa = explode(ss)
    shakewordsdf_s = sqlContext.createDataFrame(shakewordsdfa, 'word')

Any idea what I am doing wrong? The tip says Column is not iterable.

What should I do? I want to turn shakewordsdfa into a DataFrame and rename its column.

Just use select:

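    from pyspark.sql.functions import explode, split

    # Two sample sentences, one per row, in a column named "sentence"
    shakespearedf = sc.parallelize([
        ("from fairest creatures we desire increase", ),
        ("that thereby beautys rose might never die", ),
    ]).toDF(["sentence"])

    # Split each sentence on spaces, then emit one row per word
    (shakespearedf
        .select(explode(split("sentence", " ")).alias("word"))
        .show(4))

    ## +---------+
    ## |     word|
    ## +---------+
    ## |     from|
    ## |  fairest|
    ## |creatures|
    ## |       we|
    ## +---------+
    ## only showing top 4 rows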

As for your code: Spark SQL Columns are not data structures. They are not bound to any data and are meaningful only when evaluated in the context of a specific DataFrame. In this way, Columns behave more like functions.
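To make the "more like functions" point concrete, here is a toy model in plain Python. It is only an analogy and not how Spark is implemented: ToyColumn, ToyFrame, toy_col and toy_split are hypothetical names made up for this sketch. The idea it tries to show is that a column expression holds no rows, so there is nothing to iterate over or pass to createDataFrame until a frame evaluates it in select.

```python
# Toy analogy only: these classes are hypothetical and are NOT Spark's implementation.

class ToyColumn:
    """An unevaluated expression: a recipe that maps one row (a dict) to a value."""

    def __init__(self, name, fn):
        self.name = name
        self._fn = fn  # how to compute the value once a row is supplied

    def alias(self, new_name):
        # Renaming returns a new expression; still no data is involved.
        return ToyColumn(new_name, self._fn)

    def evaluate(self, row):
        return self._fn(row)


def toy_col(name):
    # Refers to a field by name without holding any of its values.
    return ToyColumn(name, lambda row: row[name])


def toy_split(column, sep):
    # Builds a larger expression on top of another expression.
    return ToyColumn(
        "split(" + column.name + ")",
        lambda row: column.evaluate(row).split(sep),
    )


class ToyFrame:
    """Holds the actual rows; the only place where expressions get evaluated."""

    def __init__(self, rows):
        self.rows = rows

    def select(self, column):
        return ToyFrame([{column.name: column.evaluate(r)} for r in self.rows])


# Building the expression touches no data at all.
words = toy_split(toy_col("sentence"), " ").alias("words")

# Only select() binds the expression to concrete rows.
frame = ToyFrame([{"sentence": "from fairest creatures"}])
result = frame.select(words)
print(result.rows)  # [{'words': ['from', 'fairest', 'creatures']}]
```

In this sketch, words on its own is just a description of a computation, which mirrors why the result of explode(split(...)) in the question cannot be used as if it were a collection of rows.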
