Large list FlatMap Java Spark
I have a large list in a JavaPairRDD<Integer, List<String>> and I want to flatMap it into all possible combinations of the list entries, so that I end up with a JavaPairRDD<Integer, Tuple2<String, String>>. If I have something like
(1, ["a", "b", "c"])
I want to get:
(1, <"a", "b">) (1, <"a", "c">) (1, <"b", "c">)
The problem is with large lists. What I have done so far is build a large list of Tuple2 objects with a nested loop over the input list, and that list does not fit in memory. I found this, but I am not sure how to implement it in Java: Spark FlatMap function for huge lists
You may want to flatMap the list, then join the RDD on itself before filtering out equal values:
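import java.util.List;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

JavaPairRDD<Integer, List<String>> original = // ...

// One (key, element) record per list entry.
// Spark 1.x/2.x signature; on Spark 3.x the function is expected to return an Iterator (list -> list.iterator()).
JavaPairRDD<Integer, String> flattened = original.flatMapValues(list -> list);

// Self-join on the key: every ordered pair of elements that share a key.
JavaPairRDD<Integer, Tuple2<String, String>> joined = flattened.join(flattened);

// Drop (x, x) and keep each unordered pair once, assuming distinct list entries.
JavaPairRDD<Integer, Tuple2<String, String>> filtered = joined.filter(
    new Function<Tuple2<Integer, Tuple2<String, String>>, Boolean>() {
        @Override
        public Boolean call(Tuple2<Integer, Tuple2<String, String>> kv) throws Exception {
            return kv._2()._1().compareTo(kv._2()._2()) < 0;
        }
    });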
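If the join turns out to be too expensive, another option is the one the question links to: produce the pairs lazily instead of building the whole list first. The following is only a minimal sketch of that idea in plain Java, with no Spark dependency so it can be run on its own. The class name LazyPairs is hypothetical, pairs are returned as two-element String arrays rather than Tuple2, and the input list is assumed to support fast get(i).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical helper: lazily enumerates all pairs (items[i], items[j]) with i < j. */
public class LazyPairs implements Iterable<String[]> {
    private final List<String> items;

    public LazyPairs(List<String> items) {
        this.items = items;
    }

    @Override
    public Iterator<String[]> iterator() {
        return new Iterator<String[]>() {
            private int i = 0; // index of the first element of the next pair
            private int j = 1; // index of the second element of the next pair

            @Override
            public boolean hasNext() {
                return j < items.size();
            }

            @Override
            public String[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String[] pair = { items.get(i), items.get(j) };
                // Advance to the next (i, j) combination with i < j.
                j++;
                if (j >= items.size()) {
                    i++;
                    j = i + 1;
                }
                return pair;
            }
        };
    }

    public static void main(String[] args) {
        // Only one pair exists in memory at a time.
        for (String[] p : new LazyPairs(Arrays.asList("a", "b", "c"))) {
            System.out.println("<" + p[0] + ", " + p[1] + ">");
        }
    }
}
```

In a Spark job, an iterable like this could presumably be returned from the function passed to flatMapValues (wrapping each pair in a Tuple2), so that the pairs for one key are generated as they are consumed rather than all at once. Whether that is enough depends on what happens downstream, since any later step that collects or caches all pairs for a key would bring the memory problem back.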