Large list FlatMap Java Spark


I have a large list in a JavaPairRDD<Integer, List<String>> and want to flatMap all possible combinations of the list entries, so that I end up with a JavaPairRDD<Integer, Tuple2<String, String>>. If I have something like

(1, ["a", "b", "c"])

I want to get:

(1, <"a", "b">) (1, <"a", "c">) (1, <"b", "c">)

The problem is with large lists. What I have done so far is create a large list of Tuple2 objects with a nested loop over the input list. That list does not fit in memory. I found this, but I am not sure how to implement it in Java: Spark flatMap function for huge lists

You may want to flatMap the list and then join the RDD on itself before filtering out equal values:

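    import java.util.List;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.function.Function;
    import scala.Tuple2;

    JavaPairRDD<Integer, List<String>> original = // ...

    // Emit one (key, value) record per list entry. identity() stands for a
    // function that returns its argument (the list) unchanged.
    JavaPairRDD<Integer, String> flattened = original.flatMapValues(identity());

    // Self-join on the key: every value is paired with every value of the same key.
    JavaPairRDD<Integer, Tuple2<String, String>> joined = flattened.join(flattened);

    // Drop pairs whose two elements are equal. To keep each unordered pair only
    // once, as in the example output, test compareTo(...) < 0 instead.
    JavaPairRDD<Integer, Tuple2<String, String>> filtered =
        joined.filter(new Function<Tuple2<Integer, Tuple2<String, String>>, Boolean>() {
            @Override
            public Boolean call(Tuple2<Integer, Tuple2<String, String>> kv) throws Exception {
                return !kv._2()._1().equals(kv._2()._2());
            }
        });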
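If the self-join is too expensive, another option along the lines of the linked question is to generate the pairs lazily instead of building the whole list of tuples first. The following is only a minimal sketch in plain Java, not Spark code: the class name LazyPairs and the method pairs are hypothetical, and Map.Entry stands in for scala.Tuple2 so that the example needs no Spark dependency. It assumes the input list supports fast random access (for example an ArrayList). How such an Iterable or its Iterator would be handed to flatMapValues depends on the Spark version and is not shown here.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

public class LazyPairs {

    /** Yields every pair (items[i], items[j]) with i < j, one at a time. */
    public static <T> Iterable<Map.Entry<T, T>> pairs(final List<T> items) {
        return new Iterable<Map.Entry<T, T>>() {
            @Override
            public Iterator<Map.Entry<T, T>> iterator() {
                return new Iterator<Map.Entry<T, T>>() {
                    private int i = 0; // index of the first element of the pair
                    private int j = 1; // index of the second element, always > i

                    @Override
                    public boolean hasNext() {
                        // i only advances when j runs off the end, so this
                        // single check also covers empty and one-element lists.
                        return j < items.size();
                    }

                    @Override
                    public Map.Entry<T, T> next() {
                        if (!hasNext()) {
                            throw new NoSuchElementException();
                        }
                        Map.Entry<T, T> pair =
                            new SimpleImmutableEntry<>(items.get(i), items.get(j));
                        j++;
                        if (j >= items.size()) {
                            // Inner loop finished: move to the next first element.
                            i++;
                            j = i + 1;
                        }
                        return pair;
                    }

                    @Override
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }

    public static void main(String[] args) {
        // Prints a,b then a,c then b,c (one pair per line).
        for (Map.Entry<String, String> p : pairs(Arrays.asList("a", "b", "c"))) {
            System.out.println(p.getKey() + "," + p.getValue());
        }
    }
}
```

Only the two indices are kept as state, so memory use stays constant no matter how many pairs the list produces; the pairs exist one at a time as the consumer pulls them.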
