regex - Comparing two versions of the same string


I want to write a function that compares two strings in R. More precisely, if I have this data:

data <- list(
  "first sentence.",
  "very first sentence.",
  "very first , 1 sentences."
)

I want this output:

[1] "very"                    " , 1 sentences" 

My output is built from the substring of each string that is not included in the previous one. For example:

Compare the 2nd with the 1st and remove the matching string, "first sentence.", from the 2nd; the result is "very".

#             "first sentence."
#        "very first sentence."
# match:       ^^^^^^^^^^^^^^^

Now compare the 3rd with the 2nd and remove the matching string, "very first", from the 3rd; the result is " , 1 sentences".

#        "very first sentence."
#        "very first , 1 sentences."
# match:  ^^^^^^^^^^

Then compare the 4th with the 3rd, and so on.

So, based on this example, the output should be:

c("very", " , 1 sentences")
# [1] "very"                    " , 1 sentences"
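The comparison described above can be sketched in base R by removing the longest substring that each string shares with its predecessor. This is only a minimal sketch under assumptions: the helper names `longest_common_substring` and `diff_from_previous` are hypothetical, "matching string" is interpreted as the longest common substring, and the remainder is trimmed of surrounding whitespace.

```r
# Hypothetical helper: longest common substring of a and b (brute force,
# fine for short strings, slow for long ones).
longest_common_substring <- function(a, b) {
  n <- nchar(a)
  # try the longest candidate lengths first
  for (len in rev(seq_len(n))) {
    for (start in seq_len(n - len + 1)) {
      candidate <- substr(a, start, start + len - 1)
      if (grepl(candidate, b, fixed = TRUE)) return(candidate)
    }
  }
  ""
}

# Hypothetical helper: for each string after the first, drop the part it
# shares with the previous string and trim surrounding whitespace.
diff_from_previous <- function(strings) {
  strings <- unlist(strings)
  vapply(seq_along(strings)[-1], function(i) {
    shared <- longest_common_substring(strings[i - 1], strings[i])
    if (!nzchar(shared)) return(trimws(strings[i]))
    trimws(sub(shared, "", strings[i], fixed = TRUE))
  }, character(1))
}

data <- list("first sentence.", "very first sentence.", "very first , 1 sentences.")
diff_from_previous(data)
# [1] "very"           ", 1 sentences."
```

Under this interpretation the shared part of the 3rd and 2nd strings is "very first " (including the trailing space), so the second element keeps its final period and has no leading space. It therefore does not reproduce the output listed above character for character.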

Here's a tidyverse approach:

library(dplyr)
library(tidyr)

# put the data in a data.frame
# (note: data_frame() and add_rownames() have since been deprecated in
#  favour of tibble() and tibble::rownames_to_column())
data_frame(string = unlist(data)) %>%
    # add an ID column so we can recombine later
    add_rownames('id') %>%
    # add a lagged column to compare against
    mutate(string2 = lag(string)) %>%
    # break strings into words
    separate_rows(string) %>%
    # evaluate the following calls rowwise (until regrouped)
    rowwise() %>%
    # chop to rows with a string to compare against,
    filter(!is.na(string2),
           # where the word is not in the comparison string
           !grepl(string, string2, ignore.case = TRUE)) %>%
    # regroup by ID
    group_by(id) %>%
    # reassemble strings
    summarise(string = paste(string, collapse = ' '))

## # A tibble: 2 x 2
##      id                  string
##   <chr>                   <chr>
## 1     2                    very
## 2     3          , 1 sentences.

Select out string if you'd like a vector, by appending

... %>% `[[`('string')

## [1] "very"                    "and 1 sentences."
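For comparison, the same word-level idea can be sketched in base R without any packages. This is an assumption-laden sketch, not the answer's method: the helper name `new_words` is hypothetical, and it uses `setdiff()` on whitespace-separated words, so it matches whole words exactly and case-sensitively, whereas the pipeline above uses `grepl()` with `ignore.case = TRUE` to test each word as a pattern against the previous string. `setdiff()` also drops repeated words.

```r
# Hypothetical helper: words of each string that do not occur in the
# previous string, pasted back together with single spaces.
new_words <- function(strings) {
  strings <- unlist(strings)
  # split every string into whitespace-separated words
  words <- strsplit(strings, "\\s+")
  vapply(seq_along(words)[-1], function(i) {
    paste(setdiff(words[[i]], words[[i - 1]]), collapse = " ")
  }, character(1))
}

data <- list("first sentence.", "very first sentence.", "very first , 1 sentences.")
new_words(data)
# [1] "very"           ", 1 sentences."
```

Because the result is already a character vector, no extraction step is needed afterwards.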
