How to get all distinct words within a set of lines?

I would like to extract a list of distinct words from a set of lines. Is there a way of doing this?

Say for example I have lines that look like this:

[

[(isPhysicallySettledFxFwd, NO,"Y"),(isPhysicallySettledFxFwd,isPhysicallySettledFxSwap,"N")],

[(isPhysicallySettledFxSwap,NO,"Y"),(isPhysicallySettledFxSwap, isPhysicallySettledCommodity,"Y")],

[(isPhysicallySettledCommodity,NO,"Y"),(isPhysicallySettledCommodity,YES,"Y")]

]

Then I would get a list of distinct words, looking like this:

isPhysicallySettledFxFwd

isPhysicallySettledFxSwap

isPhysicallySettledCommodity

NO

YES

Y

N

(

)

"

[

]

,

I am not sure how to even start, apart from copying the lines to Excel and doing lots of manipulations...

asked Apr 19 at 5:36

user3203476

1283

New contributor


regular-expression functions vi-words list


3 Answers

You can do something like this:

:let a=[]

:%s/\w\+/\=add(a, submatch(0))/gn

:new

:put =uniq(sort(a))

This first declares an empty list a to work with. Then we run a :%s command that matches every run of word characters (\w\+) throughout the buffer (the g flag of the :s command) but does not actually replace anything (the n flag). The replacement part is a sub-replace-expression (\=) that appends each matched string to list a.

Finally, we create a new window and put the sorted, de-duplicated content of list a into it. The list is sorted first because uniq() only removes adjacent duplicates.

You can get a lot more sophisticated, like only capturing certain words, or counting the numbers, but this shows how flexible the :s command is.
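As an illustration of the "counting" idea, here is a minimal shell sketch rather than the Vim approach above. The function name count_words is hypothetical, and the bracket expression [[:alnum:]_] is used as an approximation of Vim's \w. It assumes a grep that supports -o (GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical helper: count how often each word occurs in the text on stdin.
count_words() {
    # -o prints every match on its own line; -E enables extended regex.
    # [[:alnum:]_]+ approximates Vim's \w\+ (letters, digits, underscore).
    grep -oE '[[:alnum:]_]+' |
        sort |                  # group identical words together
        uniq -c |               # prefix each distinct word with its count
        sort -k1,1nr -k2,2      # most frequent first, ties by word
}

# Sample line taken from the question.
sample='[(isPhysicallySettledFxFwd, NO,"Y"),(isPhysicallySettledFxFwd,isPhysicallySettledFxSwap,"N")]'

printf '%s\n' "$sample" | count_words
```

From inside Vim, the same pipeline could be applied to a copy of the buffer with a filter command, since it replaces the filtered lines with its output.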

answered Apr 19 at 6:07

Christian Brabandt

16.2k2646

how wonderful ! thank you !!

– user3203476
Apr 19 at 6:27

Maybe this:

:%s/\W/\r&\r/g

:sort u

:g/^\s*$/d

The first puts a line break before and after each non-word character.

The second command sorts the entire file with the option "unique", so all duplicate lines are removed.

The third command deletes all lines that are empty or contain only whitespace.

answered Apr 19 at 6:28

Ralf

3,7451318

You can use grep with the --only-matching/-o flag to accomplish this:

:%!grep -o '\w\+\|\W' | sort -u
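The pipeline can also be tried outside Vim. The sketch below is a variant, not the exact command above: it uses extended regular expressions with POSIX bracket expressions instead of \w and \W (which are GNU extensions), and it deliberately skips whitespace so that no blank entries appear. The function name distinct_tokens is hypothetical, and -o is assumed to be available (it is in GNU and BSD grep).

```shell
#!/bin/sh
# Hypothetical helper: list each distinct word or punctuation character on stdin.
distinct_tokens() {
    # First alternative: a run of word characters (letters, digits, underscore).
    # Second alternative: a single character that is neither a word character
    # nor whitespace, e.g. ( ) [ ] , "
    grep -oE '[[:alnum:]_]+|[^[:alnum:]_[:space:]]' | sort -u
}

# Sample line taken from the question.
printf '%s\n' '[(isPhysicallySettledCommodity,NO,"Y"),(isPhysicallySettledCommodity,YES,"Y")]' |
    distinct_tokens
```

The order of the output lines depends on the locale's collation rules, so punctuation may sort before or after the words.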

answered Apr 19 at 15:03

Peter Rincker

10.6k11828

