Channel: User Kamil Maciorowski - Super User

Answer by Kamil Maciorowski for How to combine wordlists in Linux


Solution

For newline-terminated lists (like /usr/share/dict/words) in files named input1 and input2:

join -t "$(printf '\n')" -1 2 -2 2 -o 1.1,2.1 input1 input2 | paste -d '' - -

Explanation

join will consider each line of input1 and input2 as an array of fields separated by newline characters (-t "$(printf '\n')"). Since there is exactly one newline character per complete line, each line (minus its terminating newline character) will form the first field entirely; all later fields will be "virtual" and empty.

-1 2 -2 2 tells join to join lines where the second field of the first file matches the second field of the second file. As stated, these fields are empty, so each line from input1 will match each line from input2. The result will be the Cartesian product of the two sets of lines. For each member the tool will print the first field from the first file followed by the first field from the second file (-o 1.1,2.1), but because first fields are in fact our whole lines (without newlines), we will get all possible combinations in the form of:

line from input1
line from input2

The newline after line from input2 appears because a record ends there; this is fine. The newline after line from input1 appears because our chosen separator is the newline character. This separator was perfect for the input, but it is wrong in the output. paste -d '' - - fixes this. The tool takes one line from its standard input (-) and concatenates it with the next line, also from its standard input (-), with nothing in between (-d ''); and so on. This way each ordered pair of lines that is a member of the Cartesian product becomes:

line from input1line from input2
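To see the two stages described above side by side, here is a minimal sketch that runs the pipeline on two tiny sample lists. The words, the scratch directory and the variable names (tmpdir, pairs, result) are made up for illustration. It assumes GNU coreutils: "$(printf '\n')" actually expands to an empty string because command substitution strips trailing newlines, and GNU join documents an empty -t argument as "treat the whole line as one field"; likewise paste -d '' is accepted by GNU paste but is not guaranteed by POSIX. Other implementations may behave differently.

```shell
# Scratch directory so the sample files do not clobber anything.
tmpdir=$(mktemp -d)
printf '%s\n' red green >"$tmpdir/input1"
printf '%s\n' apple pear plum >"$tmpdir/input2"

# Stage 1: join alone. Each pair occupies two output lines
# (line from input1, then line from input2).
pairs=$(join -t "$(printf '\n')" -1 2 -2 2 -o 1.1,2.1 "$tmpdir/input1" "$tmpdir/input2")

# Stage 2: paste glues every two consecutive lines together
# with nothing in between.
result=$(printf '%s\n' "$pairs" | paste -d '' - -)

printf '%s\n' "$result"
rm -r "$tmpdir"
```

With GNU tools this should print the six combinations redapple, redpear, redplum, greenapple, greenpear and greenplum, one per line. The input lists do not need to be sorted here, because the join field is the (empty) second field and therefore identical on every line.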

Notes

  • If a list uses spaces and/or tabs as separators, use the following command to convert it to a newline-terminated list:

    { <blank-separated-list tr ' \t' '\n'; echo; } | grep . >newline-terminated-list

    Leading separators will be ignored, trailing separators will be ignored, consecutive separators will be treated as one. If blank-separated-list contains an incomplete line then echo will fix this. There will be no empty lines in newline-terminated-list.

  • According to POSIX, join requires its inputs to be text files, and so does grep (see this answer to learn what this means). paste also requires text files, except that there is no limit on line lengths. tr shall accept any input.
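The conversion from the first note can be checked on a small sample. The sketch below is only an illustration: the sample words, the scratch directory and the variable names (tmpdir, converted) are made up. The input redirection is attached to tr inside the braces, a placement that POSIX-style shells such as bash and dash accept.

```shell
tmpdir=$(mktemp -d)

# Hypothetical sample: leading blanks, a tab, repeated spaces,
# and no newline after the last word (an incomplete line).
printf '  alpha\tbeta   gamma \n delta' >"$tmpdir/blank-separated-list"

# tr turns every space or tab into a newline, echo completes the
# last line, and grep . drops the empty lines that result.
{ <"$tmpdir/blank-separated-list" tr ' \t' '\n'; echo; } | grep . >"$tmpdir/newline-terminated-list"

converted=$(cat "$tmpdir/newline-terminated-list")
printf '%s\n' "$converted"
rm -r "$tmpdir"
```

This should print alpha, beta, gamma and delta, each on its own line, with no empty lines in between.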

