Answer by Kamil Maciorowski for Grep search for text in an ISO-8859-1 encoded file

How can I prevent the grep output from stripping the accented characters?

grep itself does not strip accented characters; it outputs matching lines exactly as they are in the input file. It is your terminal (terminal emulator) that does not interpret bytes encoded as ISO-8859-1 as the accented characters they are meant to be.

Your terminal most likely expects UTF-8. The rest of this answer assumes the terminal does expect UTF-8 and the locale is something.UTF-8 (e.g. pt_PT.UTF-8). This is the default on many modern Unix-like systems, including most Linux distributions.
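To see the mismatch concretely, you can compare the bytes of an accented word in both encodings. This is a minimal sketch, assuming `iconv` and `od` are available; the word "diérese" is only an illustrative sample, not something taken from your file.

```shell
# Character encoding of the current locale; a UTF-8 setup prints UTF-8.
locale charmap

# "diérese" in ISO-8859-1: é is the single byte 0xE9 (octal 351).
latin1_hex=$(printf 'di\351rese\n' | od -An -tx1 | tr -d ' \n')

# The same word after conversion: é becomes the two bytes 0xC3 0xA9.
utf8_hex=$(printf 'di\351rese\n' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')

echo "ISO-8859-1: $latin1_hex"
echo "UTF-8:      $utf8_hex"
```

A lone 0xE9 followed by an ASCII letter is not a valid UTF-8 sequence, which is why a UTF-8 terminal cannot show it as é.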

Possible solutions:

You may be able to switch your terminal emulator to ISO-8859-1, run the command, and switch back to UTF-8 (e.g. in konsole select from the menu: View, Set Encoding; and so on). I wouldn't call this the right way, though.

Alternatively, convert the output of grep to UTF-8 on the fly:

LC_ALL=pt_PT.ISO-8859-1 grep -a ese\$ wordsList | iconv -f ISO-8859-1 -t UTF-8
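If the pt_PT.ISO-8859-1 locale is not generated on your system (`locale -a` lists the available ones), a byte-oriented variant may be enough for an ASCII-only pattern like this one. The following is a sketch under assumptions: the three-word sample file is made up, and `LC_ALL=C` is my substitution, not the command above.

```shell
# Hypothetical sample data in ISO-8859-1: "diérese", "mês", "tese".
# Octal 351 is é and octal 352 is ê in that encoding.
sample=$(mktemp)
printf 'di\351rese\nm\352s\ntese\n' >"$sample"

# LC_ALL=C makes grep work on raw bytes, which is sufficient because the
# pattern contains only ASCII. -a guards against grep versions that would
# otherwise report the input as binary.
# iconv then re-encodes the matching lines for a UTF-8 terminal.
matches=$(LC_ALL=C grep -a 'ese$' "$sample" | iconv -f ISO-8859-1 -t UTF-8)
printf '%s\n' "$matches"

rm -f "$sample"
```

This prints the two words ending in "ese" as UTF-8. For patterns that themselves contain accented characters, the byte-oriented approach is more fragile, so the locale-aware command is the safer choice there.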

If you plan to work with the file a lot, convert the content to UTF-8*:
```
<wordsList iconv -f ISO-8859-1 -t UTF-8 >wordsList-utf8
```
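Before relying on the new file, it may be worth checking that it really is valid UTF-8. One common trick is to ask iconv to convert from UTF-8 to UTF-8 and look at its exit status. This is a sketch only: the sample file and the `check_utf8` helper are hypothetical names introduced for illustration.

```shell
# Hypothetical stand-in for wordsList: "diérese" and "avó" in ISO-8859-1.
dir=$(mktemp -d)
printf 'di\351rese\nav\363\n' >"$dir/sample"

# Hypothetical helper: prints "valid" if the file decodes as UTF-8.
# iconv exits non-zero when it meets a sequence that is invalid in the
# source encoding.
check_utf8() {
    if iconv -f UTF-8 -t UTF-8 <"$1" >/dev/null 2>&1; then
        echo valid
    else
        echo invalid
    fi
}

before=$(check_utf8 "$dir/sample")
iconv -f ISO-8859-1 -t UTF-8 <"$dir/sample" >"$dir/sample-utf8"
after=$(check_utf8 "$dir/sample-utf8")

echo "before: $before, after: $after"
rm -rf "$dir"
```

Note that this only confirms the result is well-formed UTF-8; it cannot tell whether ISO-8859-1 was the right source encoding in the first place.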
Then work with the new file without tricks, e.g.:
```
grep ese\$ wordsList-utf8
```
Now you can even grep for accented characters in a straightforward way, e.g.:
```
grep ó wordsList-utf8
```
In general, Unicode equivalence may be a problem, but here, since the file was converted from ISO-8859-1, I expect consistency: every ó will be the precomposed U+00F3 (bytes 0xC3 0xB3 in UTF-8, which the above grep will find), not U+006F followed by the combining U+0301 (bytes 0x6F 0xCC 0x81 in UTF-8, which the above grep would not find). The same holds for other accented characters.
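The difference between the two forms can be reproduced with a couple of printf calls. This is a minimal sketch with made-up input lines; `LC_ALL=C` is used only so that the example does not depend on which locales are installed (grep does not normalize Unicode in a UTF-8 locale either).

```shell
# Precomposed "ó": U+00F3, UTF-8 bytes c3 b3 (octal 303 263).
precomposed=$(printf '\303\263')
# Decomposed "ó": U+006F followed by U+0301, UTF-8 bytes 6f cc 81.
decomposed=$(printf 'o\314\201')

# Two lines that look the same on screen but differ in bytes.
# grep -c counts matching lines; only the precomposed line matches.
count=$(printf 'av%s\nav%s\n' "$precomposed" "$decomposed" |
    LC_ALL=C grep -c "$precomposed")
echo "$count"
```

The count is 1: the line containing the decomposed form is not found, which is exactly the situation a file converted from ISO-8859-1 should not produce.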

* I notice you used grep -a, as if you needed grep to treat binary files as text. If your wordsList is truly non-text, converting all of it to UTF-8 may fail or give you mangled non-text parts. Since you did not link to a single specific file, I cannot investigate further without guessing. I guess you meant the file linked under "just the file", i.e. the file one can extract from wordsList.zip. With this particular file I do not need -a for grep, as long as I tell grep to use the right encoding (this is what LC_ALL=pt_PT.ISO-8859-1 does).
