File Manager: file editor doesn't respect encoding on save

Hi Ilia,

The File Manager's file editor still doesn't respect my selected encoding on save for several files.

When I open a file that was originally encoded in Latin 1/ISO-8859-1, it often opens as UTF-8 and makes a mess of lots of things. If I try to change to the correct encoding to get the characters right before I save, most of the time it says I have to save before I can change the encoding!!?? Why do I have to mess up our Swedish letters with a save to UTF-8? When I open the file again the letters are really messed up, and if I now change the encoding to ISO-8859-1 it really does scramble the letters! I now have to fix all the messed-up letters manually before I can save it again, and hope that it keeps my selected encoding!

And some files that I open and save as ISO-8859-1 are still UTF-8 when I open them again!? It doesn't respect my selected encoding!
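The corruption described above can be reproduced at the byte level. The following is a minimal Python sketch, not the File Manager's own code; the sample string and all variable names are illustrative assumptions. It shows why ISO-8859-1 bytes cannot be read cleanly as UTF-8, and why saving in that state loses the original letters.

```python
# Illustrative only: this is not Webmin's code, just the byte-level effect.
text = "Räksmörgås på ö"

latin1_bytes = text.encode("iso-8859-1")   # one byte per Swedish letter
utf8_bytes = text.encode("utf-8")          # two bytes per Swedish letter

# 1. Latin-1 bytes for å/ä/ö are not valid UTF-8 here, so a lenient UTF-8
#    decoder substitutes the replacement character U+FFFD for them.
opened_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# 2. If that mangled text is saved as UTF-8, the original bytes are gone;
#    switching the view back to ISO-8859-1 cannot restore the letters.
saved = opened_as_utf8.encode("utf-8")
reopened_as_latin1 = saved.decode("iso-8859-1")

# 3. The reverse mistake: UTF-8 bytes viewed as ISO-8859-1 show two
#    characters for every Swedish letter.
utf8_seen_as_latin1 = utf8_bytes.decode("iso-8859-1")
```

Printing `opened_as_utf8`, `reopened_as_latin1` and `utf8_seen_as_latin1` shows the three kinds of broken text. Only the third case is reversible, because no bytes were replaced along the way.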

Best regards, Leffe

Status: Closed (fixed)

Comments

Submitted by Ilia on Tue, 03/27/2018 - 02:29

Please attach an example file.

Submitted by Ilia on Tue, 03/27/2018 - 04:54

Since you report having this on CentOS, the solution for you would be to install:

yum install perl-Encode-Detect

The other problem could be that you have too little data in the file for the detection to run properly, so it returns an empty string.

Besides, there is a feature in File Manager's editor that I made on purpose: any time you have manually switched the encoding, any file opened afterwards with an undetected encoding will get the encoding that you chose previously. Usually users don't switch between encodings at all, and if they do, they use only one.

There is no bug here. Either you don't have the mentioned libs installed, or the file is too small for detection to run correctly.

I have checked it on the /usr/libexec/webmin/lang/sv file. The encoding is detected correctly each time I open it (even after switching encodings manually).
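The fallback behaviour described in this comment can be sketched roughly as follows. This is a hypothetical Python illustration, not the File Manager's actual (Perl) implementation: `detect_encoding`, `choose_encoding` and the `MIN_NON_ASCII` threshold are invented stand-ins that only mimic a detector giving up when a file offers too little evidence.

```python
from typing import Optional

DEFAULT_ENCODING = "utf-8"
MIN_NON_ASCII = 3  # invented threshold: below this the toy detector gives up


def detect_encoding(data: bytes) -> Optional[str]:
    """Toy detector (hypothetical): returns None when it cannot decide."""
    non_ascii = sum(1 for b in data if b >= 0x80)
    if non_ascii < MIN_NON_ASCII:
        # Too few non-ASCII bytes to judge, e.g. a small file.
        return None
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption for this sketch: anything that is not UTF-8 is Latin-1.
        return "iso-8859-1"


def choose_encoding(data: bytes, last_manual_choice: Optional[str]) -> str:
    """Detected encoding wins; otherwise the user's previous manual choice;
    otherwise the default."""
    return detect_encoding(data) or last_manual_choice or DEFAULT_ENCODING
```

Under these assumptions, a short ISO-8859-1 file such as `"på".encode("iso-8859-1")` is undetected and opens as UTF-8 until an encoding has been chosen manually, after which `choose_encoding` returns that choice instead.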

Submitted by Ilia on Tue, 03/27/2018 - 05:04

In case you run Ubuntu, where I just saw encoding detection fail because the deps are not installed, you can also install it using Other/Perl Modules by selecting install from CPAN, with the module named:

Encode::Detect::Detector

Make sure that you have gcc and the other dev tools installed by running apt install build-essential.

Good luck.

Hi Ilia,

Yes, our server runs on CentOS and has perl-Encode-Detect installed.

Yes, it's a small file, but that should not matter: if I choose an encoding, it should respect my selection!

And yes, it is a bug! Whether the fault is in the saving or in the opening process, I don't know! If everything was OK, it should open in my saved encoding and I should not need to fix the messed-up characters again when I open the file. And if the encoding detection can't decide the encoding of the file, it should leave it alone, and maybe we should have a setting for "undetected" encoding.

This is really a big problem, because many scripts actually can't work properly with UTF-8 encoded data when data is passed between different scripts, jQuery, JavaScript and SQL. UTF-8 encoded Swedish characters do not get interpreted correctly, and most of the time you have to change the encoding to iso-8859-1 to not mess up the characters.

This is not a problem just for me... it exists everywhere in Sweden: in mails, on the web and so on, you run into unreadable, crashed UTF characters.

I did also set Webmin/Virtualmin to EN-US UTF-8 before I started to migrate users over to our new dedi server, because the non-UTF languages were about to be removed. So I have spent months changing user names and other info and text to readable characters after the migration of users, and I am still not done with it! And even in Webmin/Virtualmin the UTF characters don't get interpreted correctly! If I now change Webmin to the non-UTF encoding EN-US, all names and text entered with Swedish characters get messed up and have to be edited manually to be correct! And yes, everything has to be changed manually again if changing back to EN-US UTF-8.

I am really thinking of changing Webmin back to non-UTF encoded EN-US if that could help. I can take the massive update/changes on our users' Swedish characters (again), and also... not to forget all the scripts and code that used Swedish characters that also did "crash" after the migration to the new UTF-8 server. But if I can get this UTF pain to go away after all the work to restore to non-UTF Webmin is done, I would prefer that.

Will the File Manager and its editor work correctly in non-UTF languages, and will the non-UTF languages stay in Webmin/Virtualmin?

Regards, Leffe

Submitted by Ilia on Tue, 03/27/2018 - 10:58

Okay, tell me the following please:

  1. Do you see broken encoding when opening /usr/libexec/webmin/lang/sv?

  2. Are you talking about recognition or conversion of encoding? You can only convert the encoding by cutting the text to the clipboard, then changing the encoding and pasting it back into the editor.

I don't remember but I don't think encoding is supported in non-utf8 mode.

Using utf8 should work without breakage. What are your settings for language?

Hi,

Sorry for being such a pain... :)

  1. I have not checked now, but I think they have opened up correctly before! But that has nothing to do with the editor not saving the file in the selected encoding, or not opening it in the correct encoding. I can't think why we have to add lots of text only to get the encoding recognition working properly. I have several smaller files, and they should be saved/opened in my selected encoding.

  2. I was mainly talking about recognition... I don't mean that the editor should convert my text, but it should not do that on opening a file either! And it is not consistent: sometimes the same file opens as ISO-8859-1 and sometimes as UTF8, even if there have not been any changes or saves.

Using UTF8 does break things. For example, passing UTF8 encoded data to PDF generation will in several cases result in strange characters! Passing data through the APIs of our cellphone operator, who provides our text message services, will break some features when using UTF. This is mainly because a big part of the computers and data here in Sweden still use iso-8859-1/Latin 1: browsers and servers in private, business and governmental use still use non-UTF encodings for stored, presented or sent data. And I actually think most MySQL encoding is still Latin 1 by default.

Do you mean my language setting in Webmin/Virtualmin? If so, it is EN-US UTF8.

If I change my Webmin to non-UTF en-us, will the editor open a file with an undetected encoding as iso-8859-1?

//Leffe

Submitted by Ilia on Tue, 03/27/2018 - 12:23

Can you please send me the file that is not correctly opened to my mail at programming at rostovtsev.ru?

Submitted by Ilia on Tue, 03/27/2018 - 14:21

Leffe, I was about to do the release tonight, but I could wait until tomorrow.

It might be possible to add proper text conversion; I will have to check.

I cannot reproduce your issue, though. I will need the file that fails on your side, so I can test it on my machine. Please also add screenshots of how it should look and of how it looks when broken.

Hi again,

I'm trying to figure this out:

"Besides, there is a feature in File Manager's editor that I made on purpose: any time you have manually switched the encoding, any file opened afterwards with an undetected encoding will get the encoding that you chose previously. Usually users don't switch between encodings at all, and if they do, they use only one."

If I do like this...

  1. I open a file which I know is encoded in iso-8859-1 but whose encoding can't be detected, and it opens as utf8.

  2. Without doing anything with the file, I change the encoding to iso-8859-1 and then close the window without saving.

  3. I then open the file again, and now it opens as iso-8859-1. So do the other files whose encoding can't be detected.

I have tried this both ways, iso->utf and utf->iso and it works every time!

I hope that is the way to use your feature; the above steps work every time. The feature seems to be "reset" when closing the file manager, but redoing the steps when starting to work is no problem at all!

Now that I know how to use this feature everything is better. The problem before was that I could not change the encoding unless the file was saved first... in other words, save the messed-up characters. But this happened because I had "touched" (not changed) the file and it was flagged as changed/unsaved.

I did not know about this feature before, with this my world gets a bit easier!

(I still think it would be better to be able to turn off the detection like in JFM "Attempt to use proper character set?")

Thanks!

//Leffe

Submitted by Ilia on Wed, 03/28/2018 - 02:30

(I still think it would be better to be able to turn off the detection like in JFM "Attempt to use proper character set?")

Leffe, it's already there, it's just automatic. Any time the encoding is not detected, it falls back to a default: either UTF8 or the encoding you chose before. The editor explicitly says which encoding is currently in use. Nothing is broken unless you forcefully save the file. You don't need to fix any chars manually; just use the encoding select at the top of the editor's frame to outsmart the detection mechanism and tell the editor which encoding to use.

In the future, I will add the possibility of actually doing the conversion without cutting the file contents to the clipboard, changing the encoding and then pasting. Right now that is the only way to convert. It's not perfect.

So, Leffe, in case you don't understand what I mean by conversion, do the following. Open a file in Swedish that is encoded in ISO-8859-1, then change the encoding at the top to UTF-8. As a result you will most likely see broken characters. Now, if you want to convert ISO-8859-1 to UTF8, first select all the text (while all chars are displayed correctly, i.e. with the correct encoding selected) and cut it to the clipboard, then change the encoding to UTF-8 and paste it back. Now hit save, and the file will be in UTF-8 the next time you open it. Like I said, in the future I will make a proper UI for doing it nicely.
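Outside the editor, the same conversion can be done directly on the file's bytes. The following Python sketch is only an illustration under stated assumptions: the `convert_file` helper and the sample file name are hypothetical, and it says nothing about how the File Manager will implement the feature. The idea is to decode using the encoding the file is really in, then re-encode in the target encoding.

```python
import os
import tempfile


def convert_file(path: str, source_encoding: str, target_encoding: str) -> None:
    """Re-encode a text file in place (hypothetical helper)."""
    with open(path, "rb") as fh:
        raw = fh.read()
    # Strict decoding raises UnicodeDecodeError when invalid bytes are read
    # as UTF-8. ISO-8859-1 accepts any byte, so it cannot catch a bad guess.
    text = raw.decode(source_encoding)
    with open(path, "wb") as fh:
        fh.write(text.encode(target_encoding))


# Demo on a temporary file so that nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sv.txt")
    with open(sample, "wb") as fh:
        fh.write("Räksmörgås".encode("iso-8859-1"))
    convert_file(sample, "iso-8859-1", "utf-8")
    with open(sample, "rb") as fh:
        converted = fh.read()
```

Working on raw bytes in binary mode keeps line endings untouched, so only the character encoding changes.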

Best regards, Ilia

P.S. I'm considering this ticket solved.

Submitted by Ilia on Wed, 03/28/2018 - 02:30

Status: Active » Closed (fixed)

Hi Ilia,

The problem has been that I had to save the file before I was able to change the encoding. But it turns out that the encoding gets changed even without saving: just change the encoding, close the file and then reopen it.

Regarding the conversion, I totally understand what you mean, but having the editor automatically convert the text is not a good thing. For example, I always use the UltraEdit text editor when I code. UE also has the ability to autodetect encodings, but I ALWAYS have that disabled, because if it is enabled it will all of a sudden start coding in UTF8, and uploading that file to a system running iso-8859-1 will make a mess. So... the autodetect encoding is always off in my UE.

Having the editor change characters (convert them) automatically is not a good thing, unless it can be disabled in a setting. The thing is that our Swedish characters often get converted to the wrong character when converting from UTF8. I know you think that can't be happening, but it actually does.

In my opinion all these automatic things always need to have a disable option, because lots of the auto things make the work much more time consuming and many times mess things up, due to an auto thing thinking "I will correct that for you, regardless of whether you want it or not".

//Leffe