escaping - How to identify a special character in a file using Java
I have a .doc file that contains a header before ÐÏ, and I need to remove the characters that exist before ÐÏ.
Example: asdfasdfasdfasfasdfasfÐÏ9asjdfkj
I have used the code below.
InputStream is = new FileInputStream("d:\\users\\vinoth\\workspace\\testing\\testing_2.doc");
DataInputStream dis = new DataInputStream(is);
OutputStream os = new FileOutputStream("d:\\users\\vinoth\\workspace\\testing\\testing_3.doc");
DataOutputStream dos = new DataOutputStream(os);
byte[] buff = new byte[dis.available()];
dis.readFully(buff);
char temp = 0;
boolean start = false;
try {
    for (byte b : buff) {
        char c = (char) b;
        if (temp == 'Ð' && c == 'Ï') {
            start = true;
        }
        if (start) {
            dos.write(c);
        }
        temp = c;
    }
} catch (IOException e) {
    e.printStackTrace();
}
However, nothing is written to the file, because the first if condition is never satisfied. Please advise how I can do this.
There is something wrong when you use char c = (char)b;
Refer to byte-and-char-conversion-in-java. There you will see that a character in Java is a Unicode code unit, which is treated as an unsigned number.
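Take your case as an example. The binary representation of the byte for the character 'Ï' is 11001111. Refer to the Oracle tutorial:
byte: The byte data type is an 8-bit signed two's complement integer. It has a minimum value of -128 and a maximum value of 127 (inclusive).
So the value of the byte is -49. However, for Unicode usage, 11001111 should be interpreted as an unsigned byte, and its value should actually be 207. Casting -49 directly to char sign-extends it to 0xFFCF, which is never equal to 'Ï' (0x00CF), so the comparison fails.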
int i = b & 0xff;
gives you the unsigned value of the byte's binary representation.
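The difference between the two conversions can be checked in isolation. The following is only a small sketch; the class name SignExtensionDemo and the helper names naive and masked are made up for illustration and are not part of the original code.

```java
public class SignExtensionDemo {
    // Casting a negative byte straight to char sign-extends it to int first,
    // then keeps the low 16 bits, so 0xCF becomes 0xFFCF.
    static int naive(byte b) {
        return (char) b;
    }

    // Masking with 0xff keeps only the low 8 bits, giving a value in 0..255.
    static int masked(byte b) {
        return b & 0xff;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xCF; // bit pattern 11001111, stored as -49
        System.out.println(b);                               // -49
        System.out.println(Integer.toHexString(naive(b)));   // ffcf
        System.out.println(Integer.toHexString(masked(b)));  // cf
        System.out.println(masked(b) == '\u00CF');           // true ('Ï' is U+00CF)
    }
}
```

Writing the characters as '\u00D0' and '\u00CF' instead of the literals 'Ð' and 'Ï' also keeps the comparison independent of the encoding the source file is compiled with.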
You can modify your code as shown below. To debug, I have changed the file path and the file format. I'm not sure whether there is also a .doc-specific issue, but the code does have the bug mentioned above.
import java.io.*;

public class Test {
    public static void main(String[] args) {
        InputStream is;
        try {
            is = new FileInputStream("testing_2.txt");
            DataInputStream dis = new DataInputStream(is);
            OutputStream os = new FileOutputStream("testing_3.txt");
            DataOutputStream dos = new DataOutputStream(os);
            byte[] buff = new byte[dis.available()];
            dis.readFully(buff);
            char temp = 0;
            boolean start = false;
            for (byte b : buff) {
                int i = b & 0xff; // read the byte as an unsigned value (0..255)
                char c = (char) i;
                if (!start && temp == 'Ð' && c == 'Ï') {
                    start = true;
                    dos.write(temp); // keep the leading 'Ð' of the marker
                }
                if (start) {
                    dos.write(c);
                }
                temp = c;
            }
            dos.close();
            dis.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
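As an alternative, the search can be done on the raw bytes, without converting to char at all. This is only a sketch under a few assumptions: the marker is the byte pair 0xD0 0xCF (the ISO-8859-1 bytes of ÐÏ), the file is small enough to read into memory, and the class name StripBeforeMarker, its method names, and the default file names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class StripBeforeMarker {
    // Returns the index of the first 0xD0 0xCF byte pair, or -1 if there is none.
    static int indexOfMarker(byte[] data) {
        for (int i = 0; i + 1 < data.length; i++) {
            if ((data[i] & 0xff) == 0xD0 && (data[i + 1] & 0xff) == 0xCF) {
                return i;
            }
        }
        return -1;
    }

    // Returns a copy of data starting at the marker; empty if no marker is found.
    static byte[] stripBeforeMarker(byte[] data) {
        int start = indexOfMarker(data);
        return start < 0 ? new byte[0] : Arrays.copyOfRange(data, start, data.length);
    }

    public static void main(String[] args) throws IOException {
        // File names are placeholders; pass real paths as arguments.
        Path in = Paths.get(args.length > 0 ? args[0] : "testing_2.doc");
        Path out = Paths.get(args.length > 1 ? args[1] : "testing_3.doc");
        Files.write(out, stripBeforeMarker(Files.readAllBytes(in)));
    }
}
```

Comparing masked bytes against numeric constants sidesteps both the sign-extension problem and any dependence on the source file's encoding, and Files.readAllBytes avoids relying on available() to report the file size.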