Noisy text
Noise in a text can be understood as the set of all differences between the surface form of a coded representation of the text and the intended, correct, or original text. It may be caused by, for example, typographic errors or colloquialisms, both of which are always present in natural language. Noise usually lowers data quality in a way that makes the text less accessible to automated processing, such as natural language processing. It can also be introduced by an extraction process (e.g. transcription or OCR) from media other than original electronic texts.
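As a rough illustration of noise as "differences between the surface form and the intended text", the following sketch compares an observed string with a reference string using Python's standard `difflib` module. The function names `noise_score` and `noisy_spans` are hypothetical, and one minus the `difflib` similarity ratio is only one possible way to quantify noise, chosen here for simplicity; it is not a standard definition.

```python
import difflib


def noise_score(observed: str, intended: str) -> float:
    """Return 1 minus difflib's similarity ratio (0.0 means identical).

    This is an illustrative measure, not a standard definition of noise.
    """
    matcher = difflib.SequenceMatcher(None, intended, observed, autojunk=False)
    return 1.0 - matcher.ratio()


def noisy_spans(observed: str, intended: str) -> list[tuple[str, str, str]]:
    """List the differing spans as (operation, intended_part, observed_part)."""
    matcher = difflib.SequenceMatcher(None, intended, observed, autojunk=False)
    return [
        (tag, intended[i1:i2], observed[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the parts where the two texts differ
    ]


if __name__ == "__main__":
    # Hypothetical OCR-style confusion: the letter "a" read as the digit "0".
    print(noise_score("c0t", "cat"))
    print(noisy_spans("c0t", "cat"))
```

Such a comparison requires that the intended text is known, which is often not the case in practice; when it is unavailable, noise can only be estimated indirectly.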