Regular expressions

In formal language theory and theoretical computer science, a regular expression is a sequence of characters that defines a search pattern, which is used to find matching strings in text data. Regular expressions are most commonly associated with UNIX systems, but they can also be used on ordinary personal computers (Mcnaughton, 2013). Regular expressions, which are also called rational expressions, make it possible to search strings for a single word or for any sequence of characters. The concept dates back to the 1950s, when Stephen Kleene formalized it using a mathematical notation that he called regular events (Habibi, 2014).

The main purpose of a regular expression is to support quick searching: it identifies all of the text, or a given number of occurrences, that matches a predetermined pattern. Apart from searching, regular expressions also serve as an editing tool by automating search-and-replace tasks on either text or binary data. Text editors with regular-expression support, such as EditPad Pro, are well suited to this kind of editing. Regular expressions can also help with tasks such as splitting, renaming, merging, and copying files (Habibi, 2014).
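As a minimal sketch, the searching and editing uses described above can be illustrated with Python's standard `re` module. The sample text, the date pattern, and the `rename_plan` helper with its `IMG_`/`photo_` naming scheme are hypothetical and chosen only for illustration; the helper only computes new file names and does not touch the file system.

```python
import re

# Hypothetical sample text used only for illustration.
TEXT = "Order 66 shipped on 2014-03-07; order 67 on 2014-03-09."

# Searching: a pattern for substrings shaped like YYYY-MM-DD dates.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def find_dates(text):
    """Return every date-shaped substring in the text."""
    return [m.group(0) for m in DATE.finditer(text)]


def reformat_dates(text):
    """Editing: rewrite YYYY-MM-DD as DD/MM/YYYY using the capturing groups."""
    return DATE.sub(r"\3/\2/\1", text)


def rename_plan(filenames):
    """Hypothetical renaming scheme: IMG_0042.JPG -> photo_0042.jpg.

    Only computes (old, new) pairs; actually renaming the files
    (for example with os.rename) is left to the caller.
    """
    pattern = re.compile(r"^IMG_(\d+)\.JPG$", re.IGNORECASE)
    plan = []
    for name in filenames:
        new_name = pattern.sub(r"photo_\1.jpg", name)
        if new_name != name:
            plan.append((name, new_name))
    return plan


print(find_dates(TEXT))
print(reformat_dates(TEXT))
print(rename_plan(["IMG_0042.JPG", "notes.txt"]))
```

The same compiled pattern serves both the search and the replacement, which is the usual way to keep the two steps consistent.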

Examples of regular expressions

Whitespace Trimming

A regular-expression search and replace can trim unwanted whitespace from the start and end of a string or of each line in a text file. Searching for ^[ \t]+ and replacing each match with nothing deletes leading whitespace (spaces and tabs). Searching for [ \t]+$ and replacing each match with nothing trims trailing whitespace. Both operations can be performed at once by combining the two expressions into the single expression ^[ \t]+|[ \t]+$. When the expression is applied to every line of a text file rather than to a single string, the engine's multiline mode must be enabled so that ^ and $ match at each line boundary. The character class [ \t], which matches a space or a tab, can also be expanded to [ \t\r\n] so that line breaks are stripped as well (Steve, 2010).
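A minimal sketch of these trimming expressions in Python's `re` module follows; the helper names `trim_each_line` and `trim_string` are hypothetical. `re.MULTILINE` is Python's name for the multiline mode mentioned above.

```python
import re

# Leading spaces/tabs: ^[ \t]+ ; trailing spaces/tabs: [ \t]+$.
# re.MULTILINE makes ^ and $ match at every line, not just the string ends.
TRIM_LINES = re.compile(r"^[ \t]+|[ \t]+$", re.MULTILINE)

# Expanded class [ \t\r\n]: without MULTILINE, ^ matches only at the start
# of the string and $ only at its end, so outer line breaks are stripped too.
TRIM_STRING = re.compile(r"^[ \t\r\n]+|[ \t\r\n]+$")


def trim_each_line(text):
    """Delete leading and trailing spaces and tabs on every line."""
    return TRIM_LINES.sub("", text)


def trim_string(text):
    """Delete whitespace, including line breaks, at both ends of the string."""
    return TRIM_STRING.sub("", text)


print(repr(trim_each_line("  first \t\n\tsecond  ")))
print(repr(trim_string("\r\n  text \t\r\n")))
```

Whitespace inside a line, such as the two spaces in "a  b", is left untouched by both helpers because neither alternative can match there.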

HTML Tag Grabbing

The regular expression <TAG\b[^>]*>(.*?)</TAG> matches the opening and closing pair of a specific HTML tag, where TAG stands for the tag name. Everything between the two tags is captured into the first backreference. The question mark makes the star lazy, so that the match stops before the first closing tag rather than before the last one, as a greedy star would. As a consequence, the expression does not correctly match a tag nested inside another tag with the same name. The regular expression <([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1> matches the opening and closing pair of any HTML tag. Case sensitivity must be turned off when using it, so that lowercase tag names are also matched. The key to this expression is the backreference \1, which requires the closing tag to repeat the name captured from the opening tag; the content between the tags is captured into the second backreference (Habibi, 2014).
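The two tag expressions can be tried out with a short Python sketch. Here the specific tag is assumed to be `b`, and the HTML snippet is a hypothetical sample; `re.IGNORECASE` turns off case sensitivity as the text requires.

```python
import re

# <TAG\b[^>]*>(.*?)</TAG> with TAG = b: group 1 holds the tag's contents.
# The lazy (.*?) stops at the first closing </b> after the opening tag.
BOLD = re.compile(r"<b\b[^>]*>(.*?)</b>", re.IGNORECASE)

# Any tag pair: group 1 is the tag name, group 2 the contents.
# The backreference \1 requires the closing tag to repeat the opening name.
# IGNORECASE lets [A-Z] match lowercase tag names as well.
ANY_TAG = re.compile(r"<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>", re.IGNORECASE)

html = '<h1>Title</h1><p class="intro">Some <b>bold</b> and <B>more</B></p>'

print(BOLD.findall(html))
print(ANY_TAG.findall(html))

# Nested tags of the same name: the lazy star stops at the first </b>.
print(BOLD.findall("<b>one<b>two</b>one</b>"))
```

Because `findall` returns non-overlapping matches, the outer `<p>` pair consumes the inner `<b>` pairs, and the nested case returns "one<b>two" rather than the full contents, which illustrates the limitation noted above.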

IP Addresses

Regular expressions can also be used to match IP addresses, offering a fast and reliable way to find them in text. The expression \b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b matches any IP address. Its main shortcoming, however, is that it also matches invalid strings such as 999.999.999.999. A more exact regular expression can restrict each of the four numbers in the address to the range 0 to 255. The group (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?) matches exactly one such number; repeating it four times, separated by \., and surrounding the result with \b gives an expression that accepts only addresses whose four numbers lie between 0 and 255. Because each of the four numbers is stored in its own capturing group, the captured values can then be used to process the address further (Steve, 2010).
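A minimal Python sketch contrasting the loose and the restricted IP expressions follows. The `octets` helper is a hypothetical example of using the four capturing groups for further processing; the `re.ASCII` flag is an added assumption that keeps `\d` to the digits 0 to 9, since Python's `\d` also matches other Unicode digits by default.

```python
import re
from typing import List, Optional

# Loose pattern: any four 1-3 digit numbers separated by dots.
LOOSE_IP = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b", re.ASCII)

# One number in the range 0-255, in its own capturing group.
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
# Four such groups separated by literal dots, anchored at word boundaries.
STRICT_IP = re.compile(r"\b" + r"\.".join([OCTET] * 4) + r"\b")


def octets(text: str) -> Optional[List[int]]:
    """Return the four numbers of the first valid address found, or None."""
    match = STRICT_IP.search(text)
    if match is None:
        return None
    # Each capturing group holds one number, ready for further processing.
    return [int(part) for part in match.groups()]


print(LOOSE_IP.search("999.999.999.999") is not None)   # True
print(STRICT_IP.search("999.999.999.999") is not None)  # False
print(octets("gateway at 192.168.1.254"))
```

The restricted expression still accepts leading zeros such as 010, which may or may not matter depending on the data being searched.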

Shortcomings of regular expressions

Like any other tool, regular expressions have their drawbacks. One of them is a phenomenon known as catastrophic backtracking, in which a regular expression runs extremely slowly, to the point that the application appears to hang or even crashes. It typically arises when a pattern is written too generally, for example with nested quantifiers, so that on an input that does not match, the search engine must try an exponentially growing number of ways to divide the text among the parts of the pattern before it can give up. The remedy is to be more precise about the data to be matched, which reduces the number of alternatives the engine has to explore. Overly general patterns are therefore the ones most at risk (Habibi, 2014).
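The sketch below illustrates catastrophic backtracking with a commonly cited over-general pattern, (x+x+)+y, and a more precise pattern, xx+y, that describes the same strings. The pattern names and the `time_failed_match` helper are hypothetical; the timings depend on the machine, so the code prints them rather than asserting particular values.

```python
import re
import time

# Deliberately over-general pattern: nested quantifiers let a run of x's be
# split between the inner x+x+ and the outer (...)+ in very many ways.
GENERAL = re.compile(r"(x+x+)+y")

# More precise pattern for the same strings: at least two x's, then y.
PRECISE = re.compile(r"xx+y")


def time_failed_match(pattern, n):
    """Time a full match against n x's with no final y, so it must fail."""
    subject = "x" * n
    start = time.perf_counter()
    pattern.fullmatch(subject)
    return time.perf_counter() - start


for n in (12, 14, 16):
    print(n, time_failed_match(GENERAL, n), time_failed_match(PRECISE, n))
```

On a backtracking engine such as Python's `re`, the times printed for GENERAL typically grow rapidly as n increases, while those for PRECISE stay negligible, even though both patterns accept exactly the same strings.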

Another shortcoming is that if every part of a regular expression is made optional, the expression can match a zero-length string anywhere in the text. To avoid this, only selected parts should be made optional, and the expression should state which parts are optional depending on which other parts are present (Mcnaughton, 2013).

A further shortcoming is that mixing Unicode with 8-bit character codes can produce unexpected or unintended results. For example, an escape such as \xE9 may denote a single byte in one context but the Unicode code point U+00E9 in another, and in UTF-8 that character is encoded as two bytes. Care is therefore needed when the data to be searched and matched contains a mixture of 8-bit character codes and Unicode (Mcnaughton, 2013).
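Both shortcomings can be sketched in Python. The number patterns below are hypothetical illustrations: `ALL_OPTIONAL` makes every part optional, while `NUMBER` makes the fractional part optional only when digits precede it and requires digits after a leading dot. The second half shows how the same code, 0xE9, behaves differently as a Latin-1 byte and in UTF-8.

```python
import re

# Every part optional: digits, optional dot, digits. It can match "" anywhere.
ALL_OPTIONAL = re.compile(r"[0-9]*\.?[0-9]*")

# Optional parts depend on what is present: digits with an optional
# fractional part, or a dot that must be followed by digits.
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?|\.[0-9]+")

sample = "a 3.14 and .5 but not ."
print(ALL_OPTIONAL.findall(sample))  # includes empty strings
print(NUMBER.findall(sample))

# Mixing Unicode and 8-bit codes: the byte 0xE9 is "\u00e9" (e-acute) in
# Latin-1, but UTF-8 encodes U+00E9 as the two bytes 0xC3 0xA9.
BYTE_E9 = re.compile(rb"\xe9")
print(BYTE_E9.search("\u00e9".encode("latin-1")) is not None)  # True
print(BYTE_E9.search("\u00e9".encode("utf-8")) is not None)    # False

# Python refuses to apply a str (Unicode) pattern to a bytes subject.
try:
    re.search(r"\xe9", "\u00e9".encode("utf-8"))
except TypeError as exc:
    print("TypeError:", exc)
```

Python's choice to raise a `TypeError` rather than silently mix the two representations is one way a regex library can guard against the unexpected results described above.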

References

Habibi, M. (2014). Regular Expressions. Java Regular Expressions: Taming the java.util.regex Engine, 1-54.

Mcnaughton, R. (2013). An Introduction to Regular Expressions. Applied Automata Theory, 35-54.

Steve, F. (2010). Regular Expressions. Beginning Perl, 153-177.