Genetic engineering, CRISPR-Cas9 gene editing & bioinformatic tools

 

The beginning of genetic engineering 

In 1953 the structure of DNA was discovered, an event that initiated the era of molecular genetics. DNA ligase was isolated in 1967, followed by the first restriction enzyme in 1970. These tools made it possible to cut DNA and join the ends together, so the first artificial recombinant DNA molecules could be made. Gene cloning followed: a fragment of DNA is inserted into a plasmid vector, and the plasmid is then introduced into a bacterial cell. These cells can be grown on agar plates, producing clones of the modified cell. This is how human insulin came to be produced for people with diabetes. Genetically modified crops appeared soon afterwards and became a field that attracted considerable research effort.

 

Since these first experiments, the technology has come a long way. Today various genetic engineering tools make it possible to introduce specific edits into the genome of virtually any organism.

 

The origins of the CRISPR-Cas9 technology

The CRISPR-Cas9 system is a revolutionary and now widely used genetic engineering tool and, like many things in biology, it is inspired by nature itself.

CRISPR stands for clustered regularly interspaced short palindromic repeats. These repeats had already been characterized in 1993 and have been studied increasingly since then.

It turned out that their function is to protect bacteria and archaea against viruses. It works as follows:

The chromosome of these organisms contains a so-called CRISPR locus, which holds the CRISPR repeats. Between the repeats lie regions (spacers) that turn out to be identical to fragments of DNA from the viruses that attack the given species. These loci also change dynamically: they accumulate new viral DNA fragments over time. This suggests that bacteria and archaea have a mechanism for acquiring pieces of DNA from viruses and inserting them into their CRISPR loci.

The locus is then transcribed in such a way that the palindromic CRISPR repeats form hairpins. The long transcript, which still contains all the repeats and viral copies, is then cut so that each piece, consisting of one viral fragment and one hairpin, forms a so-called CRISPR RNA (crRNA). Thanks to the hairpins, these molecules are recognized by the Cas proteins, and together they assemble into the so-called effector complex. The RNA part guides the complex to the matching viral DNA, and the protein part, an endonuclease, cuts it. The damaged viral DNA cannot be repaired, so it is degraded.
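The idea that spacers are copies of viral DNA can be illustrated with a small sketch. It splits a toy CRISPR array on its repeat and looks for each spacer in a toy phage genome on both strands. All sequences and function names here are invented for illustration, and real arrays have imperfect repeats that a plain string split would not handle.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def extract_spacers(array_seq, repeat):
    """Split a CRISPR array on its repeat and return the pieces between repeats."""
    parts = array_seq.split(repeat)
    # Pieces before the first and after the last repeat are flanking DNA, not spacers.
    return [p for p in parts[1:-1] if p]


def find_in_genome(spacer, genome):
    """Return (strand, position) of the first exact match, or None if absent."""
    pos = genome.find(spacer)
    if pos != -1:
        return ("+", pos)
    pos = genome.find(reverse_complement(spacer))
    if pos != -1:
        return ("-", pos)
    return None


# Toy data, invented for illustration only.
REPEAT = "GTTCCAATGGAAC"
ARRAY = "AAAA" + REPEAT + "ACGTACGTAC" + REPEAT + "TTGACCTGAA" + REPEAT + "CCCC"
PHAGE = "GGGACGTACGTACTTT"

for spacer in extract_spacers(ARRAY, REPEAT):
    print(spacer, find_in_genome(spacer, PHAGE))
```

A spacer that is found in the phage genome corresponds to a past infection that the cell has "remembered"; a spacer with no match simply targets some other virus.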

When Cas proteins were investigated, the systems were classified into two major groups. In class 1 systems several Cas proteins assemble with the crRNA into a multi-protein effector complex, while in class 2 systems a single Cas protein works with the RNA.

The Cas9 protein belongs to the second group. Researchers found that the cas9 gene (which had a different name at the time) is responsible for protecting some bacteria from viral infections, and they started a collaboration to work out the mechanism behind it.

It was found that the Cas9 protein, which is encoded by a single gene also called cas9, is an endonuclease that works together with a pair of RNAs: a CRISPR RNA (crRNA), which carries the spacer, and a trans-activating CRISPR RNA (tracrRNA). The crRNA matches the DNA sequence that is to be destroyed, and the tracrRNA is the one that binds to the Cas9 protein and thereby activates the cutting mechanism. The enzyme then unwinds the double helix and cuts both strands of the DNA with its two distinct active sites. The cut site is followed by a short DNA motif with the base sequence NGG (where N means any nucleotide and G means guanine).
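The NGG requirement can be turned into a simple search. The following is a minimal sketch, not how any particular design tool works: it scans only the forward strand, assumes a 20-nt target next to the NGG motif, and assumes the cut falls 3 bp upstream of the motif, as is commonly described for Cas9. The function name and parameters are hypothetical.

```python
import re


def find_cas9_sites(dna, target_len=20):
    """List (target_start, target, motif, cut_index) for every NGG site on the forward strand."""
    dna = dna.upper()
    sites = []
    # A lookahead is used so that overlapping NGG motifs are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", dna):
        motif_start = match.start()
        target_start = motif_start - target_len
        if target_start < 0:
            continue  # not enough sequence upstream for a full-length target
        target = dna[target_start:motif_start]
        cut_index = motif_start - 3  # assumed cut position, 3 bp upstream of NGG
        sites.append((target_start, target, match.group(1), cut_index))
    return sites


# Toy sequence, invented for illustration only.
for site in find_cas9_sites("ACGTACGTACGTACGTACGTAGGTT"):
    print(site)
```

A real search would also scan the reverse strand (looking for CCN) and would then rank the candidate sites, but the core of the task is this motif scan.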

 

 

Exploiting the invention of nature

Unlike in prokaryotes, in eukaryotes DNA damage does not always result in the death of the cell, as the break can be repaired. The repair often inserts or deletes a few bases. This can, of course, disturb the expression of the affected gene in many ways. This gave rise to the idea of using the CRISPR-Cas9 system for our own purposes in genetic modification.
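Why a handful of extra bases can disable a gene becomes clearer with a toy example. The sketch below translates a short, invented coding sequence before and after a single base is inserted; the insertion shifts the reading frame and, in this case, creates a premature stop codon. The codon table is the standard genetic code; the sequences and the insertion position are assumptions chosen for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate DNA from position 0 until the first stop codon or the end."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


original = "ATGGCTGATCGTTAA"                # invented toy gene
edited = original[:6] + "T" + original[6:]  # one base inserted at the repair site

print(translate(original))
print(translate(edited))
```

Insertions or deletions whose length is a multiple of three keep the frame intact, which is one reason why not every repaired break results in a knockout.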

The general workflow of a genetic engineering project with CRISPR is the following: 

The first step is to design the RNA that will match the DNA we want to edit. To do this, the sequence that we wish to engineer must first be identified. Then, with the help of the many existing resources, we can choose an RNA that suits our goals.

The next step is to assemble the complex of the RNAs and the Cas protein, the so-called ribonucleoprotein (RNP). This simply means that we combine the tracrRNA and the crRNA in a 1:1 molar ratio. This pair is known as the guide RNA. The guide RNA is then added to the Cas9 protein, also in a 1:1 molar ratio. If the goal of the experiment is to knock out a certain gene, nothing else is added to the assembly.

These steps are followed by delivery of the RNP into the cell of interest. This can be done by various methods: lipofection, electroporation or microinjection. The simplest is lipofection.

As a consequence of the double-strand break (DSB), non-homologous end joining (NHEJ) takes place. This means that a few bases are inserted between the ends of the broken DNA strands, so the gene no longer codes for a functional protein.

If our aim is not to knock out a gene but to introduce a modification so that the gene functions differently, then we need to add an extra piece of DNA to the cell. This is called the homology-directed repair (HDR) template. It contains the sequence we want to insert, placed between two arms that are homologous to the DNA on either side of the site we wish to edit.
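The layout of an HDR template can be sketched in a few lines. The function below takes the sequence on either side of the cut site as the two arms and places the insert between them. The function name, the arm length and the toy sequences are assumptions made for illustration; suitable arm lengths depend on the experiment and are not specified here.

```python
def build_hdr_template(genome, cut_index, insert, arm_len=40):
    """Return left arm + insert + right arm for a cut between cut_index - 1 and cut_index."""
    left_arm = genome[max(0, cut_index - arm_len):cut_index]
    right_arm = genome[cut_index:cut_index + arm_len]
    return left_arm + insert + right_arm


# Toy example with very short arms so the structure is easy to see.
genome = "AAAACCCCGGGGTTTT"
print(build_hdr_template(genome, cut_index=8, insert="ATG", arm_len=4))
```

In practice the template is usually also designed so that Cas9 cannot cut it again after the edit, a detail this sketch leaves out.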

 

 

Ethical considerations

As CRISPR has become a popular technology worldwide because it is cost-effective, easy to use and does not require sophisticated equipment, ethical concerns have arisen regarding its uses. Researchers have argued that CRISPR should be used in gene therapy in somatic cells, but not in germline editing, as modifications would be passed on to future generations.

When considering germline editing, the primary concern is safety. There is a high risk of off-target effects and mosaicism, which the potential benefits cannot outweigh. However, it has been acknowledged that in some cases, such as when both parents carry the disease-causing variant, germline editing could be more useful than any other existing genome editing technology used for reproductive purposes.

As this is a new technology, it has also been argued that genome editing will widen the gap between wealthy and poor.

From a moral and religious standpoint, it has also been argued that CRISPR should not be used in genome-editing research involving the creation or destruction of embryos. Some laboratories use non-viable embryos in their research to answer questions about human biology, but these cannot be used under any circumstances for reproductive purposes.

Gene editing is also used on animals, where opponents of CRISPR raise ethical concerns about decimating an entire species, eliminating food sources for other species, and promoting the proliferation of invasive pests.

 

Bioinformatics

 

Bioinformatics plays an essential role in the detection and analysis of CRISPR systems. It was through bioinformatic analyses that matches between CRISPR spacers and bacteriophages were first detected, which led to the conclusion that CRISPR-Cas acts as an acquired immune system.

 

  • Prediction of CRISPR-Cas systems

Perhaps its most obvious use is in the prediction and characterization of CRISPR-Cas systems. In practice this means identifying cas genes and CRISPR arrays. While cas genes are easily predicted with classical databases such as Pfam, CRISPR arrays cause more problems because spacer acquisition (short sequences from the phage genome being inserted between the CRISPR repeats after infection) makes them irregular. Therefore, all identification methods focus on finding sequences that meet specific requirements for repeat length, spacing, similarity or number. The most popular tools for this are CRISPRFinder and CRISPRCasFinder. The desired output for each CRISPR array studied includes the coordinates, length and sequence of every spacer found, which is crucial knowledge when designing CRISPR-based gene editing experiments. The process starts with a search for repetitive elements in the genome that could form a putative array, bearing in mind the high sequence similarity between the direct repeats. The most promising CRISPR candidates have a repeat length of 23-55 nt, a repeat similarity of 80%, and repeats offset by 0.6-2.5 times the repeat size. In the last step, the algorithm evaluates the similarity of the predicted spacer sequences through multiple alignment with MUSCLE. If the pairwise similarity between spacers exceeds 60%, the candidate is ruled out. Otherwise, a confidence level is assigned, with levels 3 and 4 marking highly promising candidates.
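The filtering criteria above can be sketched as a simple check. This is not the CRISPRFinder or CRISPRCasFinder implementation: the function names are hypothetical, the "offset" is read here as the spacer length relative to the repeat length, and Python's difflib similarity ratio stands in for the alignment-based similarity that the real tools compute with MUSCLE.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Rough similarity in [0, 1]; a stand-in for an alignment-based score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def is_plausible_array(repeats, spacers, min_repeat_similarity=0.80,
                       max_spacer_similarity=0.60):
    """Apply the length, spacing and similarity criteria to one candidate array."""
    repeat_len = len(repeats[0])
    if not 23 <= repeat_len <= 55:
        return False
    # Repeats within an array should closely resemble each other.
    if any(similarity(repeats[0], r) < min_repeat_similarity for r in repeats[1:]):
        return False
    # Spacer length is compared with the repeat length (our reading of "offset").
    if any(not 0.6 * repeat_len <= len(s) <= 2.5 * repeat_len for s in spacers):
        return False
    # Spacers come from different invaders, so they should not resemble each other.
    if any(similarity(a, b) > max_spacer_similarity
           for a, b in combinations(spacers, 2)):
        return False
    return True


# Toy candidate, invented for illustration only.
repeat = "GTTTTAGAGCTATGCTGTTTTGAATGGTCCCAAAAC"
print(is_plausible_array([repeat] * 3, ["AC" * 15, "GT" * 15]))
```

The spacer check is what separates a CRISPR array from an ordinary tandem repeat, where the sequences between repeat units tend to be similar to each other.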

 

  • Classification of CRISPR-Cas

Classification of CRISPR-Cas systems is essential to illustrate the origins and evolution of CRISPR loci in microbial genomes. Because most Cas protein sequences are highly diverse (they have evolved much faster than other archaeal and bacterial genes), this classification task is as challenging as it is important. The first algorithm developed for this problem is called CRISPRmap. It uses the CRISPR sequence and the conservation of RNA secondary structure in the direct repeats. The direct repeats are taken as input and grouped into clusters based on sequence and secondary structure conservation. These clusters are then checked for a motif that overlaps with their child clusters. Those that satisfy specific criteria are classified into families using Markov clustering. The algorithm was tested on a complex set of more than 3500 CRISPR sequences and successfully identified 33 potential conserved structural motifs and 40 sequence families. Such information is crucial for studying evolutionary relationships between distinct Cas proteins and allows them to be classified effectively.

 

  • Target identification

The main advantage of CRISPR-Cas gene editing technology, specific target identification, rests on two mechanisms. First, the spacer is almost exactly complementary to the sequence at the targeted site in the nucleic acid, and second, the target has to be accompanied by the Cas-specific PAM. The PAM, or protospacer adjacent motif, is a short DNA sequence required for a Cas nuclease to cut and is generally found 3-4 nucleotides downstream of the desired cleavage site. Popular software tools developed to study targeting efficiency and potential off-targets include CCTop, Cas-OFFinder and the newly established uCRISPR. To identify the targets of newly developed CRISPR-Cas systems with unknown PAMs, another program, CRISPRtarget, simply performs similarity searches based on spacer sequences. It is essentially based on the BLAST algorithm, comparing user-provided guide RNA sequences against selected databases and potential target sequences, for instance phage genomes.

 

  • Guide RNA design

Bioinformatic analysis also plays a significant role in the design of the synthetic guide RNA, a crucial component of this gene-editing technology. Target specificity is the most important criterion that these gRNAs have to meet. Some of the most widely used tools for this are E-CRISPR, CHOPCHOP and GuideScan. All of them follow similar workflow principles. Once the target gene to be edited has been identified, the key step is the selection of an appropriate PAM region. The selected candidate is then assessed for two desired features: high on-target efficiency (checked using NGS methods) and low off-target activity. The algorithms that check the latter are based on a minimal biophysical model of the free energy required for the transitions of the CRISPR-Cas effector complex, i.e. PAM binding and R-loop formation, and make use of hybridization kinetics. The most advanced ones give reasonable predictions, with up to 98% accuracy, and are mainly evaluated by mismatch positions. Another approach, used to find potential off-target binding regions across whole genomes, is to search for sites that have 1-3 mismatches to the guide RNA. All of the known methods give only a relative measure of off-target activity, since they take into account only sequence similarity and not experimental factors such as the Cas protein concentration.
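The mismatch-counting approach to off-target search can be sketched as a brute-force scan. This is only an illustration of the idea, not the algorithm used by any of the tools named above: it checks the forward strand only, requires an NGG motif directly after each candidate site, and treats all mismatch positions equally. The function names and toy sequences are assumptions.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def find_off_targets(guide, genome, max_mismatches=3):
    """List (position, site, mismatches) for sites followed by NGG on the forward strand."""
    n = len(guide)
    hits = []
    # The range stops early enough that a full 3-nt motif always fits after the site.
    for i in range(len(genome) - n - 2):
        motif = genome[i + n:i + n + 3]
        if motif[1:] != "GG":
            continue
        site = genome[i:i + n]
        mismatches = count_mismatches(guide, site)
        if mismatches <= max_mismatches:
            hits.append((i, site, mismatches))
    return hits


# Toy genome, invented for illustration: one perfect site and one site with 2 mismatches.
guide = "ACGTACGTACGTACGTACGT"
genome = "TT" + guide + "TGG" + "CC" + "ACGTACGTACGAACGTACGA" + "AGG" + "AA"
for hit in find_off_targets(guide, genome):
    print(hit)
```

A hit with zero mismatches is the intended target; the remaining hits are candidate off-target sites. A linear scan like this is far too slow for whole genomes, which is why dedicated tools rely on indexed searches.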

 

CRISPR applications

 

Though the CRISPR-Cas9 system is mainly famous for its significance in the field of genome engineering, there are many other applications that exploit the benefits of this system. 

 

CRISPR systems in transcriptional activation and repression

 

By introducing just a few mutations, the Cas9 protein can be rendered catalytically inactive (dCas9), so that it can no longer cut the DNA strand. Its target-finding ability, however, is retained. Such a modified Cas9 protein can then be fused to an accessory regulatory component. Once bound, it recruits transcription factors to the targeted gene, which can reversibly either silence or enhance its expression.

 

A good example of this application is the dCas9 SAM system for amplifying gene expression. A specific sgRNA guides dCas9, together with an array of transcriptional activators (such as VP64 and p65), to the promoter of the gene of interest. This powerful method can increase gene expression up to three-thousand-fold. Moreover, SAM systems can act on 10 genes simultaneously, which makes studies of polygenic interactions possible. In addition to mRNA, SAM influences the activity of non-coding RNAs, which are crucial regulatory factors in many organisms. All of this makes it an extremely attractive tool for changing the epigenetic landscape of an organism, including reprogramming cellular activity, which has multiple applications in regenerative medicine.

 

Using CRISPR Libraries for Screening

CRISPR screening is an experimental approach used to discover genes or genetic sequences that elicit a specific function or phenotype for a cell type. For instance, nowadays, CRISPR screening is used to identify genes or genetic sequences associated with drug resistance, drug sensitivity, susceptibility to environmental toxins or DNA sequences leading to a particular disease state.

When, for example, the resistance of a cell line to a drug treatment is tested, CRISPR screening is used to knock out one gene per cell, resulting in a population of cells with a different gene knocked out in each cell. The new population of edited cells is allowed to grow for a few days. The tested drug kills some cells, but others survive, and next-generation sequencing is then performed on the surviving cells to identify which DNA sequences are still present and which are absent. This technique can identify which genes the cells require in order to survive the drug treatment. This methodology has been used, for example, to understand the genetic changes that have occurred in some cancer cell lines and make them resistant to a particular drug treatment.

“CRISPR libraries” are not exactly CRISPR guide RNAs, but rather a batch of lentiviruses containing a pool of oligonucleotides (each virus carries a different oligonucleotide from the pool), each coding for a CRISPR guide and cloned into a lentiviral gene-containing plasmid. Lentiviruses are RNA viruses, so each lentivirus contains viral RNA. The viral RNA is too long to be used by the Cas enzyme; it is first reverse-transcribed into DNA, which integrates into the genome of the infected cell. The aim is to infect the cells with one virion per cell, and since each lentivirus in the library includes one sequence from the original oligonucleotide pool, only one such sequence is integrated into the genome of each infected cell. After integration, the lentiviral sequences, including the cloned-in CRISPR sequences, are transcribed to RNA, producing CRISPR guide RNA. A Cas enzyme must also be expressed in the target cells. Following treatment of the cells with the lentiviral library and the Cas enzyme, the cells are incubated to allow phenotypic CRISPR-mediated changes, after which a specific treatment may be performed if the experiment calls for it. Finally, DNA (or RNA) samples can be collected from the cells and subjected to next-generation sequencing.
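Once the sequencing reads have been counted per guide, a common first look at the data is to compare each guide's share of the reads before and after the treatment. The sketch below computes a log2 fold change per guide from two dictionaries of read counts. The guide names, the counts and the pseudocount are invented for illustration, and real screen analysis uses dedicated statistical tools rather than this bare calculation.

```python
import math


def log2_fold_changes(before, after, pseudocount=1.0):
    """Return log2(after share / before share) for every guide in `before`."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    changes = {}
    for guide, count in before.items():
        # The pseudocount avoids taking the logarithm of zero for guides that dropped out.
        share_before = (count + pseudocount) / total_before
        share_after = (after.get(guide, 0) + pseudocount) / total_after
        changes[guide] = math.log2(share_after / share_before)
    return changes


# Invented read counts for three hypothetical guides.
before = {"guide_A": 100, "guide_B": 100, "guide_C": 100}
after = {"guide_A": 300, "guide_B": 100, "guide_C": 0}

for guide, value in log2_fold_changes(before, after).items():
    print(guide, round(value, 2))
```

Guides with a positive value became more common among the surviving cells, and guides with a negative value became rarer; how to interpret either direction depends on the design of the screen.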

CRISPR screening has also been used in animals. Researchers used a cancer cell line from a mouse and infected it with a CRISPR library of over 67,000 lentiviruses. When the cells were transplanted into the mouse, tumors started to grow. After sequencing the DNA of the metastases, the researchers found several genes targeted by the CRISPR technology. This helped the scientists pinpoint the genes in which loss-of-function results in tumor formation and metastasis.

Imaging living cells

Imaging DNA and RNA in living cells is a challenge that has been addressed before; however, there are still regions that cannot be visualized with great precision. Fluorescent in situ hybridization (FISH) is a well-known technique that researchers use to track DNA and RNA, but this method requires fixation of the cell.

The target specificity of the CRISPR-Cas9 system offers great potential for improvements in this field.

A Cas9 protein that is engineered to lack endonuclease activity is fused with an enhanced green fluorescent protein (EGFP). This is then combined with a carefully designed small guide RNA (sgRNA).

The results are then analyzed in a similar way to FISH, but in contrast to FISH this process requires neither denaturation of the nucleic acid nor cell fixation, which makes it less error-prone. Thanks to the specificity of the complex, this gives a more efficient way of observing chromosome dynamics. Two areas of DNA imaging with special significance are chromosome remodeling and telomere dynamics.

 

Sources:

 

Chen, B., Gilbert, L. A., Cimini, B. A., Schnitzbauer, J., Zhang, W., Li, G. W., Park, J., Blackburn, E. H., Weissman, J. S., Qi, L. S., & Huang, B. (2013). Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell, 155(7), 1479–1491. https://doi.org/10.1016/j.cell.2013.12.001

 

Omer S. Alkhnbashi, Tobias Meier, Alexander Mitrofanov, Rolf Backofen, Björn Voß, CRISPR-Cas bioinformatics, Methods, Volume 172, 2020, Pages 3-11, ISSN 1046-2023, https://doi.org/10.1016/j.ymeth.2019.07.013, https://www.sciencedirect.com/science/article/pii/S1046202318304717

 

Genscript.com. 2021. CRISPR for Transcriptional Activation and Repression-GenScript丨CRISPR/Cas9 Applications. [online] Available at: <https://www.genscript.com/crispr-for-transcriptional-activation-and-repression.html> [Accessed 25 August 2021].

 

Gavin J. Knott, Jennifer A. Doudna, CRISPR-Cas guides the future of genetic engineering, Science  31 Aug 2018: Vol. 361, Issue 6405, pp. 866-869, DOI: 10.1126/science.aat5011

Integrated DNA Technologies: Getting started with CRISPR: a review of gene knockout and homology-directed repair

Desmond S. T. Nicholl: An Introduction to Genetic Engineering Third Edition, 2008, Cambridge University Press

Genome.gov. 2017. What are the Ethical Concerns of Genome Editing?. [online] Available at: <https://www.genome.gov/about-genomics/policy-issues/Genome-Editing/ethical-concerns>

Caplan, A., Parent, B., Shen, M. and Plunkett, C., 2015. No time to waste—the ethical challenges created by CRISPR. EMBO reports, [online] 16(11), pp.1421-1426. Available at: <https://www.embopress.org/doi/full/10.15252/embr.201541337>

Spencer, N., 2019. Overview: What is CRISPR screening?. [online] IDT. Available at: <https://eu.idtdna.com/pages/education/decoded/article/overview-what-is-crispr-screening> [Accessed 30 August 2021].

Genscript.com. 2021. Applications of CRISPR. [online] Available at: <https://www.genscript.com/applications-of-crispr.html> [Accessed 30 August 2021].

 
