Molecular fragments, R-groups, and functional groups (2024)

For a change of pace, I figured I would do a basic chemistry lesson about molecular structures, instead of a more computer oriented blog post.

Chemists often think about a molecule as a core structure (usually a ring system) and a set of R-groups. Each R-group is attached to an atom in the core structure by a bond. Typically that bond is a single bond, and often "rotatable".

Here's an example of what I mean. The first image below shows the structure of vanillin, which is the primary taste behind vanilla. In the second image, I've circled (well, ellipsed) the three R-groups in the structure.

[Figure: Vanillin structure (the primary taste of vanilla)]
[Figure: Vanillin with three R-groups identified]

The R-groups in this case are R1 = a carbonyl group (*-CH=O), R2 = a methoxy group (*-O-CH3), and R3 = a hydroxyl group (*-OH), where the "*" indicates where the R-group attaches to the core structure.

The R-group concept is flexible. Really it just means that you have a fixed group of connected atoms, which are connected along some bond to a variable group of atoms, and where the variable group is denoted R. Instead of looking at the core structure and a set of R-groups, I can invert the thinking and think of an R-group, like the carbonyl group, as "the core structure", and the rest of the vanillin as its R-group.

With that in mind, I'll replace the "*" with the "R" to get the groups "R-CH=O", "R-O-CH3", and "R-OH". (The "*" means that the fragment is connected to an atom at this point, but it's really just an alternative naming scheme for "R".)

All three of these groups are also functional groups. Quoting Wikipedia, "functional groups are specific groups (moieties) of atoms or bonds within molecules that are responsible for the characteristic chemical reactions of those molecules. The same functional group will undergo the same or similar chemical reaction(s) regardless of the size of the molecule it is a part of."

The three corresponding functional groups are R1 = aldehyde, R2 = ether, and R3 = hydroxyl.

As the Wikipedia quote pointed out, if you have a reaction which acts on an aldehyde, you can likely use it on the aldehyde group of vanillin.

Vanillyl group and capsaicin

A functional group can also contain functional groups. I pointed to the three functional groups attached to the central ring of vanillin, but most of the vanillin structure is itself another functional group, a vanillyl:
[Figure: the vanillyl group]

Structures which contain a vanillyl group are called vanilloids. Vanillin is of course a vanilloid, but surprisingly so is capsaicin, the source of the "heat" in many a spicy food. Here's the capsaicin structure, with the vanillyl group circled:
[Figure: capsaicin structure with the vanillyl group circled]

The feeling of heat comes because the capsaicin binds to TRPV1 (the transient receptor potential cation channel subfamily V member 1), also known as the "capsaicin receptor". It's a nonselective receptor, which means that many things can cause it to activate. Quoting that Wikipedia page: "The best-known activators of TRPV1 are: temperature greater than 43 °C (109 °F); acidic conditions; capsaicin, the irritating compound in hot chili peppers; and allyl isothiocyanate, the pungent compound in mustard and wasabi." The same receptor detects temperature, capsaicin, and a compound in hot mustard and wasabi, which is why your body interprets them all as "hot."

Capsaicin is a member of the capsaicinoid family. All capsaicinoids are vanilloids, and all vanilloids are phenols (the vanillyl group carries a hydroxyl on its aromatic ring). This sort of is-a family membership relationship in chemistry has led to many taxonomies and ontologies, including ChEBI.

But don't let my example or the existence of nomenclature lead you to the wrong conclusion that all R-groups are functional groups! An R-group, at least with the people I usually work with, is a more generic term used to describe a way of thinking about molecular structures.

QSAR modeling

QSAR (pronounced "QUE-SAR") is short for "quantitative structure-activity relationship", which is a mouthful. (I once travelled to the UK for a UK-QSAR meeting. The border inspector asked me where I was going, and I said "the UK-QSAR meeting; QSAR is .." and I blanked on the expansion of that term! I was allowed across the border, so it couldn't have been that big of a mistake.)

QSAR deals with the development of models which relate chemical structure to its activity in a biological or chemical system. Looking at that, I realize I just moved the words around a bit, so I'll give a simple example.

Consider an activity, which I'll call "molecular weight". (This is more of a physical property than a chemical one, but I am trying to make it simple.) My model for molecular weight assumes that each atom has its own weight, and the total molecular weight is the sum of the individual atom weights. I can create a training set of molecules, and for each molecule determine its structure and molecular weight. With a bit of least-squares fitting, I can determine the individual atom weight contribution. Once I have that model, I can use it to predict the molecular weight of any molecule which contains atoms which the model knows about.
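
Here is a minimal sketch of that fit in plain Python, under a few assumptions of mine: the six-molecule training set, the restriction to C, H and O, and all function names are made up for illustration. It solves the least-squares normal equations by Gaussian elimination and then predicts the weight of vanillin (C8H8O3).

```python
# Fit per-element weight contributions by ordinary least squares (pure Python).
ELEMENTS = ["C", "H", "O"]

# Hypothetical training set: (atom counts, molecular weight in g/mol)
TRAINING = [
    ({"C": 1, "H": 4}, 16.043),          # methane
    ({"C": 2, "H": 6}, 30.070),          # ethane
    ({"H": 2, "O": 1}, 18.015),          # water
    ({"C": 1, "H": 4, "O": 1}, 32.042),  # methanol
    ({"C": 1, "O": 2}, 44.009),          # carbon dioxide
    ({"C": 2, "H": 6, "O": 1}, 46.069),  # ethanol
]


def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_atom_weights(training):
    """Return {element: fitted weight} from the normal equations X^T X w = X^T y."""
    rows = [[counts.get(e, 0) for e in ELEMENTS] for counts, _ in training]
    y = [weight for _, weight in training]
    k = len(ELEMENTS)
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return dict(zip(ELEMENTS, solve_linear(xtx, xty)))


def predict_weight(model, counts):
    """Sum the fitted per-atom contributions for one molecule."""
    return sum(model[element] * n for element, n in counts.items())


model = fit_atom_weights(TRAINING)
vanillin = {"C": 8, "H": 8, "O": 3}
print(round(predict_weight(model, vanillin), 2))  # prints 152.15
```

The fitted contributions come out as the familiar atomic weights only because the training weights were themselves computed from them. With measured data the fit would carry some noise.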

Obviously this model will be pretty accurate. It won't be perfect, because isotopic ratios can vary. (A chemical synthesized from fossil oil is slightly lighter and less radioactive than the same chemical derived from environmental sources, because the heavier radioactive 14C in fossil oil has decayed.) But for most uses it will be good enough.

A more chemically oriented property is the partition coefficient, measured in log units as "log P", which is a measure of the solubility in water compared to a type of oil. This gives a rough idea of whether the molecule will tend to end up in hydrophobic regions like a cell membrane, or in aqueous regions like blood. One way to predict log P is with the atom-based approach I sketched for the molecular weight, where each atom type has a contribution to the overall measured log P. (This is sometimes called AlogP.)

In practice, atom-based solutions are not as accurate as fragment-based solutions. The molecular weight can be atom-centered because nearly all of the mass is in the atom's nucleus, which is well localized to the atom. But chemistry isn't really about atoms but about the electron density around atoms, and electrons are much less localized than nucleons. The density around an atom depends on the neighboring atoms and the configuration of the atoms in space.

As a way to improve on that, some methods look at the extended local environment (this is sometimes called XlogP) or at larger fragment contributions (for example, BioByte's ClogP). The more complex it is, the more compounds you need for the training and the slower the model. But hopefully the result is more accurate, so long as you don't overfit the model.
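
To make the difference between atom-based and environment-based descriptors concrete, here is a small sketch. It is not the typing scheme of XlogP or any other published method. The toy graph (the heavy atoms of ethanol), the "element(neighbors)" labels, and the function names are all my own assumptions.

```python
from collections import Counter

# Heavy-atom graph of ethanol, CH3-CH2-OH: atom index -> element, plus bonds
ELEMENTS = {0: "C", 1: "C", 2: "O"}
BONDS = [(0, 1), (1, 2)]


def neighbor_table(bonds):
    """Map each atom index to the list of atoms it is bonded to."""
    table = {}
    for a, b in bonds:
        table.setdefault(a, []).append(b)
        table.setdefault(b, []).append(a)
    return table


def atom_features(elements):
    """Atom-based descriptor: count each element, ignoring its surroundings."""
    return Counter(elements.values())


def environment_features(elements, bonds):
    """Environment-based descriptor: element plus its sorted neighbor elements."""
    neighbors = neighbor_table(bonds)
    features = Counter()
    for index, element in elements.items():
        shell = ",".join(sorted(elements[n] for n in neighbors.get(index, [])))
        features[f"{element}({shell})"] += 1
    return features


print(dict(atom_features(ELEMENTS)))
print(dict(environment_features(ELEMENTS, BONDS)))
```

The atom-based view sees two identical carbons. The environment-based view separates the carbon next to the oxygen from the one that is not, which is why it needs more parameters and therefore more training data.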

If you're really interested in the topic, Paul Beswick of the Sussex Drug Discovery Centre wrote a nice summary on the different nuances in log P prediction.

Matched molecular pairs

Every major method from data mining, and most of the minor methods, have been applied to QSAR models. The history is also quite long. There are cheminformatics papers from as far back as the 1970s looking at supervised and unsupervised learning, building on even earlier work on clustering applied to biological systems.

A problem with most of these is the black-box nature. The data is noisy, and the quantum nature of chemistry isn't that good of a match to data mining tools, so these predictions are used more often to guide a pharmaceutical chemist than to make solid predictions. This means the conclusions should be interpretable by the chemist. Try getting your neural net to give a chemically reasonable explanation of why it predicted as it did!

Matched molecular pair (MMP) analysis is a more chemist-oriented QSAR method, with relatively little mathematics beyond simple statistics. Chemists have long looked at activities in simple series, like replacing a methyl (*-CH3) with an ethyl (*-CH2-CH3) or propyl (*-CH2-CH2-CH3), or replacing a fluorine with a heavier halogen like a chlorine or bromine. These can form consistent trends across a wide range of structures, and chemists have used these observations to develop techniques for how to, say, improve the solubility of a drug candidate.

MMP systematizes this analysis over all considered fragments, including not just R-groups (which are connected to the rest of the structure by one bond) but also so-called "core" structures with two or three R-groups attached to it. For example, if the known structures can be described as "A-B-C", "A-D-C", "E-B-F" and "E-D-F" with activities of 1.2, 1.5, 2.3, and 2.6 respectively then we can do the following analysis:

  A-B-C transforms to A-D-C with an activity shift of 0.3.
  E-B-F transforms to E-D-F with an activity shift of 0.3.
  Both transforms can be described as R1-B-R2 to R1-D-R2.
  Perhaps R1-B-R2 to R1-D-R2 in general causes a shift of 0.3?
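
The analysis above can be sketched in a few lines of Python. This assumes the structures have already been split into an (R1, core, R2) triple, which is the hard part in practice. The data layout and the function name are mine, not those of any MMP package.

```python
from collections import defaultdict
from itertools import combinations

# (R1, core, R2) -> activity, using the four structures from the text
ACTIVITIES = {
    ("A", "B", "C"): 1.2,
    ("A", "D", "C"): 1.5,
    ("E", "B", "F"): 2.3,
    ("E", "D", "F"): 2.6,
}


def find_transforms(activities):
    """Group activity shifts by core swap, for pairs that share both R-groups."""
    shifts = defaultdict(list)
    # Sorting fixes the direction of each transform (alphabetical by core).
    for (mol1, act1), (mol2, act2) in combinations(sorted(activities.items()), 2):
        (r1_a, core_a, r2_a), (r1_b, core_b, r2_b) = mol1, mol2
        if (r1_a, r2_a) == (r1_b, r2_b) and core_a != core_b:
            shifts[(core_a, core_b)].append(act2 - act1)
    return shifts


for (old, new), deltas in find_transforms(ACTIVITIES).items():
    mean = sum(deltas) / len(deltas)
    print(f"R1-{old}-R2 >> R1-{new}-R2: n={len(deltas)}, mean shift={mean:+.2f}")
```

With this data the only transform found is R1-B-R2 to R1-D-R2, seen twice, with a mean shift of +0.30. A real analysis would also report the spread, since two observations are not much evidence.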

It's not quite as easy as this, because the molecular fragments aren't so easily identified. A molecule might be described as "A-B-C", as well as "E-Q-F" and "E-H" and "C-T(-P)-A", where "T" has three R-groups connected to it.
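
As a rough illustration of why one molecule has many descriptions, here is a toy sketch that cuts each bond of a small molecule in turn and reports the two pieces. The molecule (the heavy atoms of 1-propanol), the graph representation, and the naive labels are assumptions of mine. Real MMP tools work on full chemical structures and also make double and triple cuts.

```python
# Heavy atoms of 1-propanol, CH3-CH2-CH2-OH, as a labelled graph
ATOMS = {0: "C", 1: "C", 2: "C", 3: "O"}
BONDS = [(0, 1), (1, 2), (2, 3)]


def component(start, bonds):
    """Return the set of atoms reachable from start using the given bonds."""
    seen, stack = {start}, [start]
    while stack:
        atom = stack.pop()
        for a, b in bonds:
            for here, there in ((a, b), (b, a)):
                if here == atom and there not in seen:
                    seen.add(there)
                    stack.append(there)
    return seen


def single_cuts(bonds):
    """Yield the two fragments produced by cutting each bond in turn."""
    for cut in bonds:
        remaining = [bond for bond in bonds if bond != cut]
        left = component(cut[0], remaining)
        right = component(cut[1], remaining)
        if left != right:  # equal sets mean the bond was in a ring, so no split
            yield sorted(left), sorted(right)


def label(atoms, indices):
    """Naive fragment name: element symbols in index order (fine for a chain)."""
    return "".join(atoms[i] for i in indices)


for left, right in single_cuts(BONDS):
    print(f"{label(ATOMS, left)}-* and *-{label(ATOMS, right)}")
```

Even this four-atom chain gives three different (fragment, R-group) descriptions from single cuts alone, and each one is a candidate for matching against other molecules.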

Thanks

Thanks to EPAM LifeSciences for their Ketcher tool, which I used for the structure depictions that weren't public domain on Wikipedia.

Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me.

Copyright © 2001-2020 Andrew Dalke Scientific AB