Credit: NIH
A new tool may help researchers sift through the scientific literature to discover hypothesis-generating information relevant to their own research.
The resource, called the Knowledge Integration Toolkit (KnIT), extracts relevant information from the literature, includes it in a network that can be queried, and then attempts to use these data to generate reasonable and testable hypotheses that can help direct lab studies.
Researchers tested KnIT in a retrospective case study involving published data on p53 and found the tool could accurately predict the existence of proteins that modify p53.
Details from this study were published in the Association for Computing Machinery’s digital library.
Olivier Lichtarge, MD, PhD, of the Baylor College of Medicine in Houston, Texas, is scheduled to discuss the study on August 27 at the 20th Annual Association for Computing Machinery’s Special Interest Group on Knowledge Discovery and Data Mining Conference in New York, New York.
“On average, a scientist might read between 1 and 5 research papers on a good day,” Dr Lichtarge noted.
“But, to put this in perspective with p53, there are over 70,000 papers published on this protein. Even if a scientist reads 5 papers a day, it could take nearly 38 years to completely understand all of the research already available today on this protein.”
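The arithmetic behind that estimate can be checked with a short sketch. It assumes a scientist reads every day of a 365-day year, which the quote does not state; the constant and function names are illustrative only.

```python
# Rough check of the reading-time estimate quoted above.
# Assumption (not stated in the article): reading happens every day of a 365-day year.
PAPERS_ON_P53 = 70_000   # "over 70,000 papers", taken as a lower bound
PAPERS_PER_DAY = 5       # upper end of the quoted 1-to-5 range
DAYS_PER_YEAR = 365


def years_to_read(papers: int, per_day: int, days_per_year: int = DAYS_PER_YEAR) -> float:
    """Return the number of years needed to read `papers` at `per_day` papers a day."""
    return papers / per_day / days_per_year


years = years_to_read(PAPERS_ON_P53, PAPERS_PER_DAY)
print(f"{years:.1f} years at {PAPERS_PER_DAY} papers a day")
```

At 1 paper a day, the lower end of the quoted range, the same calculation gives five times as long.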
Scientists formulate hypotheses based on what they read and know, but because they cannot read everything, their hypotheses may be biased, according to Dr Lichtarge.
“A computer certainly may not reason as well as a scientist,” he said, “but the little it can, logically and objectively, may contribute greatly when applied to our entire body of knowledge.”
With that in mind, Dr Lichtarge and his colleagues initiated a project to develop a knowledge integration tool that took advantage of existing text mining capabilities, such as those used by IBM’s Watson technology—cognitive technology that processes information more like a human than a computer.
The result was KnIT. In the first test of the tool, the team sought to identify new protein kinases that phosphorylate p53.
There are more than 500 known human kinases and tens of thousands of possible proteins they can target. Thirty-three are currently known to modify p53.
The researchers used KnIT to mine the scientific literature up to 2003, when only half of the 33 phosphorylating protein kinases had been discovered.
Seventy-four kinases were extracted as potential modifiers. Of these, prior to 2003, 10 were known to phosphorylate p53, and 9 were discovered at a later date.
KnIT took the 10 already-known kinases into account in its reasoning and ranked the likelihood that each of the other 64 kinases targeted p53. Of the 9 found nearly a decade later, KnIT accurately predicted 7.
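The counts reported for the retrospective test can be tallied in a few lines. This is only a sketch of the bookkeeping: the variable names are hypothetical, and the article does not say how the study defined an "accurate" prediction, so the fraction below is simply 7 out of 9.

```python
# Tally of the retrospective p53 test, using only the counts reported in the article.
extracted_kinases = 74    # kinases KnIT extracted as potential p53 modifiers
known_before_2003 = 10    # already known to phosphorylate p53 before 2003
discovered_later = 9      # shown to phosphorylate p53 at a later date
predicted_later = 7       # of those 9, the number KnIT predicted

# Candidates left to rank once the already-known kinases are set aside.
ranked_candidates = extracted_kinases - known_before_2003

# Share of the later-discovered kinases that KnIT predicted.
fraction_recovered = predicted_later / discovered_later

print(f"Candidates ranked: {ranked_candidates}")
print(f"Later discoveries predicted: {predicted_later}/{discovered_later} ({fraction_recovered:.0%})")
```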
“This study showed that, in a very narrow field of study regarding p53, we can, in fact, suggest new relationships and new functions associated with p53, which can later be directly validated in the laboratory,” Dr Lichtarge said.
“Our long-term hope is to systematically extract knowledge directly from the totality of the public medical literature. For this, we need technological advances to read text, extract facts from every sentence, and to integrate this information into a network that describes the relationship between all of the objects and entities discussed in the literature.”
“This first study is promising, because it suggests a proof of principle for a small step towards this type of knowledge discovery. With more research, we hope to get closer to clinical and therapeutic applications.”
