Gene Expression Studies Using Affymetrix Microarrays (Guided Reading Edition)

Author/editor: Hinrich Göhlmann (USA)

Tags:

Gene expression
Affymetrix microarrays
Biochips
Genomics
Molecular biology
Bioinformatics
Guided reading
Experimental techniques
Medical research
Genetics

Publisher: Science Press

ISBN: 9787030329080

Edition: 1

Product code: 10937129

Binding: Hardcover

Format: 16mo

Publication date: 2012-01-01

Pages: 327

Text language: English

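Description

Summary

The Affymetrix GeneChip system is currently a widely used microarray platform. However, because Affymetrix arrays carry a very large amount of information, many Affymetrix users tend to rely on the default analysis settings, which often leads to suboptimal conclusions. Drawing on more than ten years of practical experience with gene expression profiling experiments and data analysis, molecular biologists and biostatisticians wrote Gene Expression Studies Using Affymetrix Microarrays. The book explains the entire process of a gene expression study with Affymetrix arrays, from theoretical concepts to experimental results, and breaks down the pervasive language barriers between molecular biology, bioinformatics and biostatistics.

The book is specialized and practical. It covers the key technical aspects of Affymetrix arrays and common statistical pitfalls and problems, and it also addresses general rules and applications that carry over to other microarray platforms. Technical and statistical concepts are described through examples and full-colour figures, giving beginners detailed guidance. Experts in the field can learn about the other disciplines involved in microarray work and broaden their understanding of gene expression profiling.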

Contents
List of Figures
List of Tables
List of BioBoxes
List of StatsBoxes
Preface
Abbreviations and Terminology
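1 Biological question
1.1 Why gene expression?
1.1.1 Biotechnological advancements
1.1.2 Biological relevance
1.2 Research question
1.2.1 Correlational vs. experimental research
1.3 Main types of research questions
1.3.1 Comparing two groups
1.3.2 Comparing multiple groups
1.3.3 Comparing treatment combinations
1.3.4 Comparing multiple groups with a reference group
1.3.5 Investigating within-subject changes
1.3.6 Classifying and predicting samples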
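2 Affymetrix microarray technology
2.1 Probes
2.2 Probesets
2.2.1 Standard probeset definitions
2.2.2 Alternative chip description files (CDFs)
2.3 Array types
2.3.1 Standard expression monitoring arrays
2.3.2 Exon arrays
2.3.3 Gene arrays
2.3.4 Tiling arrays
2.3.5 Focused arrays for a specific study
2.4 Standard lab processes
2.4.1 In vitro transcription assay
2.4.2 Whole-transcript sense target labeling
2.5 Affymetrix data quality
2.5.1 Reproducibility
2.5.2 Robustness
2.5.3 Sensitivity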
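3 Running the experiment
3.1 Biological experiment
3.1.1 Biological background
3.1.1.1 Aim/hypothesis
3.1.1.2 Technology platform
3.1.1.3 Expected changes in mRNA levels
3.1.2 Sample
3.1.2.1 Selection of appropriate sample/tissue
3.1.2.2 Sample types
3.1.2.3 Sample heterogeneity
3.1.2.4 Gender
3.1.2.5 Time point
3.1.2.6 Dissection artifacts
3.1.2.7 Artifacts due to animal handling
3.1.2.8 RNA quality
3.1.2.9 RNA quantity
3.1.3 Pilot experiment
3.1.4 Main experiment
3.1.4.1 Control experiment
3.1.4.2 Treatment
3.1.4.3 Blocking
3.1.4.4 Randomization
3.1.4.5 Standardization
3.1.4.6 Matched controls
3.1.4.7 Sample size/replicates/costs
3.1.4.8 Balanced design
3.1.4.9 Control samples
3.1.4.10 Sample pooling
3.1.4.11 Documentation
3.1.5 Follow-up validation experiments
3.2 Microarray experiment
3.2.1 External RNA controls
3.2.2 Target synthesis
3.2.3 Batch effect
3.2.4 Whole-genome vs. focused microarrays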
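4 Data analysis preparation
4.1 Data preprocessing
4.1.1 Probe intensity
4.1.2 Log2 transformation
4.1.3 Background correction
4.1.4 Normalization
4.1.5 Summarization of Affymetrix arrays
4.1.5.1 Perfect match (PM) and mismatch (MM) techniques
4.1.5.2 PM-only techniques
4.1.6 All-in-one solutions
4.1.7 Detection calls
4.1.7.1 Microarray Analysis Suite (MAS) 5.0
4.1.7.2 Detection above background (DABG)
4.1.7.3 Presence/absence calls (PANP)
4.1.8 Standardization
4.2 Quality control
4.2.1 Technical data
4.2.2 Pseudo images
4.2.3 Evaluating reproducibility
4.2.3.1 Measures for evaluating reproducibility
4.2.3.2 A worked example
4.2.4 Batch effects
4.2.5 Batch effect correction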
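5 Data analysis
5.1 Why do we need statistics?
5.1.1 The need for data interpretation
5.1.2 The need for a good experimental design
5.1.3 Statistics vs. bioinformatics
5.2 The problem of high-dimensional data
5.2.1 Reproducibility of analysis results
5.2.2 Data mining and validation
5.3 Gene filtering
5.3.1 Filtering approaches
5.3.1.1 Signal intensity
5.3.1.2 Variation between samples
5.3.1.3 Absent/present calls
5.3.1.4 Informative/non-informative calls
5.3.2 Impact of filtering on testing and multiplicity correction
5.3.3 Comparison of filtering approaches
5.4 Unsupervised data analysis
5.4.1 Reasons for unsupervised analysis
5.4.1.1 Batch effects
5.4.1.2 Technical or biological outliers
5.4.1.3 Quality check of phenotypic data
5.4.1.4 Identification of co-regulated genes
5.4.2 Clustering
5.4.2.1 Distance and linkage
5.4.2.2 Clustering algorithms
5.4.2.3 Quality check of clustering
5.4.3 Multivariate projection methods
5.4.3.1 Types of multivariate projection methods
5.4.3.2 Gene and sample relationship plot (biplot)
5.5 Detecting differential expression
5.5.1 A simple solution for a complex question
5.5.2 Test statistics
5.5.2.1 Fold change
5.5.2.2 Types of t-tests
5.5.2.3 From t-statistics to p-values
5.5.2.4 Comparison of methods
5.5.2.5 Linear models
5.5.3 Correction for multiple testing
5.5.3.1 The problem of multiple testing
5.5.3.2 Multiplicity correction procedures
5.5.3.3 Comparison of methods
5.5.3.4 Post-hoc comparisons
5.5.4 Statistical significance vs. biological relevance
5.5.5 Sample size estimation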
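5.6 Supervised prediction
5.6.1 Classification vs. hypothesis testing
5.6.2 Challenges of microarray classification
5.6.2.1 Overfitting
5.6.2.2 The bias-variance trade-off
5.6.2.3 Cross-validation
5.6.2.4 Non-uniqueness of classification solutions
5.6.3 Feature selection methods
5.6.4 Classification methods
5.6.4.1 Discriminant analysis
5.6.4.2 Nearest neighbor classifier
5.6.4.3 Logistic regression
5.6.4.4 Neural networks
5.6.4.5 Support vector machines
5.6.4.6 Classification trees
5.6.4.7 Ensemble methods
5.6.4.8 Prediction analysis of microarrays (PAM)
5.6.4.9 Comparison of methods
5.6.5 Complex prediction problems
5.6.5.1 Multiclass problems
5.6.5.2 Prediction of survival
5.6.6 Sample sizes
5.7 Pathway analysis
5.7.1 Statistical approaches in pathway analysis
5.7.1.1 Over-representation analysis
5.7.1.2 Functional class scoring
5.7.1.3 Gene set analysis
5.7.1.4 Comparison of methods
5.7.2 Databases
5.7.2.1 Gene Ontology
5.7.2.2 Kyoto Encyclopedia of Genes and Genomes (KEGG)
5.7.2.3 Gene Map Annotator and Pathway Profiler (GenMAPP)
5.7.2.4 AU-rich element database (ARED)
5.7.2.5 Connectivity Map (cMAP)
5.7.2.6 BioCarta pathway maps
5.7.2.7 Chromosomal location
5.8 Other analysis approaches
5.8.1 Gene network analysis
5.8.2 Meta-analysis
5.8.3 Chromosomal location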
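6 Presentation of results
6.1 Data visualization
6.1.1 Heatmap
6.1.2 Intensity plot
6.1.3 Gene list plot
6.1.4 Venn diagram
6.1.5 Scatter plots
6.1.5.1 Volcano plot
6.1.5.2 MA plot
6.1.5.3 Scatter plots for high-dimensional data
6.1.6 Histogram
6.1.7 Box plot
6.1.8 Violin plot
6.1.9 Density plot
6.1.10 Dendrogram
6.1.11 Pathways with gene expression
6.1.12 Figures for publication
6.2 Biological interpretation
6.2.1 Important databases
6.2.1.1 Entrez Gene
6.2.1.2 Affymetrix website (NetAffx)
6.2.1.3 OMIM
6.2.2 Literature mining
6.2.3 Data integration
6.2.3.1 Data from multiple molecular screenings
6.2.3.2 Systems biology
6.2.4 Verification by real-time quantitative PCR (RTqPCR)
6.3 Data publishing
6.3.1 ArrayExpress
6.3.2 Gene Expression Omnibus (GEO)
6.4 Reproducible research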
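7 Pharmaceutical R&D
7.1 The need for early indications
7.2 Critical Path Initiative
7.3 Drug discovery
7.3.1 Differences between normal and diseased tissues
7.3.2 Disease subclass discovery
7.3.3 Identification of molecular targets
7.3.4 Molecular profiling
7.3.5 Characterization of a disease model
7.3.6 Compound profiling
7.3.7 Dose-response treatment
7.4 Drug development
7.4.1 Biomarkers
7.4.2 Response signatures
7.4.3 Toxicogenomics
7.5 Clinical trials
7.5.1 Efficacy markers
7.5.2 Signatures for outcome prognosis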
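8 Using R and Bioconductor
8.1 R and Bioconductor
8.2 R and Sweave (a tool for embedding R code in documents)
8.3 R and Eclipse (an integrated development environment)
8.4 Automated array analysis
8.4.1 Loading packages
8.4.2 Gene filtering
8.4.3 Unsupervised exploration
8.4.4 Testing for differential expression
8.4.5 Supervised classification
8.5 Other software for microarray analysis
9 Future perspectives
9.1 Co-analyzing different data types
9.2 The microarrays of the future
9.3 Next-generation sequencing: the end of microarrays?
Bibliography
Index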
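List of Figures
2.1 A standard Affymetrix array
2.2 Effect of GC content on signal intensity
2.3 Differences in signal intensity between probes of the same probeset
2.4 Differences in probeset size when using alternative CDFs
2.5 Probe coverage of exon arrays compared with 3′ arrays
2.6 Transcript annotation for exon arrays
3.1 Gender-specific gene Xist (X-inactive specific transcript)
3.2 Example of dissection artifacts
3.3 Thyroxine expression in mouse striatum
3.4 Dissection artifacts in mouse colon samples
3.5 Degraded vs. non-degraded RNA
3.6 RNA degradation plot showing 3′ bias
3.7 Batch effects between different array batches
4.1 A corner of a scanned array image
4.2 Effect of log transformation on the distribution
4.3 Two noise components in microarray data
4.4 Effect of normalization on intensity-dependent variance
4.5 Effect of normalization on MA plots
4.6 MAS 5.0 background calculation
4.7 Pseudo images generated by affyPLM
4.8 Correlation between two replicates to assess reproducibility
4.9 Pairwise concordance before and after centering
4.10 Spectral map to assess reproducibility
4.11 Box plots of pre-normalization Affymetrix data from MAQC (MicroArray Quality Control)
4.12 Spectral map (SPM) of Affymetrix array data from the MAQC study
4.13 Intensity plot of differentially expressed genes affected by a batch effect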
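5.1 Probe-level comparison of informative and non-informative probesets
5.2 Effect of gene filtering on the p-value distribution
5.3 Percentage of genes excluded by different filtering techniques
5.4 Differences between two filtering techniques
5.5 Distributional differences between gene filtering techniques
5.6 Euclidean and Pearson distances in clustering
5.7 Hierarchical clustering of the ALL data based on Euclidean and Pearson distances
5.8 Schematic of the hierarchical clustering algorithm
5.9 Schematic of the k-means algorithm
5.10 Principal component analysis of the ALL data
5.11 Spectral map of the ALL data
5.12 Variability in the t-test
5.13 t-test
5.14 Flawed t-test: effect of variance on significance
5.15 SAM plot for Δ = 0.75
5.16 t distribution
5.17 Comparison of two methods for testing differential expression on a large sample (30 vs. 30)
5.18 Comparison of two methods for testing differential expression on a small sample (3 vs. 3)
5.19 Hypothetical scenarios for various interaction effects
5.20 Interaction effects illustrated by four genes with different expression patterns in the GLUCO data
5.21 Multiple testing correction methods and how they handle false positives and false negatives
5.22 Adjusted and unadjusted p-values for the ALL dataset
5.23 Relationship between high dimensionality and overfitting in class separation
5.24 The overfitting problem
5.25 Nested-loop cross-validation
5.26 Top-ranked gene combinations using PAM
5.27 Top-ranked gene combinations using LASSO
5.28 Feature ranking in cross-validation
5.29 Optimal number of genes for classification
5.30 Penalized regression: relationship between penalty and coefficients
5.31 Neural network scheme
5.32 Two-dimensional visualization of a support vector machine model
5.33 GO pathways containing top-ranked gene sets using MLP
5.34 GO pathways containing top-ranked gene sets using GSA
5.35 BioCarta pathway
5.36 Identifying differentially expressed chromosomal regions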
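6.1 Heatmap
6.2 Intensity plot
6.3 Gene list plot
6.4 Venn diagram
6.5 Volcano plot
6.6 MA plot
6.7 Smoothed scatter plot
6.8 Histogram
6.9 Box plot of the HD dataset
6.10 Violin plot
6.11 Density plot
6.12 Dendrogram
6.13 GO pathways of important gene sets
7.1 Gene expression profiling in drug development
7.2 Dose-response profile of Fos
9.1 Possible errors in next-generation sequencing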
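List of Tables
1.1 Two-way ANOVA design
2.1 Types and names of Affymetrix probesets
2.2 Affymetrix probeset types and names no longer in use
2.3 Annotation grades of original Affymetrix probesets
2.4 Rules for generating alternative CDFs
2.5 Use of HG U133 Plus 2.0 probes based on the Ensembl Gene database
3.1 RNA yields from different samples
4.1 Effect of small differences in background
5.1 Calculation of adjusted p-values
5.2 Classification and hypothesis testing
5.3 Important genes selected by LASSO and PAM
5.4 Penalized regression: gene selection
5.5 Important genes selected by MLP
5.6 Top five up-regulated and top five down-regulated gene sets selected by GSA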
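List of BioBoxes
1.1 Gene expression microarrays
1.2 Central dogma of molecular biology
1.3 siRNA
1.4 Phenotype
2.1 Splice variants
2.2 Gene
3.1 Northern blotting
3.2 Transcription factors
3.3 Blood
3.4 Cell culture
3.5 X chromosome inactivation: Xist
3.6 Gel electrophoresis
3.7 RNA analysis with the Bioanalyzer
3.8 RTqPCR (real-time quantitative PCR)
5.1 Housekeeping genes
7.1 Biomarkers
7.2 EC50, ED50, IC50, LC50 and LD50
7.3 Biomarkers and clinical significance
7.4 Significance of gene expression
9.1 An example of epigenetics: DNA methylation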
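List of StatsBoxes
1.1 Two interpretations of correlation
3.1 Power
4.1 Accuracy and precision
4.2 Bayesian statistics
4.3 Reproducibility
4.4 Correlation assumptions
5.1 Parameter, variable, statistic
5.2 Perfect fit
5.3 Supervised and unsupervised learning
5.4 Resampling techniques
5.5 Neural networks
5.6 Steps of multivariate projection methods
5.7 Steps for determining differential expression
5.8 Log of a ratio = difference of logs
5.9 Null hypothesis and p-value
5.10 Variance, standard deviation and standard error
5.11 Empirical Bayes methods
5.12 Significance level and power
5.13 Parametric vs. non-parametric tests
5.14 Explanatory and response variables
5.15 General linear models
5.16 Measurement scales
5.17 Interaction effects
5.18 Regularization or penalization
5.19 Sensitivity and specificity
5.20 Multiple testing correction procedures
5.21 More information is not always better
5.22 Kernel techniques
5.23 Jackknife and bootstrap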

Excerpt

Chapter 1

Biological question

All experimental work starts in principle with a question. This also applies to the field of molecular biology. A molecular scientist uses a certain technique to answer a specific question such as, “Does the cell produce more of a given protein when treated in a certain way?” Questions in molecular biology are indeed regularly focused on specific proteins or genes, often because the applied technique cannot measure more.

Gene expression studies that make use of microarrays also start with a biological question. The largest difference to many other molecular biology approaches is, however, the type of question that is being asked. Scientists will typically not run arrays to find out whether the expression of a specific messenger RNA is altered in a certain condition. More often they will focus their question on the treatment or the condition of interest. Centering the question on a biological phenomenon or a treatment has the advantage of allowing the researcher to discover hitherto unknown alterations. On the other hand, it poses the problem that one needs to define when an “interesting” alteration occurs.

1.1 Why gene expression?

1.1.1 Biotechnological advancements

Research evolves and advances not only through the compilation of knowledge but also through the development of new technologies. Traditionally, researchers were able to measure only a relatively small number of genes at a time. The emergence of microarrays (see BioBox 1.1) now allows scientists to analyze the expression of many genes in a single experiment quickly and efficiently.

1.1.2 Biological relevance

Living organisms contain information on how to develop their form and structure and how to build the tools that are responsible for all biological processes that need to be carried out by the organism. This information, the genetic content, is encoded in information units referred to as genes. The whole set of genes of an organism is referred to as its genome.

BioBox 1.1: Gene expression microarrays
Gene expression microarrays. In microarrays, thousands to millions of probes are fixed to or synthesized on a solid surface, being either glass or a silicon chip. The latter explains why microarrays are also often referred to as chips. The targets of the probes, the mRNA samples, are labelled with fluorescent dyes and are hybridized to their matching probes. The hybridization intensity, which estimates the relative amounts of the target transcripts, can afterwards be measured by the amount of fluorescent emission on their respective spots. There are various microarray platforms differing in array fabrication, the nature and length of the probes, the number of fluorescent dyes that are being used, etc.

The vast majority of genomes are encoded in the sequence of chemical building blocks made from deoxyribonucleic acid (DNA), and a smaller number of genomes are composed of ribonucleic acid (RNA), e.g., for certain types of viruses. The genetic information is encoded in a specific sequence made from four different nucleotide bases: adenine, cytosine, guanine and thymine. A slightly different composition of building blocks is present in mRNA, where the base thymine is replaced by uracil. Genetic information encoding the building plan for proteins is transferred from DNA to mRNA to proteins. The gene sequence typically ranges in length from hundreds to thousands of nucleotides, up to even millions of bases. The number of genes that contain protein-coding information is expected to be between 25,000 and 30,000 in the human genome. A protein is made by constructing a string of protein building blocks (amino acids). The order of the amino acids in a protein matches the sequence of the nucleotides in the gene. In other words, messenger RNA interconnects DNA and protein, and also has some important practical advantages compared to both DNA and proteins (see BioBox 1.2). Increasing our knowledge about the dynamics of the genome, as manifested in the alterations in gene expression of a cell upon treatment, disease, development or other external stimuli, should enable us to transform this knowledge into better tools for the diagnosis and treatment of diseases.
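
The flow of information from DNA to mRNA to protein described above can be sketched in a few lines of Python. This is an illustrative sketch rather than material from the book: the function names `transcribe` and `translate` are hypothetical, the input is assumed to be the coding (sense) strand written 5′ to 3′, and the codon table is the standard genetic code, which the excerpt does not spell out.

```python
from itertools import product

# Standard genetic code (an assumption; the excerpt does not list it).
# Codons are enumerated in U, C, A, G order at each of the three positions.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand: thymine (T) becomes uracil (U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon ('*') is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # a stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna)             # AUGGCCUAA
print(translate(mrna))  # MA (methionine, alanine)
```

The toy sequence is invented for illustration; real transcripts also involve splicing and untranslated regions, which this sketch ignores.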

BioBox 1.2: Central dogma of molecular biology
Central dogma of molecular biology. The dogma of molecular biology explains how the information to build proteins is transferred in living organisms. The general flow of biological information (green arrows) has three major components: (1) DNA to DNA (replication) occurs in the cell nucleus (drawn in yellow) prior to cell division, (2) DNA to mRNA (transcription) takes place whenever the cell (drawn in light red) needs to make a protein (drawn as chain of red dots), and (3) mRNA to proteins (translation) is the actual protein synthesis step in the ribosomes (drawn in green). Besides these general transfers that occur normally in most cells, there are also some special information transfers that are known to occur in some viruses or in a laboratory experimental setting.

DNA is made of two strands that together form a chemical structure called the “double helix.” The two strands are connected with one another via pairs of bases that form hydrogen bonds between both strands. Such pairing of so-called “complementary” bases occurs only between certain pairs. Hydrogen bonds can be formed between cytosine and guanine or between adenine and thymine. The pairing of the two strands occurs in a process called “hybridization.”
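
The complementary base pairing that underlies hybridization can be illustrated with a short Python sketch. It rests on assumptions that go beyond the excerpt: both sequences are written 5′ to 3′, so the pairing partner of a strand is its reverse complement (the two strands of a double helix run antiparallel), and only perfect matches are considered, whereas real hybridization tolerates some mismatches. The function names are hypothetical.

```python
# Base pairs named in the text: adenine-thymine and cytosine-guanine.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the strand that pairs with `seq`, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_perfect_match(probe: str, target: str) -> bool:
    """True if every base of the probe pairs with the target (no mismatches)."""
    return probe.upper() == reverse_complement(target)


print(reverse_complement("ATGC"))        # GCAT
print(is_perfect_match("GCAT", "ATGC"))  # True
print(is_perfect_match("GCAA", "ATGC"))  # False
```

A microarray probe is designed along these lines: its sequence is chosen so that it is complementary to part of the transcript it should capture.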

Compared to DNA, mRNA is more dynamic and less redundant. The information that is encoded in the DNA is made available for processing in a step called “gene expression” or “transcription.” Gene expression is a highly complex and tightly regulated process by which a working copy of the original sequence information is made. This allows a cell to respond dynamically both to environmental stimuli and to its own changing needs, while DNA is relatively invariable. Furthermore, as mRNA constitutes only the expressed part of the DNA, it focuses more directly on processes underlying biological activity. This filtering is convenient as the functionality of most DNA sequences is irrelevant for the study at hand.

Compared to proteins, mRNA is much more measurable. Proteins are 3D conglomerates of multiple molecules and cannot benefit from the hybridising nature of the base pairs in the 2D, single-molecule structure of mRNA and DNA. Furthermore, proteins are very unstable due to denaturation, and cannot be preserved even with very laborious methods for sample extraction and storage.

When using microarrays to study alterations in gene expression, people will normally only want to study the types of RNA that code for proteins: the messenger RNA (mRNA). It is, however, important to keep in mind that RNA not only contains mRNA (a copy of a section of the genomic DNA carrying the information of how to build proteins). Besides the code for the synthesis of ribosomal RNA, there are other non-coding genes that, e.g., contain information for the synthesis of RNA molecules. These RNAs have different functions that range from enzymatic activities to regulating transcription of mRNAs and translation of mRNA sequences to proteins. The number of these functional RNAs encoded in the genome is not known. Initial studies looking at the overall transcriptional activity along the DNA predict that the number will most likely be larger than the number of protein-coding genes. People used to say that a large portion of the genomic information encoded in the DNA is useless (“junk DNA”). Over the last years scientific evidence has accumulated that a large proportion of the genome is being transcribed into RNAs, of which a small portion constitutes messenger RNAs. All these other non-coding RNAs are divided into two main groups depending on their size. While short RNAs are defined to have sizes below 200 bases, the long RNAs are thought to be mere precursors for the generation of small RNAs, of which the function is currently still unknown, in contrast to the known small RNAs such as microRNAs or siRNAs [6] (see BioBox 1.3 for an overview of different types of RNA). Microarrays are also being made to study differences in abundance of these kinds of RNA.

BioBox 1.3: siRNA
RNA. In contrast to mRNA (messenger RNA), which contains the information of how to assemble a protein, there are also different types of non-coding RNA (sometimes abbreviated as ncRNA) [a]. Here are the types that are most relevant in the context of this book:
miRNA [...] in length, which regulate gene expression.
long ncRNA (long non-coding RNA) are long RNA molecules that perform regulatory roles. An example is XIST, which can also be used for data quality control to identify the gender of a subject (see BioBox 3.5).
rRNA (ribosomal RNA) are long RNA molecules that make up the central component of the ribosome [b]. They are responsible for decoding mRNA into amino acids and are used for RNA quality control purposes (see Section 3.1.2.8).
siRNA (small interfering RNA) are small double-stranded RNA molecules of about 20-25 nucleotides in length and play a variety of roles in biology. The most commonly known function is a process called RNA interference (RNAi). In this process siRNAs interfere with the expression of a specific gene, leading to a downregulation of the synthesis of new protein encoded by that gene [c].
tRNA (transfer RNA) are small single-stranded RNA molecules of about 74-95 nucleotides in length, which transfer a single amino acid to a growing polypeptide chain at the ribosomal site of protein synthesis. Each type of tRNA molecule can be attached to only one type of amino acid.
[a] Non-coding RNA refers to RNA molecules that are transcribed from DNA but not translated into protein.
[b] Ribosomes can be seen as the protein manufacturing machinery of all living cells.
[c] There are, however, also processes known as small RNA-induced gene activation whereby double-stranded RNAs target gene promoters to induce transcriptional activation of associated genes.

In this book we will focus on studying mRNA. However, most likely many remarks given on the experimental design and the data analysis will apply to the study of small RNA as well.

1.2 Research question

The key to optimal data analysis lies in a clear formulation of the research question. Being aware of having to define what one considers to be a “relevant” finding in the data analysis step will help in asking the right question and in designing the experiment properly so that the question can really be answered. A well-thought-out and focused research question leads directly into hypotheses, which are both testable and measurable by proposed experiments. Furthermore, a well-formulated hypothesis helps to choose the most appropriate test statistic out of the plethora of available statistical procedures and helps to set up the design of the study in a carefully considered manner. To formulate the right question, one needs to disentangle the research topic into testable hypotheses and to put it in a wider framework to reflect on potentially confounding factors.

Some of the most commonly used study designs in microarray research will be introduced here by means of real-life examples. For each type of study, research questions are formulated and example datasets described. These datasets will be used throughout the book to illustrate some technical and statistical issues.

1.2.1 Correlational vs. experimental research

Microarray research can either be correlational or experimental. In correlational research, scientists generally do not apply a treatment or stimulus to provoke an effect on, e.g., gene expression (influence variables), but measure them and look for correlations with mRNA (see StatsBox 1.1). Typical examples are cohort studies, where individuals of populations with specific characteristics (like diseased patients and healthy controls) are sampled and analysed. In experimental research, scientists manipulate certain variables (e.g., apply a compound to a cell line) and then measure the effects of this manipulation on mRNA. Experiments are designed studies where individuals are assigned to specifically chosen conditions, and mRNA is afterwards collected and compared.

It is important to comprehend that only experimental data can conclusively demonstrate causal relations between variables. For example, if we found that a certain treatment A affects the expression levels of gene X, then we can conclude that treatment A influences the expression of gene X. Data from ...

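Foreword/Preface

Frontier Explorations in Modern Bioinformatics and Genomics

Title: Frontier Explorations in Modern Bioinformatics and Genomics

Summary: This book aims to give researchers, senior undergraduates and graduate students in biology, medicine and related interdisciplinary fields a comprehensive and in-depth view of the latest techniques, theory and applications in bioinformatics and genomics. It covers the whole path from basic data generation to the interpretation of complex data, and emphasizes the close integration of computational thinking and biological understanding.

Part 1: New paradigms in genome sequencing
This part reviews the revolutionary progress of next-generation sequencing (NGS) and looks ahead to the key role of ultra-long-read sequencing (such as PacBio and Oxford Nanopore) in complex genome assembly and structural variant detection.

Chapter 1: Principles of next-generation sequencing and data quality control
The chapter first reviews the limitations of Sanger sequencing and then explains the high-throughput principle of the Illumina platform, including cluster generation, sequencing by synthesis and image acquisition. It discusses how different library construction strategies (such as whole-genome resequencing, RNA-Seq and ChIP-Seq) affect downstream analysis. For data quality control, the book proposes strict quality filtering criteria covering base quality scores (Phred scores), sequence quality assessment, adapter removal and GC content distribution analysis, in order to ensure reliable analysis.

Chapter 2: Long-read sequencing and the challenges of genome assembly
As long-read data become widespread, a central question is how to use them to resolve repetitive regions and structural variants that were previously intractable. The chapter describes assembly algorithms based on de Bruijn graphs and on the Overlap-Layout-Consensus (OLC) strategy, and compares the performance of different tools (such as SPAdes, Flye and Canu) on eukaryotic and prokaryotic genomes. It pays particular attention to optimizing workflows for de novo assembly and reference-guided assembly.

Part 2: In-depth analysis of the transcriptome
This part focuses on the dynamics of gene expression and its regulatory mechanisms, spanning traditional microarray-based techniques through to finer-grained single-cell analysis.

Chapter 3: Advanced methods for genome-wide expression profiling
The chapter goes beyond basic screening for differentially expressed genes (DEGs) and covers paired-sample designs, methods for correcting batch effects, and the use of principal component analysis (PCA) and t-SNE/UMAP for visualization and dimensionality reduction of high-dimensional data. For expression quantification, it explains where RPKM/FPKM and TPM are each appropriate and how to perform statistical inference in designs with complex control groups.

Chapter 4: Identification and functional prediction of lncRNA and circular RNA
Non-coding RNAs play key roles in gene regulation. The chapter presents bioinformatics workflows for identifying long non-coding RNA (lncRNA) and circular RNA (circRNA) from RNA-Seq data, including filtering by sequence features (such as gene length and exon number) and expression patterns. It also discusses the “miRNA sponge” potential of circRNA and strategies for functional enrichment analysis.

Chapter 5: Single-cell RNA sequencing (scRNA-Seq) data processing and mining of cellular heterogeneity
Single-cell techniques are a current focus of biological research. The chapter describes the challenges specific to scRNA-Seq data, such as high sparsity (dropout events). It covers data normalization (such as log-normalization and SCTransform), cell cycle regression and cell type annotation (based on marker genes and comparison with reference datasets). It highlights the use of trajectory inference algorithms (such as Monocle and Slingshot) for resolving cell differentiation paths.

Part 3: Genomic variation and functional association
This part addresses how to identify biologically meaningful genomic and epigenetic variation in large volumes of sequencing data.

Chapter 6: Detection and annotation of structural variants (SVs)
Structural variants (deletions, insertions, copy number variants, inversions and so on) are an important cause of disease. The chapter compares three classes of SV detection tools, based on paired-end information, read depth and split reads, and discusses how to cross-validate detected SVs against known genomic databases (such as ClinVar and dbSNP) and assess their clinical significance.

Chapter 7: Frontiers of epigenetics research: ChIP-Seq and ATAC-Seq
The chapter examines data analysis workflows for chromatin immunoprecipitation sequencing (ChIP-Seq) and chromatin accessibility sequencing (ATAC-Seq). For ChIP-Seq, it focuses on peak calling algorithms (such as MACS2) and functional annotation of enriched regions. For ATAC-Seq, it explains how to assess chromatin openness and, combined with prediction of transcription factor (TF) binding sites, reveal dynamic changes in transcriptional regulatory networks.

Chapter 8: Multi-omics data integration and systems biology modeling
Modern biological research increasingly depends on integrating genomic, transcriptomic, proteomic and other data sources. The chapter introduces data integration methods such as mutual information and canonical correlation analysis (CCA), with the aim of building more complete molecular network maps. It also discusses how network analysis methods (such as node centrality and modularity) can be used to identify key driver genes of disease.

Conclusion: Sustainable development of bioinformatics research
The book closes by stressing the importance of data sharing and standardized analysis workflows (workflow management systems such as Snakemake and Nextflow), and looks ahead to the potential of artificial intelligence and deep learning in interpreting genomic data, encouraging readers to turn computational capability into real biological insight.

Intended audience: researchers and students in molecular biology, genetics, biochemistry, bioengineering, bioinformatics, biomedical engineering and related fields. It is particularly suited to practitioners who want to move from traditional molecular biology experiments to the analysis of high-throughput sequencing data.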
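
The overview above mentions RPKM/FPKM and TPM without saying how they differ. As a rough sketch under the commonly used definitions (not taken from the book described here), the two measures can be computed as follows; the function names and the toy counts and transcript lengths are invented for illustration.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    return [c * 1e9 / (total * length) for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: normalize by length first, then scale to 1e6."""
    rates = [c / length for c, length in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]


# Toy numbers, invented for illustration only.
counts = [100, 300, 600]
lengths_bp = [1000, 1500, 2000]

print([round(v) for v in fpkm(counts, lengths_bp)])
print([round(v) for v in tpm(counts, lengths_bp)])
print(round(sum(tpm(counts, lengths_bp))))  # 1000000
```

Because TPM values always sum to one million within a sample, they are often considered easier to compare across samples than FPKM values, whose sum depends on the length distribution of the expressed transcripts.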

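User reviews

Rating ☆☆☆☆☆

I have long had a strong interest in gene expression regulation and in how it affects an organism's traits and the development of disease. I have wanted to understand the technical details and potential applications of Affymetrix arrays, a powerful tool for gene expression profiling. The “guided reading edition” label makes me think the book is a good starting point for entering this field. I hope it describes the design of Affymetrix arrays in detail, including the probe design strategy, how the microarray is built and the principle of signal detection. More importantly, I hope it explains step by step how to read gene expression patterns from array data, perhaps touching on basic preprocessing and visualization methods. Typical application cases, for example in cancer research, developmental biology or drug screening, would greatly help my understanding and interest. In short, I expect the book to give me a clear framework for understanding the central role of Affymetrix arrays in gene expression research and a solid foundation for further study.

Rating ☆☆☆☆☆

When I saw Gene Expression Studies Using Affymetrix Microarrays (Guided Reading Edition) on a bookshop shelf, my first reaction was: “Great! Finally an accessible book like this.” As a reader who is passionate about the life sciences but, because of my background, somewhat intimidated by the advanced techniques of gene expression research, I have long wanted a good introductory text. I know that Affymetrix arrays are important in biological research and help us understand the activity of genes, but the technical details and the data analysis process have been a mystery to me. The “guided reading” positioning addresses exactly that, and I trust it will unveil Affymetrix arrays step by step in the simplest and clearest way. I imagine plenty of vivid illustrations and analogies explaining how the arrays are made, how hybridization works and how signals are read, so that I can learn this complex material as easily as touring a science museum. I hope the book becomes my best companion in exploring gene expression, helps me understand the terminology that used to put me off, and opens up a new scientific perspective.

Rating ☆☆☆☆☆

The title immediately brought to mind complex lab instruments and huge volumes of data. I had heard of Affymetrix arrays and knew they play a major role in gene expression research, but how they actually work, and how useful biological information is extracted from those glowing signals, had remained vague to me. The guided reading format looks like a good entry point. I hope the book acts like an experienced guide through the maze of technology and theory. I imagine many well-made figures showing the structure of the array, each step of hybridization and the data acquisition workflow. More importantly, I hope it explains in plain language the core strengths of the technology, such as high throughput and high specificity, and what it can do for different biological research questions. From basic array design to preliminary data interpretation, I hope the book sketches a complete picture of gene expression research and helps me understand the value and standing of this technology in modern life science.

Rating ☆☆☆☆☆

The title really caught my attention, especially the words “guided reading edition”, which gave hope to a reader like me who is not yet familiar with gene expression research but wants to understand it in depth. I know that Affymetrix arrays, a well-known name in bioinformatics, play a crucial role in obtaining gene expression profiles, but how they work, the principles behind them and how microscopic gene signals are turned into analyzable data were all a mystery to me. The book seems like a guide tailored for me: it promises to present complex technical details in an accessible way. I imagine clear diagrams and plain language explaining the array design, the hybridization process and the principle of signal detection, so that I can quickly build a basic understanding of Affymetrix technology. More importantly, as a guided reading edition, I hope it opens the door to the wider field of gene expression research and shows how the technology is applied to different biological problems, such as disease diagnosis, drug development and the study of species evolution, and so sparks further interest. I hope it fills the gaps in my knowledge so that I am no longer put off by technical terms and can take part in discussions and study with confidence.

Rating ☆☆☆☆☆

I have always been curious about research at the intersection of molecular biology and bioinformatics, particularly the central concept of gene expression. Affymetrix arrays are an important tool for gene expression research, and the precise technology and data analysis methods behind them fascinate me. I assume the guided reading positioning means the book starts from the basics and introduces the distinctive features of Affymetrix arrays step by step. I hope it explains how the arrays work, for example the principles of probe design, how target RNA molecules are captured and detected, and how these signals are finally converted into numerical expression values. I am even more interested in whether it covers the first steps of expression data analysis, such as preprocessing, quality control and some basic statistical methods for interpreting these large datasets. Some practical case studies, even simplified ones, would make the theory more tangible. I hope the book plants a seed, helping me understand the “what” and “why” of gene expression and laying a solid foundation for more complex analysis techniques and applications later. This is not only about learning a technique; it is also about gaining a new perspective on the mysteries of life.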

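Rating ☆☆☆☆☆

Delivery was really fast, hahaha, fast.

Rating ☆☆☆☆☆

Note that this is the guided reading edition; it is essentially a foreign-language (English) book.

Rating ☆☆☆☆☆

Arrived on time.

Rating ☆☆☆☆☆

A colleague asked for it. Highly recommended; the coverage is very detailed.

Rating ☆☆☆☆☆

The introduction to Affy arrays is of some use.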