File name: Siamese Recurrent Architectures for Learning Sentence Similarity.pdf
Development tool:
File size: 1 MB
Downloads: 0
Upload date: 2019-10-14
Description: Good results from the simplest model and the simplest feature engineering; the goal is maximum cost-effectiveness. If needed, model changes and additional feature engineering can be layered on top of this to improve performance.

…architecture for face verification developed by Chopra, Hadsell, and LeCun (2005), which utilizes symmetric ConvNets where we use LSTMs. Siamese neural networks have been proposed for a number of metric learning tasks (Yih et al. 2011; Chen and Salman 2011), but to our knowledge, recurrent connections remain largely unexplored in this context.

Manhattan LSTM Model

The proposed Manhattan LSTM (MaLSTM) model is outlined in Figure 1. There are two networks, LSTMa and LSTMb, which each process one of the sentences in a given pair, but we solely focus on siamese architectures with tied weights such that LSTMa = LSTMb in this work. Nevertheless, the general untied version of this model may be more useful for applications with asymmetric domains such as information retrieval (where search queries are stylistically distinct from stored documents).
[Figure 1: Our model uses an LSTM to read in word-vectors representing each input sentence and employs its final hidden state as a vector representation for each sentence. Subsequently, the similarity between these representations is used as a predictor of semantic similarity. The depicted example pair is "He is smart." / "A truly wise man.", scored via exp(-||h^(a) - h^(b)||_1).]

The LSTM learns a mapping from the space of variable-length sequences of d_in-dimensional vectors into R^{d_rep} (d_in = 300, d_rep = 50 in this work). More concretely, each sentence (represented as a sequence of word vectors) x_1, ..., x_T is passed to the LSTM, which updates its hidden state at each sequence-index via equations (2)-(7). The final representation of the sentence is encoded by h_T ∈ R^{d_rep}, the last hidden state of the model. For a given pair of sentences, our approach applies a pre-defined similarity function g: R^{d_rep} x R^{d_rep} -> R to their LSTM-representations. Similarities in the representation space are subsequently used to infer the sentences' underlying semantic similarity.

Note that unlike typical language modeling RNNs, which are used to predict the next word given the previous text, our LSTMs simply function like the encoder of Sutskever, Vinyals, and Le (2014). Thus, the sole error signal backpropagated during training stems from the similarity between sentence representations h_T^(a), h_T^(b), and how this predicted similarity deviates from the human-annotated ground truth relatedness. We restrict ourselves to the simple similarity function g(h_T^(a), h_T^(b)) = exp(-||h_T^(a) - h_T^(b)||_1) ∈ [0, 1].

This forces the LSTM to entirely capture the semantic differences during training, rather than supplementing the RNN with a more complex learner that can help resolve shortcomings in the learned representations, as done by Kiros et al. (2015) and Tai, Socher, and Manning (2015).

As Chopra, Hadsell, and LeCun (2005) point out, using an l2 rather than l1 norm in the similarity function can lead to undesirable plateaus in the overall objective function. This is because during early stages of training, an l2-based model is unable to correct errors where it erroneously believes semantically different sentences to be nearly identical, due to vanishing gradients of the Euclidean distance. Empirically, our results are fairly stable across various types of simple similarity function, but we find that a g utilizing the Manhattan distance slightly outperforms other reasonable alternatives such as cosine similarity (used in Yih et al. 2011).

Semantic relatedness scoring

The SICK data contains 9,927 sentence pairs with a 5,000/4,927 training/test split (Marelli et al. 2014). Each pair is annotated with a relatedness label ∈ [1, 5] corresponding to the average relatedness judged by 10 different individuals. Although their skip-thoughts RNN is trained on a vast corpus for two weeks, Kiros et al. (2015) point out that it is unable to distinguish between many of the test-set sentences shown in Table 1, highlighting the difficulty of this task.

Table 1: Example sentence pairs from the SICK test data. G denotes ground truth relatedness ∈ [1, 5], S = skip-thought predictions, and M = MaLSTM predictions.

  Sentence pair                                            G     S     M
  A little girl is looking at a woman in costume. /
  A young girl is looking at a woman in costume.           4.7   4.5   4.–
  A person is performing tricks on a motorcycle. /
  The performer is tricking a person on a motorcycle.      2.6   4.4   2.9
  Someone is pouring ingredients into a pot. /
  A man is removing vegetables from a pot.                 2.4   3.6   2.5
  Nobody is pouring ingredients into a pot. /
  Someone is pouring ingredients into a pot.               3.5   4.2   3.7

To enable our model to generalize beyond the limited vocabulary present in the SICK training set, we provide the LSTM with inputs that reflect relationships between words beyond what can be inferred from the small number of training sentences. LSTMs typically require large datasets to achieve good generalization due to their vast numbers of parameters, and we thus augment our dataset with numerous additional training examples, a common practice in SemEval systems (Marelli et al. 2014) as well as high-performing neural networks. Like many top-performing semantic similarity systems, our LSTM takes as input word-vectors which have been pre-trained on an external corpus. We use the 300-dimensional word2vec embeddings (publicly available at code.google.com/p/word2vec) which Mikolov et al. (2013) demonstrate can capture intricate inter-word relationships such as vec(king) - vec(man) + vec(woman) ≈ vec(queen).
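Concretely, the encoder and the similarity g above amount to only a few operations. Below is a minimal sketch in PyTorch (an assumption; the paper does not name a framework): a single LSTM plays the roles of both LSTMa and LSTMb through tied weights, and g = exp(-||.||_1) is applied to the final hidden states. Shapes and the random example inputs are illustrative stand-ins for sequences of pre-trained word vectors.

```python
# Minimal MaLSTM sketch, assuming PyTorch; d_in = 300, d_rep = 50 as in the paper.
import torch
import torch.nn as nn

class MaLSTM(nn.Module):
    def __init__(self, d_in=300, d_rep=50):
        super().__init__()
        # One LSTM serves as both LSTMa and LSTMb (tied weights).
        self.encoder = nn.LSTM(d_in, d_rep, batch_first=True)

    def encode(self, x):
        # x: (batch, seq_len, d_in) word vectors; keep only the last hidden state h_T.
        _, (h_T, _) = self.encoder(x)
        return h_T.squeeze(0)                    # (batch, d_rep)

    def forward(self, x_a, x_b):
        h_a, h_b = self.encode(x_a), self.encode(x_b)
        # g(h_a, h_b) = exp(-||h_a - h_b||_1), a similarity in (0, 1].
        return torch.exp(-torch.sum(torch.abs(h_a - h_b), dim=1))

model = MaLSTM()
sim = model(torch.randn(2, 7, 300), torch.randn(2, 5, 300))  # two sentence pairs
```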
We encourage invariance to precise wording and expand our dataset by employing thesaurus-based augmentation, in which 10,022 additional training examples are generated by replacing random words with one of their synonyms found in WordNet (Miller 1995). A similar strategy is also successfully adopted by Zhang, Zhao, and LeCun (2015). Unlike the SemEval 2014 submissions, our methods do not require extensive manual feature generation beyond the separately trained word2vec vectors.
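A rough sketch of this thesaurus-based augmentation, assuming NLTK's WordNet interface; the replacement probability and the choice to draw synonyms from the first synset are illustrative guesses, not the paper's exact sampling procedure.

```python
# WordNet synonym-replacement augmentation (sketch).
# Requires: nltk installed and nltk.download('wordnet') run once.
import random
from nltk.corpus import wordnet

def augment(sentence, p_replace=0.2):
    out = []
    for w in sentence.split():
        synsets = wordnet.synsets(w)
        if synsets and random.random() < p_replace:
            # Candidate synonyms: lemmas of the first sense, excluding the word itself.
            lemmas = [l.name().replace('_', ' ') for l in synsets[0].lemmas()
                      if l.name().lower() != w.lower()]
            out.append(random.choice(lemmas) if lemmas else w)
        else:
            out.append(w)
    return ' '.join(out)

print(augment("a woman is slicing potatoes"))
```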
The MaLSTM predicts relatedness for a given pair of sentences via g(h_T^(a), h_T^(b)), and we train the siamese network using backpropagation-through-time under the mean-squared-error (MSE) loss function (after rescaling the training-set relatedness labels to lie in [0, 1]). SemEval evaluates predicted similarities against the given human-annotated similarities on three metrics: Pearson correlation, Spearman correlation, and MSE. Due to the simple construction of our similarity function, the predictions of our model are constrained to follow the exp(-x) curve and are thus not ideally suited for these evaluation metrics. After training our model, we apply an additional nonparametric regression step to obtain better-calibrated predictions (with respect to MSE). Over the training set, the given labels (under the original [1, 5] scale) are regressed against the univariate MaLSTM g-predicted relatedness as the sole covariate, and the fitted regression function is evaluated on the MaLSTM-predicted relatedness of the test pairs to produce adjusted final predictions. We use the classical local-linear estimator discussed in Fan and Gijbels (1992), with bandwidth selected using leave-one-out cross-validation. This calibration step serves as a minor correction for our restrictively simple similarity function (which is necessary to retain interpretability of the sentence representations).
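A sketch of this calibration in NumPy: a local-linear smoother of training labels on the scalar g-predictions, with the bandwidth chosen by leave-one-out cross-validation. The Gaussian kernel and the bandwidth grid are assumptions; the excerpt specifies only the estimator family (Fan and Gijbels 1992).

```python
# Local-linear calibration of MaLSTM similarity scores (sketch, NumPy only).
import numpy as np

def local_linear(x_train, y_train, x_eval, h):
    """Fit y ~ x at each evaluation point with kernel-weighted least squares."""
    preds = np.empty(len(x_eval))
    for i, x0 in enumerate(x_eval):
        w = np.sqrt(np.exp(-0.5 * ((x_train - x0) / h) ** 2))  # Gaussian kernel (assumed)
        X = np.column_stack([np.ones_like(x_train), x_train - x0])
        beta, *_ = np.linalg.lstsq(X * w[:, None], y_train * w, rcond=None)
        preds[i] = beta[0]                     # intercept = fitted value at x0
    return preds

def loocv_bandwidth(x, y, grid=(0.05, 0.1, 0.2, 0.4)):
    """Pick the bandwidth minimizing leave-one-out squared error."""
    def cv_err(h):
        return np.mean([(y[i] - local_linear(np.delete(x, i), np.delete(y, i),
                                             x[i:i + 1], h)[0]) ** 2
                        for i in range(len(x))])
    return min(grid, key=cv_err)

# Usage sketch: g_train/g_test are MaLSTM scores, labels are the original [1, 5] ratings.
# h = loocv_bandwidth(g_train, labels)
# final_predictions = local_linear(g_train, labels, g_test, h)
```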
Training details

Our LSTM uses 50-dimensional hidden representations h_t and memory cells c_t. Optimization of the parameters is done using the Adadelta method of Zeiler (2012), along with gradient clipping (rescaling gradients whose norm exceeds a threshold) to avoid the exploding gradients problem (Pascanu, Mikolov, and Bengio 2013). We employ early stopping based on a validation set containing 30% of the training examples.

It is well known that the success of LSTMs depends crucially on their initialization, and often parameters transferred from neural networks trained for a different task can serve as a strong starting point for the optimization (cf. Bengio 2012). We first initialize our LSTM weights with small random Gaussian entries (and a separate large value of 2.5 for the forget gate bias, to facilitate modeling of long-range dependence). Then, our MaLSTM is (pre)trained as previously described on separate sentence-pair data provided for the earlier SemEval 2013 Semantic Textual Similarity task (Agirre and Cer 2013). The weights resulting from this pre-training thus form our starting point for the SICK data, which is markedly superior to a random initialization.
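This initialization and optimization recipe maps directly onto standard toolkit calls. Below is a sketch in PyTorch, reusing the hypothetical MaLSTM module sketched earlier; the clipping threshold and the Gaussian standard deviation are unstated in the excerpt and therefore illustrative.

```python
# Training configuration sketch: Adadelta, gradient clipping, forget-gate bias 2.5.
import torch

optimizer = torch.optim.Adadelta(model.parameters())

# Small random Gaussian weights; zero biases except a large forget-gate bias.
for name, param in model.encoder.named_parameters():
    if 'weight' in name:
        torch.nn.init.normal_(param, std=0.05)   # std is an assumption
    else:
        torch.nn.init.zeros_(param)
n = model.encoder.hidden_size
model.encoder.bias_ih_l0.data[n:2 * n] = 2.5     # PyTorch gate order: i, f, g, o

def train_step(x_a, x_b, y):
    optimizer.zero_grad()
    # MSE loss against relatedness labels rescaled to [0, 1].
    loss = torch.nn.functional.mse_loss(model(x_a, x_b), y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # threshold assumed
    optimizer.step()
    return loss.item()
```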
Results

The MaLSTM is able to accurately score the Table 1 examples which Kiros et al. highlight as difficult for their skip-thoughts model. Despite being calibrated for MSE, our approach performs better than existing systems for the semantic relatedness task across all three evaluation metrics (see Table 2). Note that because all results shown in Table 2 rely on additional feature generation (e.g. dependency parses) or data augmentation schemes, this is only an evaluation of complete relatedness-scoring systems rather than a fair comparison of the different learning algorithms employed.

Nonetheless, we perform ablation experiments to better understand our methods, finding that the Pearson correlation (the primary SemEval performance metric) of our approach worsens by 0.01 without regression calibration, 0.02 without pre-training, and 0.04 without synonym augmentation. Due to the limited available training data, we do not realize performance gains by switching to multi-layer or bidirectional LSTMs.

Table 2: Test set Pearson correlation (r), Spearman's ρ, and mean squared error for the SICK semantic textual similarity task. The first group of results are top SemEval 2014 submissions and the second group are recent neural network methods (best result from each paper shown).

  Method                                                  r       ρ       MSE
  Illinois-LH (Lai and Hockenmaier 2014)                  0.7993  0.7538  0.3692
  UNAL-NLP (Jimenez et al. 2014)                          0.8070  0.7489  0.3550
  Meaning Factory (Bjerva et al. 2014)                    0.8268  0.7721  0.3224
  ECNU (Zhao, Zhu, and Lan 2014)                          0.8414  –       –
  Skip-thought+COCO (Kiros et al. 2015)                   0.8655  0.7995  0.2561
  Dependency Tree-LSTM (Tai, Socher, and Manning 2015)    0.8676  0.8083  0.2532
  ConvNet (He, Gimpel, and Lin 2015)                      0.8686  0.8047  0.2606
  MaLSTM                                                  0.8822  0.8345  0.2286

In Table 3, Tai, Socher, and Manning show the most similar test-set examples found by their Tree-LSTM for three given sentences, as well as its inferred similarity scores. We apply our model to these same examples, determining that while the sequential MaLSTM is slightly worse at identifying active-passive equivalence, our approach is better at distinguishing verbs and objects than the compositional Tree-LSTM, which often infers seemingly over-estimated relatedness scores in Table 3. For example, the ground truth labeling between "Tofu is being sliced by a woman" and "A woman is slicing butter" is only 2.7 in the SICK test set (and substituting "potatoes" for "butter" should not greatly increase relatedness between the two statements).
Table 3: Most similar sentences (from a 1000-sentence subsample) in the SICK test data according to the Tree-LSTM. Tree / M denote relatedness (with the sentence preceding each group) predicted by the Tree-LSTM / MaLSTM.

  Ranking by Dependency Tree-LSTM model                       Tree   M
  a woman is slicing potatoes
    a woman is cutting potatoes                               4.82   4.87
    potatoes are being sliced by a woman                      4.70   4.38
    tofu is being sliced by a woman                           4.39   3.51
  a boy is waving at some young runners from the ocean
    a group of men is playing with a ball on the beach        3.79   3.13
    a young boy wearing a red swimsuit is jumping out
      of a blue kiddies pool                                  3.37   3.48
    the man is tossing a kid into the swimming pool
      that is near the ocean                                  3.19   2.26
  two men are playing guitar
    the man is singing and playing the guitar                 4.0    –
    the man is opening the guitar for donations and
      plays with the case                                     4.01   2.30
    two men are dancing and singing in front of a crowd       4.00   2.33
Sentence representations
We now investigate the geometry of the sentence representation-space learned by the MaLSTM network. As the l1 metric is the sum of element-wise differences, we hypothesize that, by using specific hidden units (i.e. dimensions of the sentence representation) to encode particular characteristics of a sentence, the trained MaLSTM infers semantic similarity between sentences by simply aggregating their differences in various characteristics.

Some examples supporting this idea are shown in Figure 2, which depicts the values that particular sentences take along specific dimensions of h_T. It is evident that the hidden unit shown at the top has learned to detect negation, separating sentences containing words like "no" or "not" from the rest, regardless of the other content in the text. The hidden unit in the middle plot is particularly sensitive to categorization of the direct objects, separating sentences describing actions on balls, grass, cosmetics, and vegetables. The hidden unit depicted at the bottom of Figure 2 clearly separates sentences based on their subject, imposing an interesting ordering that reflects broader similarity between the subject categories: cats, general animals, dogs, boys, general people (someone), and men. Unlike the ConvNet of He, Gimpel, and Lin, which measures similarity explicitly across multiple scales and locations in the sentence, the delineation of these underlying characteristics emerges naturally in the MaLSTM representations, which are guided solely by the l1 metric and overall semantic similarity labels.

[Figure 2: MaLSTM representations of test-set sentences depicted along three different dimensions of h_T. Each number above the axis corresponds to a sentence representation, and its location represents the value this particular hidden unit assigns to the sentence (shown below).]

Next, we shift our attention from the local characteristics of different hidden units toward the global geometry of the sentence representation space. Due to our training criterion, this space is naturally endowed with the l1 metric and avoids being highly warped. While analysis of neural network representations typically requires nonlinear dimensionality reduction like t-SNE (van der Maaten and Hinton 2008), we can simply use principal components analysis (PCA) for informative visualization of the MaLSTM representations due to their simple structure.

[Figure 3: MaLSTM representations for all sentences from the SICK test set, projected onto two principal components. Colored themes: music, culinary themes, animals, water environments, violence.]
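Reproducing this kind of view takes a few lines once the representations are in hand. A sketch assuming scikit-learn and matplotlib (neither named by the paper), with random stand-ins for the 50-dimensional h_T matrix and the theme labels:

```python
# PCA projection of MaLSTM sentence representations (sketch).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

H = np.random.randn(200, 50)                       # stand-in for MaLSTM h_T vectors
themes = np.random.choice(['music', 'culinary', 'animals'], size=200)  # stand-in labels

H2 = PCA(n_components=2).fit_transform(H)          # project 50-d h_T onto two PCs
for theme in set(themes):
    idx = [i for i, t in enumerate(themes) if t == theme]
    plt.scatter(H2[idx, 0], H2[idx, 1], label=theme, s=10)
plt.legend()
plt.show()
```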
Figure 3 depicts an overview of the SICK dataset from the perspective of the MaLSTM model (after PCA dimension reduction). For interpretability, we color many of the sentences based on the distinct concepts/themes under which they fall. The geometric coherence of the sentences in the representation space exists across numerous categories: from sentences about animals (ranging from cats to lemurs), to culinary themes (like slicing vegetables), music (like guitar playing), water environments (e.g. the ocean or swimming pools), etc. In fact, the sentence representations cluster along nearly all additional meaningful semantic categorizations we could come up with (not depicted due to coloring constraints).
One peculiar aspect of this representation space is the low-density region that separates the culinary-themed examples from the other sentences. Around this area, there are numerous violence- and gun-related sentences in the representations, for example: "A man is fixing a silencer to a gun". We find that these violent texts are likely to receive much lower similarity scores when paired with the more mundane sentences typically found in SICK (the average violent-nonviolent pair only has similarity 1.88, compared with an average of 3.41 for all test-set pairs). This explains why the MaLSTM representations have learned to become sparse in the vicinity of these violent examples (depicted in red in Figure 3).

Thus, Figure 3 shows that human-determined semantic relatedness heavily depends on the occurrence of such themes. These discoveries about the SICK dataset are enabled by the interpretability of the MaLSTM representations, unlike the other proposed neural networks, which rely on complex operations over their learned representations. In addition to providing model insight, informative representations can provide a useful tool for exploratory data analysis.
Entailment classification
To evaluate the broader utility of our sentence representations, we leverage them for a different application: the SemEval 2014 textual entailment task (Marelli et al. 2014). In addition to the relatedness scores, each of the SICK sentence pairs has also been labeled as one of three classes, entailment, contradiction, or neutral, which are to be predicted for the test examples. For this task, we solely rely on the same representations learned for predicting semantic relatedness (fixed without additional fine-tuning), and simply apply standard learning methods to do the entailment classification.

Specifically, from the MaLSTM representations h_T^(a), h_T^(b) of each pair of sentences, we compute the following simple features (also successfully used by Tai, Socher, and Manning 2015): element-wise absolute differences |h_T^(a) - h_T^(b)| and element-wise products h_T^(a) ⊙ h_T^(b). Using only these features, we train a radial-basis-kernel SVM to classify the entailment labels. The one-versus-all approach to multi-class problems is employed, with hyperparameters optimized in 5-fold cross-validation.

Table 4 shows that such an approach outperforms all other textual-entailment systems except for the Illinois-LH system of Lai and Hockenmaier (2014). Thus, even though the features provided to the SVM are learned for the distinct goal of semantic relatedness scoring (with no supervised information regarding contradictions or the neutral threshold), they capture enough relevant characteristics of the sentences to be highly useful for entailment classification. In contrast to the MaLSTM representations, the Illinois-LH system employs many features specially constructed for this task, such as hypernym counts and occurrences of "no" and "not". Interestingly, a useful feature like "no"-occurrence, which Lai and Hockenmaier manually selected, has been automatically learned by our model and is encoded by the first hidden unit shown in Figure 2.

Table 4: Test set accuracy for the SICK semantic entailment classification. The first group of results are top SemEval 2014 submissions and the second are more recently proposed methods.

  Method                                             Accuracy
  Illinois-LH (Lai and Hockenmaier 2014)             84.6
  ECNU (Zhao, Zhu, and Lan 2014)                     83.6
  UNAL-NLP (Jimenez et al. 2014)                     83.1
  Meaning Factory (Bjerva et al. 2014)               81.6
  Reasoning-based n-best (Lien and Kouylekov 2015)   80.4
  LangPro hybrid-800 (Abzianidze 2015)               81.4
  SNLI-transfer 3-class LSTM (Bowman et al. 2015)    80.8
  MaLSTM features + SVM                              84.2
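The feature construction and classifier described above are a few lines with scikit-learn (an assumption; the paper does not name a toolkit). Random stand-ins replace the learned MaLSTM representations:

```python
# Entailment classification from MaLSTM pair features (sketch).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

h_a = np.random.randn(500, 50)   # stand-ins for MaLSTM sentence representations
h_b = np.random.randn(500, 50)
y = np.random.choice(['entailment', 'contradiction', 'neutral'], size=500)

# Tai et al.-style pair features: element-wise |h_a - h_b| and h_a * h_b.
X = np.hstack([np.abs(h_a - h_b), h_a * h_b])

# One-vs-rest RBF SVM with hyperparameters tuned by 5-fold CV (grid is illustrative).
search = GridSearchCV(
    OneVsRestClassifier(SVC(kernel='rbf')),
    param_grid={'estimator__C': [1, 10, 100], 'estimator__gamma': ['scale', 0.01]},
    cv=5)
search.fit(X, y)
```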
Discussion

This work demonstrates that a simple LSTM is capable of modeling complex semantics if the representations are explicitly guided. Leveraging synonym augmentation and pre-trained word-embeddings, we circumvent the size limitations of existing labeled datasets. Analysis of the learned model reveals that it utilizes diverse hidden units to encode different characteristics of each sentence. Admitting efficient test-time inference, our model can be deployed in real-time applications. Beyond scoring semantic relatedness and entailment, trained MaLSTM sentence representations can produce interesting insights in exploratory data analysis thanks to their interpretable structure.

Since our approach relies on pre-trained word-vectors as the LSTM inputs, it will benefit from improvements in word-embedding methods such as those of Li et al. (2015), especially as these word-vectors more comprehensively capture synonymity and entity-relationships. We also foresee significant gains as the amount of labeled semantic similarity data grows, both for statistical reasons and because sufficiently large sample sizes enable learning of de novo word-vectors tailored to this model.
References
Abzianidze, L. 2015. A Tableau Prover for Natural Logic and Language. EMNLP, 2492-2502.
Agirre, E., and Cer, D. 2013. *SEM 2013 Shared Task: Semantic Textual Similarity. SemEval 2013.
Bengio, Y. 2012. Deep Learning of Representations for Unsupervised and Transfer Learning. JMLR W&CP: Proc. Unsupervised and Transfer Learning Challenge and Workshop, 17-36.
Bjerva, J.; Bos, J.; van der Goot, R.; and Nissim, M. 2014. The Meaning Factory: Formal semantics for recognizing textual entailment and determining semantic similarity. SemEval 2014.
Bowman, S. R.; Angeli, G.; Potts, C.; and Manning, C. D. 2015. A large annotated corpus for learning natural language inference. EMNLP, 632-642.
Chen, K., and Salman, A. 2011. Extracting Speaker-Specific Information with a Regularized Siamese Deep Network. NIPS, 298-306.
Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. EMNLP, 1724-1734.
Chopra, S.; Hadsell, R.; and LeCun, Y. 2005. Learning a similarity metric discriminatively, with application to face verification. CVPR, 1:539-546.
Fan, J., and Gijbels, I. 1992. Variable bandwidth and local linear regression smoothers. The Annals of Statistics, 20:2008-2036.
Graves, A. 2012. Supervised Sequence Labelling with Recurrent Neural Networks. Studies in Computational Intelligence, Springer.
Greff, K.; Srivastava, R. K.; Koutnik, J.; Steunebrink, B. R.; and Schmidhuber, J. 2015. LSTM: A Search Space Odyssey. arXiv:1503.04069.
He, H.; Gimpel, K.; and Lin, J. 2015. Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks. EMNLP, 1576-1586.
Hochreiter, S., and Schmidhuber, J. 1997. Long Short-Term Memory. Neural Computation, 9(8):1735-1780.
Jimenez, S.; Duenas, G.; Baquero, J.; Gelbukh, A.; Batiz, A. J. D.; and Mendizabal, A. 2014. UNAL-NLP: Combining soft cardinality features for semantic textual similarity, relatedness and entailment. SemEval 2014.
Kiros, R.; Zhu, Y.; Salakhutdinov, R.; Zemel, R. S.; Torralba, A.; Urtasun, R.; and Fidler, S. 2015. Skip-Thought Vectors. NIPS.
Lai, A., and Hockenmaier, J. 2014. Illinois-LH: A denotational and distributional approach to semantics. SemEval 2014.
Le, Q., and Mikolov, T. 2014. Distributed representations of sentences and documents. ICML, 1188-1196.
Li, Y.; Xu, L.; Tian, F.; Jiang, L.; Zhong, X.; and Chen, E. 2015. Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective. IJCAI.
Lien, E., and Kouylekov, M. 2015. Semantic Parsing for Textual Entailment. International Conference on Parsing Technologies, 40-49.
Marelli, M.; Bentivogli, L.; Baroni, M.; Bernardi, R.; Menini, S.; and Zamparelli, R. 2014. SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. SemEval 2014.
Mihalcea, R.; Corley, C.; and Strapparava, C. 2006. Corpus-based and Knowledge-based Measures of Text Semantic Similarity. AAAI.
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; and Dean, J. 2013. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 3111-3119.
Miller, G. A. 1995. WordNet: A Lexical Database for English. Communications of the ACM, 38(11):39-41.
Pascanu, R.; Mikolov, T.; and Bengio, Y. 2013. On the difficulty of training recurrent neural networks. ICML, 1310-1318.
Siegelmann, H. T., and Sontag, E. D. 1995. On the Computational Power of Neural Nets. Journal of Computer and System Sciences, 50:132-150.
Socher, R. 2014. Recursive Deep Learning for Natural Language Processing and Computer Vision. PhD thesis, Stanford University.
Sutskever, I.; Vinyals, O.; and Le, Q. 2014. Sequence to sequence learning with neural networks. NIPS, 3104-3112.
Tai, K. S.; Socher, R.; and Manning, C. D. 2015. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. ACL, 1556-1566.
van der Maaten, L., and Hinton, G. 2008. Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research, 9:2579-2605.
Yih, W.; Toutanova, K.; Platt, J.; and Meek, C. 2011. Learning Discriminative Projections for Text Similarity Measures. CoNLL, 247-256.
Zeiler, M. D. 2012. ADADELTA: An Adaptive Learning Rate Method. arXiv:1212.5701.
Zhang, X.; Zhao, J.; and LeCun, Y. 2015. Character-level Convolutional Networks for Text Classification. arXiv:1509.01626.
Zhao, J.; Zhu, T. T.; and Lan, M. 2014. ECNU: One stone two birds: Ensemble of heterogenous measures for semantic relatedness and textual entailment. SemEval 2014.