Split (1)

train · 200 rows

paper_id stringlengths 10 19	venue stringclasses 15 values	focused_review stringlengths 7 9.4k	point stringlengths 49 654
ARR_2022_266_review	ARR_2022	1. One of the main drawbacks of this approach is that presumably the different component black-box experts of the controlled text generation have to be manually selected and the weighted linear combination has to be fine-tuned for each task. It is also not discussed if the inference time is significantly affected by th...	- BertScore and BLEURT are inconsistently typeset through the paper (alternatively as Bertscore or Bleurt). It would be better to maintain consistency.
ARR_2022_7_review	ARR_2022	1. The selling point of this paper is unsupervised pretrained dense retriever(LaPraDoR) can per- form on par with supervised dense retriever, but actually, LaPraDoR is a hybrid retriever rather than a pure dense retriever. In a way, it’s unfair to compare hybrid method to dense/sparse method as shown in table 1, becaus...	4. It sounds unreasonable that increasing the model size can hurt the performance, as recent paper Ni et al. shows that the scaling law is also apply to dense retrieval model, so the preliminary experimental results on Wikipedia about model size should be provided in detail.
NIPS_2020_7	NIPS_2020	- The transfer scenarios in Sec 3 are confusing, which in turn makes Figs 1&2 confusing. It seems like lines 108-110 state that VGG is always used as the whitebox source model and the WRN/RNXT/DN are always used as the victim blackbox target models. However, lines 128 - 130 contradict this talking about when WRN/RNXT/D...	- Some aspects in the presentation quality of this paper are a weakness for a high quality publication (e.g. NeurIPS). For example, Figs 1&2 as discussed before, the tables with a "-" for the method, the "Dataset" columns in the tables are not informative, the management of Fig 3 and Table 2, a "*" appearing in Table 1...
ARR_2022_253_review	ARR_2022	- The paper uses much analysis to justify that the information axis is a good tool to be applied. As pointed out in conclusion, I'm curious to see some related experiments that this information axis tool can help with. - For Figure 1, I have another angle for explaining why randomly-generated n-grams are far away from ...	- The paper uses much analysis to justify that the information axis is a good tool to be applied. As pointed out in conclusion, I'm curious to see some related experiments that this information axis tool can help with.
ICLR_2022_2531	ICLR_2022	I have several concerns about the clinical utility of this task as well as the evaluation approach. - First of all, I think clarification is needed to describe the utility of the task setup. Why is the task framed as generation of the ECG report rather than framing the task as multi-label classification or slot-filling...	- I’d be interested to know if other multilingual pretraining setups also struggle with Greek.
dvDi1Oc2y7	EMNLP_2023	1) There are potentially numerous baselines as data augmentation for hard examples has several work. Given the closeness with this proposed work of using paraphrases (both negative and positive), some of baselines are necessary for comparison with GBT, especially counterfactual data-augmentation techniques as GBT uses ...	5) The text in line 293-295 makes the above point a little bit more unclear. It would be difficult for readers to understand and evaluate – “we manually observed the generated examples and find the results acceptable.”
NIPS_2020_367	NIPS_2020	- Below eq (3), for the upper bound of $\delta_t$ the right-hand side should be $2\sum_s\eta_sa_s$ instead of $2\sum_s\eta_sa_s\delta_s$. - It is misleading to claim that it is the first work to address the stability of SGD for non-smooth convex loss functions as there are indeed existing work which already addressed s...	2. Private Stochastic Convex Optimization: Efficient Algorithms for Non-smooth Objectives, Arxiv preprint (2020). In this Arxiv preprint, the authors developed a different differentially private algorithm (Private FTRL) for non-smooth learning problems which can also achieve optimal generalization bounds.
lYongcxaNz	ICLR_2025	The weaknesses of this paper are summarized as follows: * The presentation of this paper needs improvements. In particular, in Theorem 1, the role of $gamma$ is not clear to me. Why does one need to involve $\gamma$ in this theorem, is there any condition for $\gamma$ (may be the same condition in Lemma 1)? * In Theore...	* The proof is extremely not well organized. Many proofs do not have clean logic and are very hard to follow, thus making it hard to rigorously check the correctness of the proof. For instance, in Lemma 3, does the result hold for any polynomial function $P(\gamma)$?
ICLR_2023_700	ICLR_2023	1. Some intuitions can be further explained, e.g., in section 2.2 the situation that breaks the factorized distribution can have a factorized support. It will be more convincing to give an example which does not have a factorized support will fail to disentangle, more intuitively show the relationship between factorize...	2. Since this paper claims to aim at the realistic scenario of disentangled representation learning, it is better to conduct experiments on real world datasets instead of the synthetic datasets(at least for the out-of-distribution setting.).
NIPS_2017_35	NIPS_2017	- The applicability of the methods to real world problems is rather limited as strong assumptions are made about the availability of camera parameters (extrinsics and intrinsics are known) and object segmentation. - The numerical evaluation is not fully convincing as the method is only evaluated on synthetic data. The ...	- Some explanations are a little vague. For example, the last paragraph of Section 3 (lines 207-210) on the single image case. Questions/comments:
ICLR_2022_3099	ICLR_2022	W1: The setting seems to be limited and not well justified. 1) It only consider ONE truck and ONE drone. Would it be easy to extend to multiple trucks and drones? This seems to be a more interesting and practical setting. 2) What is the difference of this setting versus settings where there are multiple trucks? Are the...	1) It only consider ONE truck and ONE drone. Would it be easy to extend to multiple trucks and drones? This seems to be a more interesting and practical setting.
ARR_2022_162_review	ARR_2022	1. The proposed approach to pretraining has limited novelty since it more or less just follows the strategies used in ELECTRA. 2. It is not clear whether baselines participating in the comparison are built on the same datasets that are used to build XLM-E. 1. From the results in Table 1, we can see that XLM-E lags behi...	1. The proposed approach to pretraining has limited novelty since it more or less just follows the strategies used in ELECTRA.
NIPS_2020_528	NIPS_2020	I would largely consider most of the weaknesses to be issues with motivation and presentation rather than with the technical content of the results. 1) The motivation/need for the Newton algorithm in section 4 was somewhat lacking I felt. This is essentially just a 1-dimensional line search on a convex function, so eve...	1) The motivation/need for the Newton algorithm in section 4 was somewhat lacking I felt. This is essentially just a 1-dimensional line search on a convex function, so even something as basic as a bisecting line search will converge linearly. While of course quadratic convergence is better than linear convergence, how ...
38k1q1yyCe	EMNLP_2023	- Regarding the synthetic experiment: It is impossible to tell to what extent the findings from the artificial language translation experiment generalise to natural data, where non-compositional translations are much more complex. To name 3 reasons: 1) idioms have various conventionalities (~ratio between idiomatic vs ...	- Regarding the proposed upweighing and KNN methods: For the majority of language and score combinations (see Figure 3), the impact that the methods have on idiomatic vs random data is similar; hence the proposed MT modelling methods seem far from idiom-specific. Therefore, the results simply appear to indicate that "b...
NIPS_2018_15	NIPS_2018	weakness of this paper is its lack of clarity and aspects of the experimental evaluation. The ResNet baseline seems to be just as good, with no signs of overfitting. The complexity added to the hGRU model is not well motivated and better baselines could be chosen. What follows is a list 10 specific details that we woul...	77) then we believe the resulting volume should be WxHx1 and the bias is a scalar. The authors most certainly want to have several kernels and therefore several biases but we only found this hyper-parameter for the feed forward models that are described in section 3.4. The fact that they have C biases is confusing.
HM2E7fnw2U	ICLR_2024	- I am worried about how to ensure that s contains only static features. The authors claim that static factors can be extracted from a single frame in the sequence, which is not a necessary and sufficient condition. Otherwise, any frame from the video can be used. Why the first frame? - In addition, in Equation 8, if s...	- In addition, in Equation 8, if s contains dynamic factors, subtracting s from the dynamic information may result in the loss of some dynamic information, making it difficult for the LSTM module to capture the complete dynamic changes.
NIPS_2021_2131	NIPS_2021	- There is not much technical novelty. Given the distinct GPs modeling the function network, the acquisition function and sampling procedure are not novel - The theoretical guarantee is pretty weak (random search is asymptotically optimal). The discussion of not requiring dense coverage to prove the method is asymptoti...	- How does the number of MC samples affect performance, empirically? How does the network structure affect this?
NIPS_2020_832	NIPS_2020	The reviewer has some major concerns about the experiments. 1. The paper combines many objectives (about nine loss terms in Eq. 5, Eq. 8, and Eq. 12) to optimize the reconstruction network, but has not studied these losses in the experiments section. Such a complex loss function may weaken the contribution of the data ...	4. The reviewer suggests showing the smoothed GT shapes in Figure. 3 and Figure. 5 so that the readers can better understand the quality of the reconstruction. A minor concern:
TjfXcDgvzk	ICLR_2024	1. The technical novelty is relatively minor with the overall idea being a combination of prior works PRANC and NOLA. While this seems enough to provide empirical improvement, the approach itself is not that big of an innovation over prior works. 2. While the prior approach PRANC is directly modified by the authors in ...	2. While the prior approach PRANC is directly modified by the authors in this work there are no direct comparisons with it in either the language or vision tasks used to evaluate the proposed approach. There is a comparison of training loss in Section 3.4 and a comparison of the rank of possible solutions of the two ap...
ACL_2017_350_review	ACL_2017	Not much novelty in method. Not quite clear if data set is general enough for other domains. - General Discussion: This paper describes a rule-based method for generating additional weakly labeled data for event extraction. The method has three main stages. First, it uses Freebase to find important slot fillers for mat...	- I'm also concern with the generalizability of this method to other domains. Section 2 line 262 says that 21 event types are selected from Freebase. How are they selected? What is the coverage on the 33 event types in the ACE data.
NIPS_2020_373	NIPS_2020	- The submission would benefit from clarifying assumptions as early as possible to help categorise this work in the array of possible solutions to a practical CL problem. Specifically, as presented this is a competitive solution provided: 1. The use of memory is possible in an application of interest 2. Clear task boun...	1. The use of memory is possible in an application of interest 2. Clear task boundaries exist and can be identified or are provided.
NIPS_2021_1743	NIPS_2021	1. While the paper claim the importance of language modeling capability of pre-trained models, the authors did not conduct experments on generation tasks that are more likely to require a well-performing language model. Experiments on word similarity and SquAD in section 5.3 cannot really reflect the capability of lang...	1. While the paper claim the importance of language modeling capability of pre-trained models, the authors did not conduct experments on generation tasks that are more likely to require a well-performing language model. Experiments on word similarity and SquAD in section 5.3 cannot really reflect the capability of lang...
Bwhd7GUyHH	ICLR_2025	I am currently holding many confusions over the setting of this work, and thus not readily at a stage to judge this work. I will take a deeper look into the technical contributions after I found myself understood the basics. Major questions: 1. The reward defined in Eqn. (1) is weird to me in the sense that as an expec...	9. The notations of $\hat{Y}$ and $Y$ are used in a mixed way in Section 2.
NIPS_2020_389	NIPS_2020	1. This paper lacks some very important references for domain adaptation. The authors should cite and discuss in the revised manuscript. - Li et al. Bidirectional Learning for Domain Adaptation of Semantic Segmentation. In CVPR, 2019. https://arxiv.org/pdf/1904.10620.pdf - Chen et al. CrDoCo: Pixel-level Domain Transfe...	1. This paper lacks some very important references for domain adaptation. The authors should cite and discuss in the revised manuscript.
NIPS_2017_337	NIPS_2017	of the manuscript stem from the restrictive---but acceptable---assumptions made throughout the analysis in order to make it tractable. The most important one is that the analysis considers the impact of data poisoning on the training loss in lieu of the test loss. This simplification is clearly acknowledged in the writ...	- Figure 2 and 3 are hard to read on paper when printed in black-and-white.
NIPS_2017_114	NIPS_2017	Weakness- - Comparison to other semi-supervised approaches : Other approaches such as variants of Ladder networks would be relevant models to compare to. Questions/Comments- - In Table 3, what is the difference between \Pi and \Pi (ours) ? - In Table 3, is EMA-weighting used for other baseline models ("Supervised", \Pi...	- In Table 3, is EMA-weighting used for other baseline models ("Supervised", \Pi, etc) ? To ensure a fair comparison, it would be good to know that all the models being compared to make use of the EMA benefits.
NIPS_2018_630	NIPS_2018	- While there is not much related work, I am wondering whether more experimental comparisons would be appropriate, e.g. with min-max networks, or Dugas et al., at least on some dataset where such models can express the desired constraints. - The technical delta from monotonic models (existing) to monotonic and convex/c...	- The SCNN getting "lucky" on domain pricing is suspicious given your hyperparameter tuning. Are the chosen hyperparameters ever at the end of the searched range? The distance to the next best model is suspiciously large there. Presentation suggestions:
ACL_2017_792_review	ACL_2017	1. Unfortunately, the results are rather inconsistent and one is not left entirely convinced that the proposed models are better than the alternatives, especially given the added complexity. Negative results are fine, but there is insufficient analysis to learn from them. Moreover, no results are reported on the word a...	2. Some aspects of the experimental setup were unclear or poorly motivated, for instance w.r.t. to corpora and datasets (see details below).
NIPS_2019_564	NIPS_2019	Weakness: 1. The improvement of the proposed method over existing RL method is not impressive. 2. Compared to OR tools and RL baselines, the time and computational cost should be reported in detail to fairly compare different methods. Comment after feedback: The authors have addressed the concerns of running time. Sinc...	1. The improvement of the proposed method over existing RL method is not impressive.
NIPS_2017_217	NIPS_2017	- The model seems to really require the final refinement step to achieve state-of-the-art performance. - How does the size of the model (in terms of depth or number of parameters) compare to competing approaches? The authors mention that the model consists of 4 hourglass modules, but do not say how big each hourglass m...	- How does the size of the model (in terms of depth or number of parameters) compare to competing approaches? The authors mention that the model consists of 4 hourglass modules, but do not say how big each hourglass module is.
NIPS_2016_395	NIPS_2016	- I found the application to differential privacy unconvincing (see comments below) - Experimental validation was a bit light and felt preliminary RECOMMENDATION: I think this paper should be accepted into the NIPS program on the basis of the online algorithm and analysis. However, I think the application to differenti...	1) Section 1.2: the dimensions of the projection matrices are written as $A_i \in \mathbb{R}^{m_i \times d_i}$. I think this should be $A_i \in \mathbb{R}^{d_i \times m_i}$, otherwise you cannot project a tensor $T \in \mathbb{R}^{d_1 \times d_2 \times \ldots d_p}$ on those matrices. But maybe I am wrong about this...
7fuddaTrSu	ICLR_2025	1. Grammatical errors and careless statements plague the manuscript. It should be carefully proofread. I'm including some of the grammar mistakes/typos at the end of the "Weaknesses" section. Here are some examples for the careless statements, just from the introduction and related work: - "The past decade has seen sup...	- The claim that "To address this gap, we propose PACE, which treats climate emulation as a diagnostic-type prediction" is misleading without making clear that prior work (e.g. ClimateBench or ClimateSet) does exactly this.
ICLR_2022_2163	ICLR_2022	Weakness: 1. This paper only uses metric embedding to tell a story for DNN models and does not provide the specific relationship between metric learning and DNNs. For example, whether the feature transformation obtained by DNN meets the definition of metric (or part of the definition), and whether the perspective of me...	2. The metric learning theory in this paper basically comes from the generalization theory of neural networks [Bartlett et al. (2017)]. Compared with the previous theoretical results, the metric perspective analysis proposed in this paper does not give better results. From the existing content of this paper, the part o...
NIPS_2017_370	NIPS_2017	- There is almost no discussion or analysis on the 'filter manifold network' (FMN) which forms the main part of the technique. Did authors experiment with any other architectures for FMN? How does the adaptive convolutions scale with the number of filter parameters? It seems that in all the experiments, the number of i...	- It would be good to move some visual results from supplementary to the main paper. In the main paper, there is almost no visual results on crowd density estimation which forms the main experiment of the paper. At present, there are 3 different figures for illustrating the proposed network architecture. Probably, auth...
NIPS_2022_183	NIPS_2022	The writing can be improved as it causes difficulty even for experienced readers. Examples include but not limit to 1) Last column in Table 1 should refer to Theorem 7 rather than Theorem 6; 2) Using r to denote the risk for minimization problems and primal risk for minimax problem at the same time is confusing; 3) Ove...	2) Using r to denote the risk for minimization problems and primal risk for minimax problem at the same time is confusing;
NIPS_2021_311	NIPS_2021	- The paper leaves some natural questions open (see questions below). - Line 170 mentions that the corpus residual can be used to detect an unsuitable corpus, but there are no experiments to support this. After authors' response All the weakness points have been addressed by the authors' response. Consequently I have r...	- What if we don’t know that a test example is crucially different, e.g. what if we don’t know that the patient of Figure 8 is “British” and we use the American corpus to explain it? Can this be detected with the corpus residual value?
ACL_2017_606_review	ACL_2017	- [Choice of Dataset] The authors use WebQuestionsSP as the testbed. Why not using the most popular WebQuestions (Berant et al., 2013) benchmark set? Since NSM only requires weak supervision, using WebQuestions would be more intuitive and straightforward, plus it could facilitate direct comparison with main-stream QA r...	- [Choice of Dataset] The authors use WebQuestionsSP as the testbed. Why not using the most popular WebQuestions (Berant et al., 2013) benchmark set? Since NSM only requires weak supervision, using WebQuestions would be more intuitive and straightforward, plus it could facilitate direct comparison with main-stream QA r...
NIPS_2020_1824	NIPS_2020	- The two settings considered, the fixed design and the low-smoothness setting are both fairly restricted. In particular, requiring that the smoothness parameter beta < 1 is rather strong, as indicated by the example/discussion given in Section 4. - The machinery used for analysis, e.g., kernel-methods and differencing...	- The machinery used for analysis, e.g., kernel-methods and differencing are known and used often in nonparametric estimation. Nevertheless, the application yields interesting results here.
TskzCtpMEO	ICLR_2024	1. the experiments are quite bare-bones for a BNN paper, there is no evaluation of predictive uncertainty besides calibration -- we don't need a Bayesian approach do well on this metric. I would either suggest adding e.g. a temperature scaling baseline applied to a sparse deterministic net or (preferably) the usual out...	5. I don't really see the need to make such claims in the first place, it is not obvious that sparsity in training is desirable. Of course it may be the case that a larger network that would not fit into memory without sparsity performs better, but then this needs to be demonstrated (or like-wise any hypothetical train...
NIPS_2021_1860	NIPS_2021	Please refer to Main Review for the detailed comments. 1 Novelty is limited. The design is not quite new, based on the fact that attention for motion learning has been widely used in video understanding. 2 By the way, temporal shift module [TSM: Temporal Shift Module for Efficient Video Understanding, ICCV2019] is a po...	1 Novelty is limited. The design is not quite new, based on the fact that attention for motion learning has been widely used in video understanding.
ICLR_2023_1400	ICLR_2023	- While the paper shows improvements on CIFAR derivatives, it lacks analysis or results on other datasets (e.g., ImageNet derivatives). Verifying the effectiveness of the framework on ImageNet-1k or even ImageNet-100 is important. These results ideally can be presented in the main paper. - The authors should add some d...	- While the paper shows improvements on CIFAR derivatives, it lacks analysis or results on other datasets (e.g., ImageNet derivatives). Verifying the effectiveness of the framework on ImageNet-1k or even ImageNet-100 is important. These results ideally can be presented in the main paper.
NIPS_2022_1523	NIPS_2022	Weakness: 1 Causality: I think the main drawback of this manuscript is the discussion of causality. In line 25, the authors claim that causality has been mathematically defined by Wiener et.al.. it would be nice to explicitly give the definition here, as reviewers may not familiar with this definition. Importantly, the...	2 Unclear model design: The model architecture and learning details are fragmented or missing. The authors could either provide a plot of model illustration, pseudo-code table, or code repository. Considering that Neurochaos Learning is not a well-known method, it is important to demonstrate integrated details to facil...
orefzVRWqV	EMNLP_2023	I only have these concerns from the paper: 1. BigFive and MBTI are stated as models to be extended in Abstract and Introduction sections while they are used as mere datasets in Experiments. It's better to just state them as datasets throughout the paper unless the authors should provide an extended explanation why they...	1. BigFive and MBTI are stated as models to be extended in Abstract and Introduction sections while they are used as mere datasets in Experiments. It's better to just state them as datasets throughout the paper unless the authors should provide an extended explanation why they are addressing them like that.
rGvDRT4Z60	ICLR_2024	- The implications of rejecting for fairness are not considered. Rejection for privacy has implications in terms of privacy budget and likewise rejections for fairness come with implications and ignoring them might be responsible for the observed gains on the Pareto frontier. Consider the noted rejection example: "If a...	- Rejection rate is not shown in any experiments. One could view a misclassification as a rejection, however. Please include rejection rates or view them as misclassifications in the results.
NIPS_2022_1666	NIPS_2022	I cannot give a clear acceptance to the current manuscript due to the following concerns: 1. Inaccurate Contribution: One claimed contribution of this work is the compact continuous parameterization of the solution space. However, as discussed in the paper, DIMES directly uses the widely-used GNN models to generate the...	2) generalization to the specific TSP instances (the fine-tuning step in DIMES). I do see it is DIMES's own advantages (direct RL training for large-scale problems + meta fine-tuning) to overcome these two generalization gaps, but the difference should be clearly clarified in the paper. In addition, it is also interest...
ARR_2022_98_review	ARR_2022	1. Human evaluations were not performed. Given the weaknesses of SARI (Vásquez-Rodríguez et al. 2021) and FKGL (Tanprasert and Kauchak, 2021), the lack of human evaluations severely limits the potential impact of the results, combined with the variability in the results on different datasets. 2. While the authors expla...	4. What were the final thresholds that were used for the results? It will also be good for reproducibility if the authors can share the full set of hyperparameters as well.
ACL_2017_148_review	ACL_2017	- The goal of your paper is not entirely clear. I had to read the paper 4 times and I still do not understand what you are talking about! - The article is highly ambiguous what it talks about - machine comprehension or text readability for humans - you miss important work in the readability field - Section 2.2. has com...	- You say that your “dataset analysis suggested that the readability of RC datasets does not directly affect the question difficulty”, but this depends on the method/features used for answer detection, e.g. if you use POS/dependency parse features.
NIPS_2018_66	NIPS_2018	of their proposed method for disentangling discrete features in different datasets. I think that the main of the paper lies in the relatively thorough experimentation. I thought the results in Figure 6 were particularly interesting in that they suggest that there is an ordering in features in terms of mutual informatio...	- Figure 1 could be optimized to use less whitespace.
ACL_2017_494_review	ACL_2017	- fairly straightforward extension of existing retrofitting work - would be nice to see some additional baselines (e.g. character embeddings) - General Discussion: The paper describes "morph-fitting", a type of retrofitting for vector spaces that focuses specifically on incorporating morphological constraints into the ...	3) Ideally, we would have a vector space where morphological variants are just close together, but where we can assign specific semantics to the different inflections. Do you have any evidence that the geometry of the space you end with is meaningful. E.g. does "looking" - "look" + "walk" = "walking"? It would be nice ...
NIPS_2016_93	NIPS_2016	- The claims made in the introduction are far from what has been achieved by the tasks and the models. The authors call this task language learning, but evaluate on question answering. I recommend the authors tone-down the intro and not call this language learning. It is rather a feedback driven QA in the form of a dia...	- Overall, the writing quality of the paper should be improved; e.g., the authors spend the same space on explaining basic memory networks and then the forward model. The related work has missing pieces on more reinforcement learning tasks in the literature.
NIPS_2017_303	NIPS_2017	of their approach with respect to the previous SUCRL. The provided numerical simulation is not conclusive but supports the above considerations; - Clarity: the paper could be clearer but is sufficiently clear. The authors provide an example and a theoretical discussion which help understanding the mathematical framewor...	- Line 140: here the first column of Qo is replaced by vo to form P'o, so that the first state is not reachable anymore but from a terminating state. I assume that either Ass.1 (finite length of an option) or Ass.
ICLR_2022_3188	ICLR_2022	One major concern is that using recurrent networks may increase computation complexity. Authors should include FLOPs and inference time in all tables. Computation is a very important factor in networks - one can easily have a much stronger network with fewer #parameters but more computation. On the other hand, having F...	1) - if authors did not find improvement in FLOPs or inference time, I suggest looking at if there is any improvement on the accuracy or specific properties. For example, with the recurrent model, maybe the sequential relationship is easier to mode?
ACL_2017_588_review	ACL_2017	and the evaluation leaves some questions unanswered. - Strengths: The proposed task requires encoding external knowledge, and the associated dataset may serve as a good benchmark for evaluating hybrid NLU systems. - Weaknesses: 1) All the models evaluated, except the best performing model (HIERENC), do not have access ...	1) An important assumption being made is that d_e are good replacements for entity embeddings. Was this assumption tested?
ARR_2022_98_review	ARR_2022	1. Human evaluations were not performed. Given the weaknesses of SARI (Vásquez-Rodríguez et al. 2021) and FKGL (Tanprasert and Kauchak, 2021), the lack of human evaluations severely limits the potential impact of the results, combined with the variability in the results on different datasets. 2. While the authors expla...	3. ( minor) It is unclear how the authors arrived at the different components of the "scoring function," nor is it clear how they arrived at the different threshold values/ranges.
NIPS_2022_51	NIPS_2022	Weakness My major concern for this paper is that the empirical contribution is over-claimed. However, Section 5.1 is the place I think the authors measure their work in a correct way but the corresponding results are neither significantly better nor comprehensive enough to support their claimed contribution. I will ela...	2) making the factors in a table does not help convey more messages than pure text. There is no more information at all.
NIPS_2017_434	NIPS_2017	--- This paper is very clean, so I mainly have nits to pick and suggestions for material that would be interesting to see. In roughly decreasing order of importance: 1. A seemingly important novel feature of the model is the use of multiple INs at different speeds in the dynamics predictor. This design choice is not ab...	* How many different kinds of physical interaction can be in one simulation?
NIPS_2021_894	NIPS_2021	Unfortunately, the weaknesses may outweigh the strengths for this submission. The paper is uncomfortably split in two between (1) the comparison of established models and (2) the introduction of the FT-Transformer. I believe that both parts would benefit from additional work. 1. Model Comparison For a paper which lists...	1. Model Comparison For a paper which lists the 'thorough' comparison of models on a 'wide range' of datasets as a key contribution, the chosen selection of datasets is not adequate for a variety of reasons: Only one of the datasets has categorical features. All other datasets have exclusively numerical features. Categ...
NIPS_2020_232	NIPS_2020	Currently I am giving a score 8, mainly because the idea, motivation and storyline are exciting. But the draft’s Sections 3 & 4 remain unclear in several ways. My final score will depend on how the authors clarify the main questions below: -Section 3 appears to be too “high level” (it shouldn’t be, for the many new thi...	- Section 4: The two IoT datasets (FlatCam Face [26], Head-pose detection [11]) are unpopular, weird choices. The former is relatively recent but not substantially followed yet. The latter was published in 2004 and was no longer used much recently. I feel strange why the authors choose the two uncommon datasets, that m...
ICLR_2021_1527	ICLR_2021	weakness of the paper: I am not convinced, that the efficiency of RL self-play is best measured per agent. In the appendix, it is rightfully argued, that part of the training could be parallelized. However, the conclusion that the baseline experiments thus could be repeated N times, seems to ignore that the additional ...	3 out of 5 CoE: I don’t see the paper in violation of the ICLR’s Code of Ethics.
NIPS_2019_573	NIPS_2019	of the paper: - no theoretical guarantees for convergence/pruning - though experiments on the small networks (LeNet300 and LeNet5) are very promising: similar to DNS [16] on LeNet300, significantly better than DNS [16] on LeNet5, the ultimate goal of pruning is to reduce the compute needed for large networks. - on the ...	4) Pruning majorly works with large networks, which are usually trained in distributed settings, authors do not mention anything about potential necessity to find global top Q values of the metric over the average of gradients. This will potentially break big portion of acceleration techniques, such as quantization and...
NIPS_2016_478	NIPS_2016	weakness is in the evaluation. The datasets used are very simple (whether artificial or real). Furthermore, there is no particularly convincing direct demonstration on real data (e.g. MNIST digits) that the network is actually robust to gain variation. Figure 3 shows that performance is worse without IP, but this is no...	- Have some of the subfigures in Figs 1 and 2 been swapped by mistake?
ARR_2022_340_review	ARR_2022	My main concern is that they haven't quite demonstrated enough to validate the claim that these are demonstrating a causal role for syntactic knowledge. Two crticisms in particular: 1) The dropout probe improves sensitivity. It finds a causal role for syntactic representations where previous approaches would have misse...	1) The dropout probe improves sensitivity. It finds a causal role for syntactic representations where previous approaches would have missed it. Good. But all other things being equal, one should worry that this also increases the risk of false positives. I would think this should be a substantial part of the discussion...
NIPS_2020_1316	NIPS_2020	My concerns are as follows. 1. The regret in [1] is defined on function value while this work defines it with the norm of gradient. It is better to provide the same measurement for a fair comparison. 2. The topic that reduces variance with importance sampling is not new. Besides vanilla SGD, more baselines with varianc...	4. Authors claim that the regret bound for the proposed mini-batch method is cast to appendix. However, I didn’t find the regret bound for the mini-batch estimator in the supplementary. [1] Zalan Borsos, Andreas Krause, and Kfir Y Levy. Online Variance Reduction for Stochastic Optimization.
NIPS_2019_1246	NIPS_2019	- The formatting of the paper seems to be off. - The paper could benefit from reorganization of the experimental section; e. g. introducing NLP experiments in the main body of the paper. - Since the paper is proposing a new interpretation of mixup training, it could benefit by extending the comparisons in Figure 2 by i...	* Paper formatting seems to be off - It does not follow the NeurIPS formatting style. The abstract font is too large and the bottom page margins seem to be altered. By fixing the paper style the authors should gain some space and the NLP experiments could be included in the main body of the paper.
ICLR_2023_2869	ICLR_2023	Weakness: 1.The technical quality of this paper is not enough, and it seems like a direct combination with Evidential Theory and Reinforcement Learning. 2.The paper is not sound as there are many exploration methods in RL literature, such as count-based methods and intrinsic motivations(RND,ICM). But the paper does not...	2.The paper is not sound as there are many exploration methods in RL literature, such as count-based methods and intrinsic motivations(RND,ICM). But the paper does not discuss and compare these methods.
ICLR_2021_977	ICLR_2021	Weakness: Motivations behind its technical contributions can be further sharpened; comparisons to previous related studies on the inductive graph learning domain can be further improved Some gaps between the current experiment setup and real-world recommendation senarios ################################################...	- Annotations in Figure 4 can be further enlarged for visibility
ARR_2022_237_review	ARR_2022	of the paper include: - The introduction of relation embeddings for relation extraction is not new, for example look at all Knowledge graph completion approaches that explicitly model relation embeddings or works on distantly supervised relation extraction. However, an interesting experiment would be to show the impact...	- Lines 26-27: Multiple entities typically exist in both sentences and documents and this is the case even for relation classification, not only document-level RE or joint entity and relation extraction.
ICLR_2021_973	ICLR_2021	. Clearly state your recommendation (accept or reject) with one or two key reasons for this choice. I recommend acceptance. The number of updates needed to learn realistic brain-like representations is a fair criticism of current models, and this paper demonstrates that this number can be greatly reduced, with moderate...	- Fig.4: On the color bar, presumably one of the labels should say “worse”.
NIPS_2022_2635	NIPS_2022	Weakness: The writing of this paper is roughly good but could be further improved. For example, there are a few typos and mistakes in grammar: 1. Row 236 in Page 4, “…show its superiority.”: I think this sentence should be polished. 2. Row 495 in Supp. Page 15: “Hard” should be “hard”. 3. Row 757 in Supp. Page 29: “…tr...	3. Row 757 in Supp. Page 29: “…training/validation/test” should be “…training/validation/test sets”.
ARR_2022_201_review	ARR_2022	I’m not convinced that AFiRe (the adversarial regularization) brings significant improvement, especially because - BLEU improvements are small (e.g., 27.93->28.64; would humans be able to identify the differences?) - Hyperparameter details are missing. - Human evaluation protocols, payment, etc. are all missing. Who ar...	- Does it mean that inference gets slowed down drastically, and there’s no way to only do inference (i.e., predict the label)? I don’t think this is fatal though. What’s the coefficient of the p(L, E \| X) term in line 307? Why is it 1? Hyperparamter details are missing, so it’s not clear whether baselines are well-tune...
NIPS_2021_40	NIPS_2021	/Questions: I only have minor suggestions: 1.) In the discussion, it may be worth including a brief discussion on the empirical motivation for a time-varying Q ^ t and S t , as opposed to a fixed one as in Section 4.2. For example, what is the effect on the volatility of α t and also on the average lengths of the predi...	2.) I found the definition of the quantile a little confusing, an extra pair of brackets around the term ( 1 \| D \| ∑ ( X r , Y r ) ∈ D 1 S ( X r , Y r ) ≤ s ) might help, or maybe defining the bracketed term separately if space allows.
ACL_2017_433_review	ACL_2017	- The annotation quality seems to be rather poor. They performed double annotation of 100 sentences and their inter-annotator agreement is just 75.72% in terms of LAS. This makes it hard to assess how reliable the estimate of the LAS of their model is, and the LAS of their model is in fact slightly higher than the inte...	- Line 152: I think the model by Dozat and Manning (2016) is no longer state-of-the art, so perhaps just replace it with "very high performing model" or something like that.
NIPS_2017_250	NIPS_2017	#ERROR!	2. The proposed compression performs worse than PQ when a small code length is allowed, which is the main weakness of this method, in view of a practical side.
NIPS_2020_1436	NIPS_2020	1. For the principles for designed modules, this paper proposed three basic modules for interior image restoration, however, the interior structure of these modules is fixed, which is not so convinced to build these modules. In my opinion, an inner NAS strategy is necessary to search for a considerable structure for im...	4. Some subjective statements are inappropriate to introduce this paper. Some proofs and references are needed to demonstrate your statement. it is labor-intensive to seek an effective architecture, while the image recovery performance is sensitive to the choice of neural architecture. One more daunting task of multi-s...
NIPS_2022_2797	NIPS_2022	of this paper are 1) Why do sampled subgraphs (segments of the very large graph one wishes to learn) used in feature learning need to be similar in any way to the larger graph, the enormous discrepancy between their node/edge sizes notwithstanding, 2) what actual graph classification tasks did the computational experim...	3) How does the proposed method compare with prior art?
ARR_2022_67_review	ARR_2022	1. Some claims in the paper lack enough groundings. For instance, in lines 246-249, "This difference in the composition of bias types explains why the bias score of BERT is higher in CrowS-Pairs, while the same is higher for SenseBERT in StereoSet." This claim will be justified if the authors can provide the specific b...	2. Some analyses can be more detailed. For example, in "language/nationality", the data includes Japanese, Chinese, English, Arabic, German... (~20 different types). Biases towards different languages/nationalities are different. I was wondering whether there would be some interesting observations comparing them.
ICLR_2023_4236	ICLR_2023	Weakness: 1. Though I may be wrong, I don’t think DefRCN uses FPN. As such, if the author uses DefRCN as baseline, the author should ensure the implementation details for fair comparison. 2. I still cannot fully understand why the norms can be used to represent different features. If IoU is the only reason, one necessa...	4. Besides norm, is there any other property of features can be used? It is necessary and helpful for your approach design.
NIPS_2020_566	NIPS_2020	There are several important points that need to be addressed in the paper. First, there is a non-uniform level of detail and technicality through out the paper. The authors start by trying to be very formal, and specifying that functions come from "separable Banach spaces", but quickly drop this rigor and start being v...	5) Many solvers' algorithms are able to guarantee that some nice mathematical properties are kept. For example, that we do not lose mass, or charge, when solving a physics-related continuous PDEs via methods that are inherently discrete. They use symplectic integrators, etc. How does learning F^\dagger behave in this r...
ICLR_2022_331	ICLR_2022	.) Weaknesses: W2: The method is mostly constructed on top of previous methods; there are no network changes or losses. There is a contribution in the signed distance function and a pipeline for transferable implicit displacement fields. Why are we using two SIRENs for f and d? Shouldn't the d be a simpler network? W3:...	.) Weaknesses:W2: The method is mostly constructed on top of previous methods; there are no network changes or losses. There is a contribution in the signed distance function and a pipeline for transferable implicit displacement fields. Why are we using two SIRENs for f and d? Shouldn't the d be a simpler network?
bt9Ho2FMxd	EMNLP_2023	1) The RQ1 mentioned in the paper seems redundant. This adds no extra information for the audience. It is expected the performance will vary across multiple HS datasets when evaluated in cross-data setting. Another interesting point to analyse would've been how % of explicit hate information in the dataset affects impl...	1) The RQ1 mentioned in the paper seems redundant. This adds no extra information for the audience. It is expected the performance will vary across multiple HS datasets when evaluated in cross-data setting. Another interesting point to analyse would've been how % of explicit hate information in the dataset affects impl...
ICLR_2022_2834	ICLR_2022	are concluded as follows: Strengths: 1. The proposed method is novel using Fourier Transformation to measure the sample uncertainty in the limited supervision. 2. Most of the paper is easy to follow in terms of writing. 3. The experiments are comprehensively designed in three datasets and three tasks. Plus, the paramet...	2. Most of the paper is easy to follow in terms of writing.
ICLR_2022_537	ICLR_2022	1. The stability definition needs better justified, as the left side can be arbitrarily small under some construction of \tilde{g}. A more reasonable treatment is to make it also lower bounded. 2. It is expected to see a variety of tasks beyond link predict where PE is important.	2. It is expected to see a variety of tasks beyond link predict where PE is important.
ICLR_2021_1783	ICLR_2021	1. The main contribution of this paper is introducing adversarial learning process between the generator and the ranker. The innovation of this paper is concerned. 2. Quality of generated images by proposed method is limited. While good continuous control is achieved, the realism of generated results showed in paper an...	3. There are also some other works focusing on the semantic face editing and they show the ability to achieve continuous control over different attributes, like [1]. Could you elaborate the difference between your work and these papers?
NIPS_2020_686	NIPS_2020	- The objective function (1): I have two concerns about the definition of this objective: 1. If the intuitive goal consists of finding a set of policies that contains an optimal policy for every test MDP in S_{test}, I would rather evaluate the quality of \overline{\Pi} with the performance in the worst MDP. In other w...	1. If the intuitive goal consists of finding a set of policies that contains an optimal policy for every test MDP in S_{test}, I would rather evaluate the quality of \overline{\Pi} with the performance in the worst MDP. In other words, I would have employed the \min over S_{test} rather than the summation. With the sum...
ARR_2022_247_review	ARR_2022	- The authors should more explicitly discuss other work/data that addresses multi-intent sentences. Footnote 6 discusses work on multi-intent identification on ATIS/MultiWOZ/DSTC4 and synthetically generated multi-intent data (MixATIS and MixSNIPS), but this is not discussed in detail in the main text. - Additionally, ...	- Additionally, footnotes are used FAR too extensively in this paper -- it's actually very distracting. Much of the content is actually important and should be moved into the main body of the paper! Details around parameter settings etc. can be moved into the appendix to make space (e.g., L468).
UQpbq4v8Xi	EMNLP_2023	There isn't a huge amount of novelty here. The main contribution, as far as I can tell, is the exploration of the capabilities of an off-the-shelf LLM for data generation. The greatest performance is gained from the inclusion of domain-specific knowledge and few-shot demonstrations to the prompt, neither of which are "...	3) A set of few-shot demonstrations to draw from (possible to obtain, with the help of domain experts). A discussion about this would have been appreciated. While most of the experiments are interesting and relevant, I find the inclusion of zero-shot generation results a bit strange here. I suppose this might satisfy g...
NIPS_2016_192	NIPS_2016	Weakness: (e.g., why I am recommending poster, and not oral) - Impact: This paper makes it easier to train models using learning to search, but it doesn't really advance state-of-the-art in terms of the kind of models we can build. - Impact: This paper could be improved by explicitly showing the settings for the variou...	- (Minor issue) What's up with Figure 3? "OAA" is never referenced in the body text. It looks like there's more content in the appendix that is missing here, or the caption is out of date.
NIPS_2020_1309	NIPS_2020	1. The authors must be more clear in the introduction that the proposed solution is a "fix" of [12], rather than a new PIC approach, as introduced in lines 29-30 by saying: "... This paper presents a framework which solves instance discrimination by direct parametric instance classification (PIC)". This framework has b...	1. The authors must be more clear in the introduction that the proposed solution is a "fix" of [12], rather than a new PIC approach, as introduced in lines 29-30 by saying: "... This paper presents a framework which solves instance discrimination by direct parametric instance classification (PIC)". This framework has b...
NIPS_2018_865	NIPS_2018	weakness of this paper are listed: 1) The proposed method is very similar to Squeeze-and-Excitation Networks [1], but there is no comparison to the related work quantitatively. 2) There is only the results on image classification task. However, one of success for deep learning is that it allows people leverage pretrain...	3) GS module is used to propagate the context information over different spatial locations. Is the effective receptive field improved, which can be computed from [2]? It is interesting to know how the effective receptive field changed after applying GS module.
ICLR_2022_2370	ICLR_2022	The text could use more clarity when it comes to the methods. For example, to figure out the RL part of the model, I had to explore Figure 2 instead of reading the related portions of the text. Moving the first part of the Experiments section up in the text, renaming it to Methods, and appending it with details may hel...	1) The objective for the LSTM part would be the same for pre-training and finetuning (as in: the probabilities of the actions); in the finetuning stage, the authors may simply add another head to the network computing the value functions for the states.
1OGhJCGdcP	ICLR_2025	* The proposed method does not function as a subgoal representation learning approach but rather predicts state affinity. * The paper lacks strong positioning within the subgoal representation learning literature. It cites only one relevant work and does not provide adequate motivation or comparison with existing metho...	3. What is the rationale behind combining G4RL with HRAC (i.e., HRAC-G4RL)? Does G4RL require HRAC's regularization in the latent space?
ARR_2022_8_review	ARR_2022	1) If I understand correctly there is a need to know the word and phoneme segment boundaries for this task. This is a pretty strong assumption and can be unreliable for many languages. The experimentation done by the authors use both ground truth and provided segmentation which I think is good to show that the techniqu...	1) Regarding the related works -- "there is a long line of work that use supervised, multilingual systems" -- it would be good to acknowledge some of the older works too.
NIPS_2021_537	NIPS_2021	Weakness: The main weakness of the approach is the lack of novelty. 1. The key contribution of the paper is to propose a framework which gradually fits the high-performing sub-space in the NAS search space using a set of weak predictors rather than fitting the whole space using one strong predictor. However, this high-...	7. The results in Table 2 which show linear-/exponential-decay sampling clearly underperforms uniform sampling confuse me a bit. If the predictor is accurate on the good subregion, as argued by the authors, increasing the sampling probability for top-performing predicted architectures should lead to better performance ...
ICLR_2022_2112	ICLR_2022	1 Collaborative rating prediction is a very well-studied problem, for which there are lots of existing works. Moreover, in most real recommender systems, item ranking is more consistent with a real setting. 2 The time complexity seems rather high. First, the authors use an item-oriented autoencoder, in which there may ...	2 The time complexity seems rather high. First, the authors use an item-oriented autoencoder, in which there may be lots of users associated with a typical item. Second, the elementwise function is expensive. Third, the number of hidden units is much larger than a typical matrix factorization-based method.
ICLR_2022_1393	ICLR_2022	I think that: The comparison to baselines could be improved. Some of the claims are not carefully backed up. The explanation of the relationship to the existing literature could be improved. More details on the above weaknesses: Comparison to baselines: "We did not find good benchmarks to compare our unsupervised, iter...	- Many of the figures would be more clear if they said pre-trained solution encoders & solution decoders, since there are multiple types of autoencoders.
NIPS_2022_874	NIPS_2022	I found the presentation at times to be more complicated than it needs to be. I would suggest adding a simple running example (could be very low-dimensional) throughout the paper that already clearly shows why the proposed method clearly works and we really don't need a specialized training procedure. It would be helpf...	2021 Oct;27(4):710-36. Yes, the author(s) do briefly address the limitation of their approach (i.e., it doesn't handle large m ), and I found their response to the question on potential negative societal impact in the checklist to be adequate.
w5oP27fmYW	ICLR_2024	- Main concern: While the improvement in results is clear and the implementation is simple, I'm currently not convinced by the argumentation. My concern is that the authors propose adding an explicit inductive bias, which assumes that all target models are zero-centered. This assumption may or may not hold for general ...	- I’m missing comparison with a NeRF-based methods, like the recent Zero-1-to-3 - I also recommend comparison with point-e - I don’t see the relevance of the occlusion experiment — it doesn’t seem like the method is proposing anything specific to occlusion. Minor:
NIPS_2021_1604	NIPS_2021	). Weaknesses - Some parts of the paper are difficult to follow, see also Typos etc below. - Ideally other baselines would also be included, such as the other works discussed in related work [29, 5, 6]. After the Authors' Response My weakness points after been addressed in the authors' response. Consequently I raised m...	- Line 14, 47: A brief explanation of “multi-aspect” would be helpful - Figure 1: Subscripts s and t should be 1 and 2?
ARR_2022_108_review	ARR_2022	1. First of all, compared with other excellent papers, this paper is slightly less innovative. 2. The baseline is is not strong enough. Expect to see experiments that compare with the baseline of the papers you cited. 3. p indicates the proportion of documents, I would like to know how the parts of sentences and docume...	3. p indicates the proportion of documents, I would like to know how the parts of sentences and documents are extracted? Do the rules of extraction have any effect on the experiment? I hope to see a more detailed analysis.
NIPS_2016_450	NIPS_2016	. First of all, the experimental results are quite interesting, especially that the algorithm outperforms DQN on Atari. The results on the synthetic experiment are also interesting. I have three main concerns about the paper. 1. There is significant difficulty in reconstructing what is precisely going on. For example, ...	* Can you say something about the computation required to implement the experiments? How long did the experiments take and on what kind of hardware?

Subsets and Splits

No community queries yet

The top public SQL queries from the community will appear here once available.